docs(gfql): engine-selection guide (pandas/polars/cuDF/polars-gpu) + motivating comparison #1661
Open
lmeyerov wants to merge 6 commits into
…motivating comparison

New persona-tested "Choosing a GFQL Engine" page (gfql/engines.rst): the four interchangeable engines, the one-keyword engine='polars' speedup (11-47x over pandas on real graphs, no GPU), a motivating warm-median comparison table on real public graphs (LiveJournal 35M / Orkut 117M), a decision matrix (shape x size x hardware -> engine) with crossover/work-bound/memory-pressure/GPU-or-error footnotes, cuDF-vs-polars-gpu disambiguation (eager vs fused-lazy; cuDF not deprecated), an honest "when NOT to use Polars", the differential-parity guarantee, and methodology + reproducer scripts.

Also: rewrote the top of gfql/performance.rst to lead with the engine comparison (de-marketed the prose flagged by the skeptic persona), wired the page into the GFQL toctree + recommended paths, and added polars/polars-gpu to the engine examples in quick.rst and about.rst (docs previously mentioned only pandas/cuDF).

Driven by 4-persona doc user-testing (pandas data scientist, RAPIDS/cuDF user, performance engineer, skeptical evaluator). Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…numbers, output-frame note

Applies fixes from a second persona user-testing pass on the rendered docs:

- performance.rst: removed the surviving marketing tail (the skeptic persona's #1 residual): "A New Era", "Empower Your Data Journey", "Join the Community", and the NVIDIA-investment-implies-performance line. Replaced it with a tight, de-superlatived "How GFQL is fast" (the real mechanisms) + a focused Next Steps.
- engines.rst: added the cuDF-WINS row to the comparison table (2-hop/100K seeds, ~85M output rows: cuDF 6.0s) so cuDF winning is visible without reading footnotes (RAPIDS persona); added a prominent note that result frames match the engine (polars-gpu/polars return polars.DataFrame; .to_pandas() to convert), the pandas+RAPIDS personas' top practical gotcha; fixed the LDBC sf1 figure attribution (it is from a separate benchmark, not the cited Orkut/LiveJournal source-of-truth) to keep every on-page number traceable; added run counts + unified-memory note to Methodology (perf-engineer persona).

Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…) + fixes

Ran the repo's documented user-testing protocol (test-amplification SKILL §0 "User-Workflow Exploration") clean-room, in two passes (need-finding vs original docs, then QA on the produced docs), and applied the deltas it surfaced that the earlier ad-hoc persona pass missed:

Completeness (Pass A): finished the engine enumerations the ad-hoc pass deferred. overview.rst now names all four engines + auto's resolution rule + the opt-in/no-silent-fallback contract (was "GFQL automatically executes on GPU", which implied silent selection); notebooks/gpu.rst now points GPU readers to the engines page.

Accuracy/QA (Pass B): reconciled the recurring "11-47x" headline to what the on-page table supports (-> "up to ~38x", Orkut 1-hop, traceable) across 9 sites; fixed cuDF "6-18x" -> "~15x (Orkut 1-hop)"; corrected a wrong "polars (CPU) is GPU-or-error" claim (only polars-gpu is; CPU polars raises NotImplementedError); dropped the deprecated `chain` from the engines.rst entrypoint line (gfql/hop only); scoped the ~87x kuzu claim to LiveJournal + named its reproducer; stopped the CSR-index footnote from over-promising an API page that doesn't document it yet; cited the orphaned [F4] footnote.

Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… silent-coercion warning

A §0 user-testing pass on polars-centric personas found a real P0 gap: nothing in the docs spoke to a user who is ALREADY on Polars, and the silent default-path downgrade was never warned. A graph built from polars.DataFrame run with the default engine='auto' is coerced to pandas (auto -> cudf for cuDF input, pandas for everything else incl. Polars; it never selects the Polars engine), so result._nodes comes back pandas and downstream pl.* breaks at runtime.

Fixes:

- engines.rst: a `.. warning::` "Already a Polars user? pass engine='polars' - the default does not" with a pl.DataFrame in -> engine='polars' -> pl.DataFrame out worked example; co-located the "catch" (crossover + NotImplementedError) under the one-liner.
- overview.rst: spelled out that auto coerces a Polars-frame graph to pandas unless you pass engine='polars'.
- Added Polars to the accepted-input lists in engines.rst / overview.rst / about.rst (was "pandas, cuDF" only).

Artifact: plans/gfql-engine-docs/rounds/round-003/user_testing_playbook.md. Docs-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
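The auto-resolution rule described in this commit message can be pictured with a small, library-free sketch. The function name `resolve_auto_engine` and the module-name check are hypothetical stand-ins chosen for illustration; they are not the library's implementation, and only the mapping itself (cuDF input -> cudf, everything else including Polars -> pandas) comes from the text above.

```python
def resolve_auto_engine(frame) -> str:
    """Hypothetical sketch of the engine='auto' rule, not the library's code."""
    # Top-level package that defines the frame's type, e.g. "pandas", "polars", "cudf".
    top_level = type(frame).__module__.split(".")[0]
    # Only cuDF input selects the cudf engine; everything else,
    # including Polars frames, resolves to pandas.
    return "cudf" if top_level == "cudf" else "pandas"


def fake_frame(module_name: str):
    """Build a stand-in object whose type claims to live in `module_name`."""
    cls = type("DataFrame", (), {"__module__": module_name})
    return cls()


for module_name in ("pandas.core.frame", "polars.dataframe.frame", "cudf.core.dataframe"):
    print(module_name, "->", resolve_auto_engine(fake_frame(module_name)))
```

Under this rule a Polars-backed graph never reaches the Polars engine by default, which is why the warning asks Polars users to pass engine='polars' explicitly.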
… + reproducer

Leo pushed: polars beats pandas even below 1M? Correct. The "pandas wins below ~1M" claim was stale (a coarse early finding) and contradicted the fast-path work. Fresh CPU bench (benchmarks/gfql/index_crossover_bench.py, LiveJournal subsampled, warm-median, current stack):

    shape            10K           100K          1M
    1-hop hop        polars 2.7x   polars 4.5x   polars 7.6x
    WHERE+ORDER      polars 3.0x   polars 3.0x   polars 18x
    trivial filter   polars 1.5x   pandas 2.0x   pandas 1.6x   (sub-ms; immaterial)

So CPU polars wins the common graph-query shapes (traversal / WHERE / aggregation) from ~10K edges up; the only pandas win is a trivial sub-millisecond equality mask where the absolute difference is immaterial. The real small-size floor is GPU-only (cuDF/polars-gpu kernel launch, work-bound). NOT extended to GPU here (this bench is CPU-only; polars-gpu stays the rougher, conditional case via F2/F3/F4).

Corrected: F1 (crossover ~10K not ~1M), the decision matrix (size col >~1M -> >~10K; the "<1M -> pandas" row -> "trivial sub-ms op -> pandas, immaterial"), the "When not to use Polars" first bullet, and the motivating-table note. Also reframed "Why opt-in?" so the rationale rests on the NIE-surface robustness (auto-polars could error where pandas works), not a perf regression, consistent with keeping auto on pandas.

Docs-only + one CPU bench reproducer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
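For readers unfamiliar with the "warm-median" figures quoted above, the following is a minimal sketch of that style of measurement: run the workload a few times untimed, time several further runs, and report the median. The helper names and the warmup/run counts are assumptions for illustration; the actual reproducer scripts may differ.

```python
import statistics
import time


def warm_median_seconds(fn, warmup=2, runs=5):
    """Median wall-clock time of `fn` after discarding warmup runs.

    `warmup` and `runs` are illustrative defaults, not the benchmark's values.
    """
    for _ in range(warmup):
        fn()  # untimed: lets caches, JITs, and lazy imports settle
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    slow = warm_median_seconds(lambda: sum(i * i for i in range(200_000)))
    fast = warm_median_seconds(lambda: sum(i * i for i in range(20_000)))
    print(f"speedup: {speedup(slow, fast):.1f}x")
```

The median is less sensitive than the mean to a single slow run, which matters when the operations being compared are sub-millisecond.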
…ll 4 engines)

The CSR index works on all four engines; benchmarked seeded 1-hop on LiveJournal 35M (guarded, index==scan): pandas ~0.13ms / polars ~0.16ms (numpy searchsorted) vs cuDF ~3ms (GPU kernel-launch floor), the clean inverse of bulk. Pick the index for selective traversal + a CPU engine to drive it. Reproducer benchmarks/gfql/index_largegraph_bench.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
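To illustrate why a seeded 1-hop over a sorted index can be so cheap, here is a simplified, standard-library sketch in the spirit of a CSR lookup: edges are sorted by source once, and each seed's neighbors are found with two binary searches. It uses `bisect` as a stand-in for the numpy searchsorted call mentioned above, and all function names are hypothetical; it is not the GFQL index implementation. The final loop mirrors the "index==scan" guard by checking the indexed result against a full scan.

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(edges):
    """Sort (src, dst) pairs by src and return parallel src/dst lists."""
    ordered = sorted(edges)
    src = [s for s, _ in ordered]
    dst = [d for _, d in ordered]
    return src, dst


def one_hop_indexed(src, dst, seed):
    """Neighbors of `seed` via two binary searches: O(log E + degree)."""
    lo = bisect_left(src, seed)
    hi = bisect_right(src, seed)
    return dst[lo:hi]


def one_hop_scan(edges, seed):
    """Reference result via a full scan: O(E)."""
    return sorted(d for s, d in edges if s == seed)


edges = [(2, 0), (0, 1), (0, 2), (1, 2)]
src, dst = build_sorted_index(edges)

# Guard: the indexed lookup must agree with the full scan for every seed.
for seed in (0, 1, 2, 5):
    assert one_hop_indexed(src, dst, seed) == one_hop_scan(edges, seed)

print(one_hop_indexed(src, dst, 0))  # [1, 2]
```

Because the lookup touches only the seed's slice of the edge list, its cost is dominated by fixed per-call overhead, which is consistent with the GPU kernel-launch floor described above.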
Stacks on #1658 (top of the GFQL polars/index stack). Docs-only — no code change.
What
A persona-tested Choosing a GFQL Engine page documenting the four interchangeable engines (pandas/polars/cudf/polars-gpu), which until now were undocumented (grep confirmed zero doc mentions of polars).

- docs/source/gfql/engines.rst (new), numbers-first: engine='polars' speedup (11–47× over pandas on real graphs, no GPU)
- performance.rst: rewrote the top to lead with the engine comparison; de-marketed the prose flagged by the skeptic persona ("Unleashing", "Graph 500 levels", NVIDIA name-drop)
- quick.rst / about.rst: added polars/polars-gpu to the engine examples (previously pandas/cuDF only)

How it was scoped
Driven by 4-persona doc user-testing (pandas data scientist, RAPIDS/cuDF user, performance engineer, skeptical evaluator). Each persona read the current docs cold; the union of their must-haves is the acceptance bar. A round-2 user-test against the rendered docs follows.
Numbers trace to guarded benchmark runs (benchmarks/gfql/index_bulk_olap_bench.py); no figures invented.

🤖 Generated with Claude Code