docs(gfql): engine-selection guide (pandas/polars/cuDF/polars-gpu) + motivating comparison by lmeyerov · Pull Request #1661 · graphistry/pygraphistry

lmeyerov · 2026-06-29T04:36:03Z

Stacks on #1658 (top of the GFQL polars/index stack). Docs-only — no code change.

What

A persona-tested "Choosing a GFQL Engine" page documenting the four interchangeable engines (pandas / polars / cudf / polars-gpu). Until now the docs covered only pandas and cuDF; polars and polars-gpu were undocumented (grep confirmed zero doc mentions of polars).

- docs/source/gfql/engines.rst (new), numbers-first:
  - the one-keyword engine='polars' speedup (11–47× over pandas on real graphs, no GPU)
  - a motivating warm-median comparison table on real public graphs (LiveJournal 35M / Orkut 117M)
  - a decision matrix (workload shape × size × hardware → engine) with footnotes: ~1M crossover, GPU work-bound rule, polars-gpu 85M-row memory pressure, GPU-or-error contract
  - cuDF vs polars-gpu disambiguation (eager-op vs fused-lazy; cuDF is not deprecated)
  - honest "when NOT to use Polars", the differential-parity guarantee, and methodology + reproducer scripts
- performance.rst: rewrote the top to lead with the engine comparison; de-marketed the prose flagged by the skeptic persona ("Unleashing", "Graph 500 levels", NVIDIA name-drop)
- nav: wired the page into the GFQL toctree + recommended paths (added a CPU performance path)
- quick.rst / about.rst: added polars/polars-gpu to the engine examples (previously pandas/cuDF only)

How it was scoped

Driven by 4-persona doc user-testing (pandas data scientist, RAPIDS/cuDF user, performance engineer, skeptical evaluator). Each persona read the current docs cold; the union of their must-haves is the acceptance bar. A round-2 user-test against the rendered docs follows.

Numbers trace to guarded benchmark runs (benchmarks/gfql/index_bulk_olap_bench.py); no figures invented.
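
For readers unfamiliar with the term, a "warm-median" measurement can be sketched roughly as below. This is a minimal illustration and not the code in benchmarks/gfql/index_bulk_olap_bench.py: the function names `warm_median` and `speedup`, the warmup and run counts, and the stand-in workloads are all assumptions made for the example.

```python
import statistics
import time


def warm_median(fn, warmup=2, runs=5):
    """Median wall-clock seconds over `runs` timed calls, after `warmup` untimed calls.

    Hypothetical helper for illustration; the counts are placeholders.
    """
    for _ in range(warmup):
        fn()  # untimed: lets caches, lazy imports, and allocators settle
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# Stand-in workloads; a real comparison would run the same query per engine.
slow = lambda: sum(i * i for i in range(200_000))
fast = lambda: sum(i * i for i in range(20_000))
print(f"{speedup(warm_median(slow), warm_median(fast)):.1f}x")
```

Taking the median rather than the mean keeps one slow outlier run from skewing the reported ratio.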

🤖 Generated with Claude Code

…motivating comparison

New persona-tested "Choosing a GFQL Engine" page (gfql/engines.rst): the four interchangeable engines, the one-keyword engine='polars' speedup (11-47x over pandas on real graphs, no GPU), a motivating warm-median comparison table on real public graphs (LiveJournal 35M / Orkut 117M), a decision matrix (shape x size x hardware -> engine) with crossover/work-bound/memory-pressure/GPU-or-error footnotes, cuDF-vs-polars-gpu disambiguation (eager vs fused-lazy; cuDF not deprecated), an honest "when NOT to use Polars", the differential-parity guarantee, and methodology + reproducer scripts.

Also: rewrote the top of gfql/performance.rst to lead with the engine comparison (de-marketed the prose flagged by the skeptic persona), wired the page into the GFQL toctree + recommended paths, and added polars/polars-gpu to the engine examples in quick.rst and about.rst (docs previously mentioned only pandas/cuDF).

Driven by 4-persona doc user-testing (pandas data scientist, RAPIDS/cuDF user, performance engineer, skeptical evaluator). Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…numbers, output-frame note

Applies fixes from a second persona user-testing pass on the rendered docs:

- performance.rst: removed the surviving marketing tail (the skeptic persona's #1 residual): "A New Era", "Empower Your Data Journey", "Join the Community", and the NVIDIA-investment-implies-performance line. Replaced it with a tight, de-superlatived "How GFQL is fast" (the real mechanisms) + a focused Next Steps.
- engines.rst:
  - added the cuDF-WINS row to the comparison table (2-hop/100K seeds, ~85M output rows: cuDF 6.0s) so cuDF winning is visible without reading footnotes (RAPIDS persona)
  - added a prominent note that result frames match the engine (polars-gpu/polars return polars.DataFrame; .to_pandas() to convert), the pandas+RAPIDS personas' top practical gotcha
  - fixed the LDBC sf1 figure attribution (it is from a separate benchmark, not the cited Orkut/LiveJournal source-of-truth) to keep every on-page number traceable
  - added run counts + unified-memory note to Methodology (perf-engineer persona)

Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…) + fixes

Ran the repo's documented user-testing protocol (test-amplification SKILL §0 "User-Workflow Exploration") clean-room, in two passes (need-finding vs original docs, then QA on the produced docs), and applied the deltas it surfaced that the earlier ad-hoc persona pass missed.

Completeness (Pass A): finished the engine enumerations the ad-hoc pass deferred.
- overview.rst now names all four engines + auto's resolution rule + the opt-in/no-silent-fallback contract (was "GFQL automatically executes on GPU", which implied silent selection)
- notebooks/gpu.rst now points GPU readers to the engines page

Accuracy/QA (Pass B):
- reconciled the recurring "11-47x" headline to what the on-page table supports (-> "up to ~38x", Orkut 1-hop, traceable) across 9 sites
- fixed cuDF "6-18x" -> "~15x (Orkut 1-hop)"
- corrected a wrong "polars (CPU) is GPU-or-error" claim (only polars-gpu is; CPU polars raises NotImplementedError)
- dropped the deprecated `chain` from the engines.rst entrypoint line (gfql/hop only)
- scoped the ~87x kuzu claim to LiveJournal + named its reproducer
- stopped the CSR-index footnote from over-promising an API page that doesn't document it yet
- cited the orphaned [F4] footnote

Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… silent-coercion warning

A §0 user-testing pass on polars-centric personas found a real P0 gap: nothing in the docs spoke to a user who is ALREADY on Polars, and the silent default-path downgrade was never warned. A graph built from polars.DataFrame run with the default engine='auto' is coerced to pandas (auto -> cudf for cuDF input, pandas for everything else incl. Polars; it never selects the Polars engine), so result._nodes comes back pandas and downstream pl.* breaks at runtime.

Fixes:
- engines.rst: a `.. warning::` "Already a Polars user? pass engine='polars'; the default does not" with a pl.DataFrame in -> engine='polars' -> pl.DataFrame out worked example; co-located the "catch" (crossover + NotImplementedError) under the one-liner.
- overview.rst: spelled out that auto coerces a Polars-frame graph to pandas unless you pass engine='polars'.
- Added Polars to the accepted-input lists in engines.rst / overview.rst / about.rst (was "pandas, cuDF" only).

Artifact: plans/gfql-engine-docs/rounds/round-003/user_testing_playbook.md. Docs-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
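
The 'auto' resolution rule stated in this commit message can be modeled as a small pure function. This is a sketch of the described behavior only, not pygraphistry's implementation: `resolve_engine` and `frame_module_of` are hypothetical names invented for the illustration.

```python
def frame_module_of(frame):
    """Top-level package of a frame's type, e.g. 'pandas', 'polars', or 'cudf'."""
    return type(frame).__module__.split(".")[0]


def resolve_engine(engine, frame_module):
    """Hypothetical model of the 'auto' rule described above; not library code."""
    if engine != "auto":
        return engine  # an explicit choice is honored as given
    if frame_module == "cudf":
        return "cudf"  # cuDF input stays on cuDF
    return "pandas"  # everything else, including polars input, falls to pandas


# A polars-backed graph under the default resolves to pandas...
print(resolve_engine("auto", "polars"))
# ...and only an explicit engine='polars' keeps it on polars.
print(resolve_engine("polars", "polars"))
```

Under this model the Polars engine is reachable only by explicit opt-in, which is why the warning tells existing Polars users to pass engine='polars'.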

… + reproducer

Leo pushed: polars beats pandas even below 1M? Correct. The "pandas wins below ~1M" claim was stale (a coarse early finding) and contradicted the fast-path work.

Fresh CPU bench (benchmarks/gfql/index_crossover_bench.py, LiveJournal subsampled, warm-median, current stack):

    shape            10K           100K          1M
    1-hop hop        polars 2.7x   polars 4.5x   polars 7.6x
    WHERE+ORDER      polars 3.0x   polars 3.0x   polars 18x
    trivial filter   polars 1.5x   pandas 2.0x   pandas 1.6x   (sub-ms; immaterial)

So CPU polars wins the common graph-query shapes (traversal / WHERE / aggregation) from ~10K edges up; the only pandas win is a trivial sub-millisecond equality mask where the absolute difference is immaterial. The real small-size floor is GPU-only (cuDF/polars-gpu kernel launch, work-bound). This finding is NOT extended to GPU here (this bench is CPU-only; polars-gpu stays the rougher, conditional case via F2/F3/F4).

Corrected: F1 (crossover ~10K not ~1M), the decision matrix (size col >~1M -> >~10K; the "<1M -> pandas" row -> "trivial sub-ms op -> pandas, immaterial"), the "When not to use Polars" first bullet, and the motivating-table note.

Also reframed "Why opt-in?" so the rationale rests on the NIE-surface robustness (auto-polars could error where pandas works), not a perf regression, consistent with keeping auto on pandas.

Docs-only + one CPU bench reproducer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ll 4 engines)

The CSR index works on all four engines; benchmarked seeded 1-hop on LiveJournal 35M (guarded, index==scan): pandas ~0.13ms / polars ~0.16ms (numpy searchsorted) vs cuDF ~3ms (GPU kernel-launch floor), the clean inverse of bulk. Pick the index for selective traversal + a CPU engine to drive it.

Reproducer: benchmarks/gfql/index_largegraph_bench.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
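
To illustrate why a seeded 1-hop through a sorted index costs so little on CPU, here is a minimal CSR-style lookup built on `numpy.searchsorted`, checked against a full scan in the spirit of the "index==scan" guard. The commit only says the CPU path uses numpy searchsorted; the function names, the edge layout, and the toy data below are assumptions for the example, and the real index may be organized differently.

```python
import numpy as np


def build_index(src, dst):
    """Sort edges by source so each node's out-edges form one contiguous run."""
    order = np.argsort(src, kind="stable")
    return src[order], dst[order]


def seeded_one_hop_index(sorted_src, sorted_dst, seeds):
    """Find each seed's run of out-edges with two binary searches per seed."""
    lo = np.searchsorted(sorted_src, seeds, side="left")
    hi = np.searchsorted(sorted_src, seeds, side="right")
    parts = [sorted_dst[a:b] for a, b in zip(lo, hi)]
    return np.concatenate(parts) if parts else sorted_dst[:0]


def seeded_one_hop_scan(src, dst, seeds):
    """Baseline: test every edge's source against the seed set."""
    return dst[np.isin(src, seeds)]


# Toy edge list (hypothetical data): six directed edges over nodes 0..3.
src = np.array([0, 2, 1, 0, 2, 3])
dst = np.array([1, 3, 2, 2, 0, 0])
seeds = np.array([0, 3])

sorted_src, sorted_dst = build_index(src, dst)
via_index = seeded_one_hop_index(sorted_src, sorted_dst, seeds)
via_scan = seeded_one_hop_scan(src, dst, seeds)

# Guard: the index path must return the same neighbors as the scan path.
assert sorted(via_index.tolist()) == sorted(via_scan.tolist())
```

The index path touches only the seeds' edge runs, while the scan reads every edge, so for a handful of seeds the work is tiny and a fixed per-call overhead such as a GPU kernel launch can dominate.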

lmeyerov and others added 6 commits June 28, 2026 21:35

lmeyerov wants to merge 6 commits into dev/gfql-seeded-traversal-index from docs/gfql-engine-docs
