docs(gfql): engine-selection guide (pandas/polars/cuDF/polars-gpu) + motivating comparison #1661

Open
lmeyerov wants to merge 6 commits into dev/gfql-seeded-traversal-index from docs/gfql-engine-docs

@lmeyerov

Stacks on #1658 (top of the GFQL polars/index stack). Docs-only — no code change.

What

A persona-tested "Choosing a GFQL Engine" page documenting the four interchangeable engines (pandas / polars / cudf / polars-gpu), which until now were undocumented (grep confirmed zero doc mentions of polars).

  • docs/source/gfql/engines.rst (new) — numbers-first:
    • the one-keyword engine='polars' speedup (11–47× over pandas on real graphs, no GPU)
    • a motivating warm-median comparison table on real public graphs (LiveJournal 35M / Orkut 117M)
    • a decision matrix (workload shape × size × hardware → engine) with footnotes: ~1M crossover, GPU work-bound rule, polars-gpu 85M-row memory pressure, GPU-or-error contract
    • cuDF vs polars-gpu disambiguation (eager-op vs fused-lazy; cuDF is not deprecated)
    • honest "when NOT to use Polars", the differential-parity guarantee, and methodology + reproducer scripts
  • performance.rst — rewrote the top to lead with the engine comparison; de-marketed the prose flagged by the skeptic persona ("Unleashing", "Graph 500 levels", NVIDIA name-drop)
  • nav — wired the page into the GFQL toctree + recommended paths (added a CPU performance path)
  • quick.rst / about.rst — added polars/polars-gpu to the engine examples (previously pandas/cuDF only)

How it was scoped

Driven by 4-persona doc user-testing (pandas data scientist, RAPIDS/cuDF user, performance engineer, skeptical evaluator). Each persona read the current docs cold; the union of their must-haves is the acceptance bar. A round-2 user-test against the rendered docs follows.

Numbers trace to guarded benchmark runs (benchmarks/gfql/index_bulk_olap_bench.py); no figures invented.

🤖 Generated with Claude Code

lmeyerov and others added 6 commits June 28, 2026 21:35
…motivating comparison

New persona-tested "Choosing a GFQL Engine" page (gfql/engines.rst): the four
interchangeable engines, the one-keyword engine='polars' speedup (11-47x over
pandas on real graphs, no GPU), a motivating warm-median comparison table on real
public graphs (LiveJournal 35M / Orkut 117M), a decision matrix (shape x size x
hardware -> engine) with crossover/work-bound/memory-pressure/GPU-or-error
footnotes, cuDF-vs-polars-gpu disambiguation (eager vs fused-lazy; cuDF not
deprecated), an honest "when NOT to use Polars", the differential-parity guarantee,
and methodology + reproducer scripts.

Also: rewrote the top of gfql/performance.rst to lead with the engine comparison
(de-marketed the prose flagged by the skeptic persona), wired the page into the
GFQL toctree + recommended paths, and added polars/polars-gpu to the engine
examples in quick.rst and about.rst (docs previously mentioned only pandas/cuDF).

Driven by 4-persona doc user-testing (pandas data scientist, RAPIDS/cuDF user,
performance engineer, skeptical evaluator). Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…numbers, output-frame note

Applies fixes from a second persona user-testing pass on the rendered docs:

- performance.rst: removed the surviving marketing tail (the skeptic persona's #1
  residual) — "A New Era", "Empower Your Data Journey", "Join the Community", and the
  NVIDIA-investment-implies-performance line — replaced with a tight, de-superlatived
  "How GFQL is fast" (the real mechanisms) + a focused Next Steps.
- engines.rst: added the cuDF-WINS row to the comparison table (2-hop/100K seeds, ~85M
  output rows: cuDF 6.0s) so cuDF winning is visible without reading footnotes (RAPIDS
  persona); added a prominent note that result frames match the engine (polars-gpu/polars
  return polars.DataFrame; .to_pandas() to convert) — the pandas+RAPIDS personas' top
  practical gotcha; fixed the LDBC sf1 figure attribution (it is from a separate benchmark,
  not the cited Orkut/LiveJournal source-of-truth) to keep every on-page number traceable;
  added run counts + unified-memory note to Methodology (perf-engineer persona).

Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…) + fixes

Ran the repo's documented user-testing protocol (test-amplification SKILL §0
"User-Workflow Exploration") clean-room — two passes (need-finding vs original docs,
then QA on the produced docs) — and applied the deltas it surfaced that the earlier
ad-hoc persona pass missed:

Completeness (Pass A): finished the engine enumerations the ad-hoc pass deferred —
overview.rst now names all four engines + auto's resolution rule + the opt-in/no-silent-
fallback contract (was "GFQL automatically executes on GPU", which implied silent
selection); notebooks/gpu.rst now points GPU readers to the engines page.

Accuracy/QA (Pass B): reconciled the recurring "11-47x" headline to what the on-page
table supports (-> "up to ~38x", Orkut 1-hop, traceable) across 9 sites; fixed cuDF
"6-18x" -> "~15x (Orkut 1-hop)"; corrected a wrong "polars (CPU) is GPU-or-error" claim
(only polars-gpu is — CPU polars raises NotImplementedError); dropped the deprecated
`chain` from the engines.rst entrypoint line (gfql/hop only); scoped the ~87x kuzu claim
to LiveJournal + named its reproducer; stopped the CSR-index footnote from over-promising
an API page that doesn't document it yet; cited the orphaned [F4] footnote.

Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… silent-coercion warning

A §0 user-testing pass on polars-centric personas found a real P0 gap: nothing in the
docs spoke to a user who is ALREADY on Polars, and the silent default-path downgrade was
never warned. A graph built from polars.DataFrame run with the default engine='auto' is
coerced to pandas (auto -> cudf for cuDF input, pandas for everything else incl. Polars;
it never selects the Polars engine), so result._nodes comes back pandas and downstream
pl.* breaks at runtime.

Fixes:
- engines.rst: a `.. warning::` "Already a Polars user? pass engine='polars' — the default
  does not" with a pl.DataFrame in -> engine='polars' -> pl.DataFrame out worked example;
  co-located the "catch" (crossover + NotImplementedError) under the one-liner.
- overview.rst: spelled out that auto coerces a Polars-frame graph to pandas unless you pass
  engine='polars'.
- Added Polars to the accepted-input lists in engines.rst / overview.rst / about.rst (was
  "pandas, cuDF" only).
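
The resolution rule described in this commit (auto -> cudf for cuDF input, pandas for everything else, explicit engines honored as requested) can be sketched in a few lines. This is an illustrative model only, not graphistry's implementation: `resolve_engine`, `result_frame_kind`, and the string labels for input kinds are hypothetical names introduced here.

```python
# Hypothetical sketch of the engine resolution rule described above.
# None of these names come from graphistry; they only model the stated behavior.
VALID_ENGINES = {"pandas", "polars", "cudf", "polars-gpu"}


def resolve_engine(engine: str, input_kind: str) -> str:
    """Return the engine that would run for a requested engine and input kind.

    input_kind is an illustrative label: "pandas", "polars", or "cudf".
    """
    if engine == "auto":
        # auto -> cudf for cuDF input, pandas for everything else (incl. Polars)
        return "cudf" if input_kind == "cudf" else "pandas"
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    # An explicitly requested engine is used as-is; there is no silent fallback.
    return engine


def result_frame_kind(resolved: str) -> str:
    """Result frames match the engine; both Polars engines return polars frames."""
    return "polars" if resolved in ("polars", "polars-gpu") else resolved
```

Under this model, a Polars-built graph run with the default comes back as pandas (`result_frame_kind(resolve_engine("auto", "polars"))` is `"pandas"`), which is the gotcha the warning targets; passing `engine='polars'` keeps the result in Polars.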

Artifact: plans/gfql-engine-docs/rounds/round-003/user_testing_playbook.md. Docs-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… + reproducer

Leo pushed back: does polars beat pandas even below 1M? It does. The "pandas wins below ~1M"
claim was stale (a coarse early finding) and contradicted the fast-path work. Fresh CPU
bench (benchmarks/gfql/index_crossover_bench.py, LiveJournal subsampled, warm-median,
current stack):

  shape           10K          100K         1M
  1-hop hop       polars 2.7x  polars 4.5x  polars 7.6x
  WHERE+ORDER     polars 3.0x  polars 3.0x  polars 18x
  trivial filter  polars 1.5x  pandas 2.0x  pandas 1.6x  (sub-ms; immaterial)

So CPU polars wins the common graph-query shapes (traversal / WHERE / aggregation) from
~10K edges up; the only pandas win is a trivial sub-millisecond equality mask where the
absolute difference is immaterial. The real small-size floor is GPU-only (cuDF/polars-gpu
kernel launch, work-bound) — NOT extended to GPU here (this bench is CPU-only; polars-gpu
stays the rougher, conditional case via F2/F3/F4).
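
For readers unfamiliar with the term, a warm-median measurement of the kind quoted above can be sketched as follows. This is a minimal stand-in, not the contents of benchmarks/gfql/index_crossover_bench.py: the helper names and the default warmup and run counts are assumptions made for illustration.

```python
import statistics
import time


def warm_median(fn, warmup=2, runs=5):
    """Median wall-clock seconds of fn() after discarding warmup calls.

    The warmup and runs defaults are illustrative, not the benchmark's settings.
    """
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization; timings are discarded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s
```

A table cell such as "polars 7.6x" would then correspond to `speedup(warm_median(run_pandas), warm_median(run_polars))`, where `run_pandas` and `run_polars` are hypothetical zero-argument callables that execute the same query on each engine.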

Corrected: F1 (crossover ~10K not ~1M), the decision matrix (size col >~1M -> >~10K; the
"<1M -> pandas" row -> "trivial sub-ms op -> pandas, immaterial"), the "When not to use
Polars" first bullet, and the motivating-table note. Also reframed "Why opt-in?" so the
rationale rests on robustness against the NotImplementedError (NIE) surface (auto-polars
could error where pandas works), not on a perf regression, consistent with keeping auto on pandas.

Docs-only + one CPU bench reproducer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ll 4 engines)

The CSR index works on all four engines; benchmarked seeded 1-hop on LiveJournal 35M
(guarded, index==scan): pandas ~0.13ms / polars ~0.16ms (numpy searchsorted) vs cuDF ~3ms
(GPU kernel-launch floor) — the clean inverse of bulk. Pick the index for selective
traversal + a CPU engine to drive it. Reproducer benchmarks/gfql/index_largegraph_bench.py.
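
To illustrate why a seeded lookup is cheap on CPU, here is a simplified sketch of the idea: edges sorted by source plus a binary search per seed. It uses the standard-library `bisect` module as a stand-in for the numpy searchsorted mentioned above, and a sorted edge list rather than a true CSR offset array. All function names are hypothetical and this is not the GFQL index implementation.

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(edges):
    """Sort (src, dst) pairs by src and split them into parallel lists."""
    ordered = sorted(edges)
    srcs = [s for s, _ in ordered]
    dsts = [d for _, d in ordered]
    return srcs, dsts


def seeded_one_hop(srcs, dsts, seeds):
    """Return (seed, dst) pairs via two binary searches per seed."""
    out = []
    for seed in seeds:
        lo = bisect_left(srcs, seed)   # first edge whose src == seed
        hi = bisect_right(srcs, seed)  # one past the last such edge
        out.extend((seed, d) for d in dsts[lo:hi])
    return out


def scan_one_hop(edges, seeds):
    """Reference full scan, used to check that index == scan."""
    wanted = set(seeds)
    return [(s, d) for s, d in edges if s in wanted]
```

Each seed costs two binary searches over the sorted sources instead of a pass over every edge, which is why a selective traversal can finish in well under a millisecond on CPU. Comparing the indexed result with `scan_one_hop` mirrors the index==scan guard noted in the commit message.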

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>