feat(gfql): native lazy Polars engine — collect-once traversals + cypher row pipeline by lmeyerov · Pull Request #1648 · graphistry/pygraphistry

lmeyerov · 2026-06-25T03:51:43Z

Summary

Native CPU Polars execution engine for GFQL (Engine.POLARS, opt-in via engine='polars') — combines the former two-PR split (hop/chain traversals + cypher row pipeline) into one cohesive CPU-engine PR. The production pandas/cuDF paths are untouched; engine='auto' with Polars input still coerces to pandas as before.

Note: this is the CPU Polars engine. The GPU engine (cudf_polars/POLARS_GPU) is a separate PR (#1655) stacked on top of this one. This PR itself stacks on #1652 (general perf opts).

Traversals — `hop()` / `chain()`

Native vectorized BFS via semi/anti joins (no per-row Python). Forward/reverse/undirected single-hop, directed multi-hop chains, node/edge filter dicts and predicates (lowered to pl.Expr), edge_match/source_node_match/destination_node_match, target_wave_front, alias names. Deferred (honest NotImplementedError): variable-length/multi-hop edges, undirected edges in multi-edge chains, hop labels, node query=.

Cypher row pipeline — `MATCH … RETURN`

NO CHEATING: every query runs natively on Polars or raises an honest NotImplementedError pointing at engine='pandas' — never a silent pandas bridge. Native: frame ops (rows/limit/skip/distinct/drop_cols), select/with_/return_ projection (cypher-expr-AST → pl.Expr: property/arithmetic/comparison/boolean/literal + coalesce/abs), where_rows (OR/NOT WHERE, Kleene 3-valued), order_by, group_by (count/sum/avg/min/max), unwind (literal cross-join), property/expr result projection, int/string/bool entity-text (pl.concat_str). Honestly deferred → NIE: cross-entity same-path WHERE, multi-entity binding_ops, float/temporal/nested entity-text, exotic exprs.

Conformance hardening

Driven by the cross-repo Cypher TCK differential (pandas-vs-polars). Every fix either matches pandas natively or declines honestly (NO-CHEATING — no silent bridge, no wrong answer):

IEEE NaN comparison semantics — 0.0/0.0 > 1 etc.: Polars treats NaN as the largest value; now masked to the IEEE/pandas answer (& ~is_nan for ordering/=, | is_nan for <>), driven by the lowered expression's output dtype (robustly covers int/int→float division + function results).
numeric-vs-string comparison (n.val > 'a') — would Rust-panic; now NIE (recurses into AllOf/Between, covers Categorical/Enum + arithmetic on all-null→String columns).
ISO-temporal ordering / temporal arithmetic (time({...}) > time({...}), a.time + duration({...})) — were lexicographic/string-concat wrong answers; now NIE (narrowed to ordering of two temporal literals; =/<> stay native).
temporal-constructor-string property + heterogeneous (mixed-type) column projections — NIE (clear message naming the column) instead of leaking raw constructor text / cryptic ArrowInvalid.
UNION DISTINCT — engine-aware Engine.df_unique (Polars unique(maintain_order=True)) instead of the pandas-only drop_duplicates crash.
OPTIONAL MATCH — absent whole-entity renders null (not '()'); null-row-fill alignment shape declines honestly (NIE) instead of a misleading validation error.
3-valued boolean over null literals (null AND null, NOT null) — AND/OR/NOT cast to pl.Boolean so Kleene logic evaluates instead of raising.
label match on the reserved labels List column — list.contains (membership) instead of a List→String cast crash; user List properties decline honestly.
predicate pandas-bridge removed — an unlowerable filter_by_dict predicate now raises NIE (was silently .to_pandas()-bridging); native lowering widened (AllOf, IsNull/NotNull, case-insensitive STARTS/ENDS WITH).
WITH-scalar MATCH re-entry — engine-aware (pl.row/with_columns) instead of a pandas .iloc crash.

POLARS_ENGINES = (Engine.POLARS,) is introduced here (the GPU target PR #1655 extends it) so the engine-aware helpers are self-contained at this layer.

Validation

Differential parity vs the pandas engine (hop + chain suites + seeded fuzzer + a TCK-style cypher conformance lane with NULL/3-valued-logic + a DEFERRED list asserting deferred queries raise rather than bridge). Full graphistry/tests/compute/gfql/ suite green (incl. the 1610-test cypher dir).

Cross-repo Cypher TCK, polars arm: the differential pandas-vs-polars lane is clean — 0 wrong-answers across the full TCK (every scenario either matches pandas or honestly declines; 387 honest declines). This is the headline correctness guarantee: the Polars engine never silently disagrees with pandas.

Perf (interleaved, 1M nodes, each engine on its native-frame graph, all native)

Polars wins 5.6–38× across the surface: RETURN n ~38×, ORDER BY ~17×, traversals 6–7.5×, projections/aggregations/DISTINCT 5.6–6.9×. Plus eager fast paths (node-only / single-hop / unconstrained 1-hop) that move the polars>pandas crossover below ~100K for real viz/crossfilter shapes.

(Supersedes the former PR2 #1649, folded in here.)

🤖 Generated with Claude Code

Complements the Cypher whole-query parse memo in PR #1652. Once parse_cypher is cached, profiling the residual fixed per-call compile cost showed it is dominated by parse_expr(), the GFQL row-expression parser. It is invoked ~4x per query compile (each RETURN/WHERE/WITH expression) and, per call, rebuilds a Lark transformer that defines a frozen dataclass via runtime exec (dataclasses._process_class / _create_fn / exec churn).

parse_expr() is a pure function of the expression string (no params/schema) and returns a tree of frozen dataclasses (17 frozen node types, tuple-valued fields, immutable). So identical expressions, which are re-parsed on every compile and recur across queries (e.g. `a.val > 50`), are memoized via lru_cache(maxsize=1024). The non-str/empty guard stays outside the cache; only successful parses are cached. No source consumer outside graphistry/compute/gfql calls parse_expr, and nothing bypasses frozen-ness (no object.__setattr__).

Measured (dgx-spark, median-of-9): stacked on the query-parse memo, the fixed per-call cost of a repeated string Cypher query drops a further ~4.5 ms -> ~1.8 ms (RETURN a @100 rows: ~6.3 -> 3.6 ms), near parity with the equivalent native chain.

Touches only expr_parser.py (+ its test), which is disjoint from PR #1652's parser.py, so the two apply independently in either order.

Tests: 3 focused expr-cache tests (memoization identity, distinct + hit registration, invalid non-caching). Full graphistry/tests/compute/gfql/: 2512 passed, 16 skipped, 15 xfailed; only 2 unrelated in-container artifacts fail (networkx setup.py packaging; a cugraph "without cugraph" shortest-path test in an image that has cugraph). ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Terminal Cypher `RETURN a` (whole node/edge) previously emitted one column of Cypher display strings (`({id: 51, val: 51, kind: 'a'})`) built row-wise. The string is a *presentation* format (it matches the cypher-shell / TCK oracle), not data: callers had to re-parse it to use it, and constructing it is O(rows).

This flattens whole-entity returns into structured `{alias}.{field}` columns (`a.id, a.val, a.kind`, ...) by default. The per-field columns already exist on the working frame before projection, so this is "stop collapsing", not "rebuild": near-free, lossless, directly usable, and it survives JSON / CSV / Parquet / Arrow serialization and `plot()`.

Measured (dgx-spark, median-of-7, RETURN a vs old text form): pandas @100k 32 vs 204 ms (6.4x); cuDF @100k 27 vs 114 ms (4.3x). Win grows with row count (text render is O(rows); flat is ~free).

Design:
- `apply_result_projection(..., structured=True)` emits flat columns for whole-entity returns; `structured=False` keeps the legacy single Cypher-display-string column. The OPTIONAL-MATCH null-fill / projection row-guard paths (which still consume a single-column entity value for row alignment) opt out via this flag and are unchanged.
- A synthesized null/absent-entity row (top-level OPTIONAL-MATCH miss or OPTIONAL WITH-reentry no-match, built by `_apply_empty_result_row` as a single `{alias: None}` column) has no field columns to flatten, so it falls back to the single-column text form, rendering to None and preserving the shape the OPTIONAL / reentry machinery consumes for identity recovery and no-match detection. Real rows always carry flat fields and flatten.
- Text is now presentation-only: `render_entity_text(result, alias)` reconstructs the Cypher display string on demand (used by the conformance / TCK driver and any caller wanting the human-readable form). The structured data path never pays the render cost.
- The entity-projection meta `ids` snapshot (`.copy()`) is retained: bounded reentry recovers carried node identities from it and must not alias the live frame (#1356).

Tests: whole-entity text assertions migrated to an `entity_text_records` shim that renders flat -> text for comparison against the pre-#1650 Cypher-text oracle; grouping / connected-optional / null_fill paths (still single-column text) keep direct text assertions; flat-shape + render-helper + meta tests added. gfql/cypher + row suites: 1646 passed, 15 xfailed (only the unrelated in-container networkx setup.py packaging artifact fails).

Cross-repo follow-ups (separate, after this lands): tck-gfql conformance adapter (structured -> text at the comparison hook) and pyg-bench probes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
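As an illustration of "flat columns are the data, text is rendered on demand", the sketch below rebuilds the display string from one flat row. It is a simplified stand-in for the idea behind `render_entity_text`, not its implementation: `render_entity_text_sketch` is a hypothetical name, a row is a plain dict, and the bool rendering (`true`/`false`) is an assumption based on Cypher literal syntax.

```python
from typing import Any, Dict, Optional


def _render_value(value: Any) -> str:
    if isinstance(value, bool):  # assumption: Cypher-style lowercase booleans
        return "true" if value else "false"
    if isinstance(value, str):
        return "'" + value + "'"
    return str(value)


def render_entity_text_sketch(row: Dict[str, Any], alias: str) -> Optional[str]:
    # Synthesized absent-entity row: a single {alias: None} marker renders to None.
    if alias in row and row[alias] is None:
        return None
    prefix = alias + "."
    body = ", ".join(
        f"{key[len(prefix):]}: {_render_value(value)}"
        for key, value in row.items()
        if key.startswith(prefix)
    )
    # A real node with no properties still renders as '()'.
    return "({" + body + "})" if body else "()"


flat_row = {"a.id": 51, "a.val": 51, "a.kind": "a"}
text = render_entity_text_sketch(flat_row, "a")
```

The flat row stays directly usable as data, and only callers that want the human-readable form pay the per-row string cost.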

…tructured returns Squashed reconciliation of the native lazy Polars GFQL engine (was #1648's 28 commits; full history preserved at tag bak/1648) restacked onto the colleague's #1656 structured whole-entity returns + #1657 parse_expr memo. Engine: native polars hop/chain (semi/anti joins), native cypher row pipeline (select/where/order_by/group_by/unwind/projection), lazy single-hop collect-once with CPU/GPU execution targets (gfql/lazy/). NO pandas bridge — native or honest NotImplementedError (plan.md NO-CHEATING). Reconciliation with #1650 structured returns: apply_result_projection now threads `structured` to the polars path (apply_result_projection_polars). Whole-entity RETURN a flattens to {alias}.{field} columns natively (mirrors the pandas _flat_entity_field_names selection exactly), which — unlike the legacy entity-text expr — works for ANY dtype (float/temporal/nested just become columns), so polars structured == pandas structured across the board. structured=False still renders the native Cypher display string for int/string/bool single-entity nodes. _include_numeric_id_as_property is now polars-aware so id flattens identically. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…alized hops Build the whole forward/backward combine (combine_nodes/edges + endpoint + alias names) as ONE deferred pl.LazyFrame plan over the already-materialized hop frames and collect once, instead of ~a dozen eager ops that each internally lazy().op().collect(). Stable order columns (NORD/EORD) restore the eager g._nodes/g._edges order since lazy joins don't preserve it -> trailing LIMIT/SKIP unaffected, byte-identical (full polars conformance + row-pipeline parity, 2858 gfql tests). NO recompute (inputs materialized; unlike the disproven whole-chain fusion). ~5% faster polars 1-hop chain @1M/@10m; GPU-target neutral. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A single MATCH (n) with no edge hop — the dominant tabular/crossfilter shape (MATCH (n) WHERE/RETURN ..., histograms, filters, table search) — now returns the filtered node table directly and skips the whole forward/backward/combine + collect_all (~2.5 ms fixed cost that dominated small/interactive queries). Byte-identical (full polars conformance + row-pipeline parity, 389 polars tests). Moves the polars>pandas crossover BELOW 100K for real product workloads: categorical histogram 0.68->1.70x @100k / 1.38->7.62x @1m; node filter 2.44->13.85x @1m; timeline 2.55->8.12x @1m. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A single MATCH (a)-[e]->(b) with both nodes unconstrained (no filter/name/query) and a plain edge (no match/name/query), which is the basic graph query and the viz edge-crossfilter MATCH, returns ALL edges + their endpoint nodes directly (direction-independent; isolated nodes excluded), skipping forward/backward/combine. For unconstrained nodes the backward pass prunes nothing, so this is byte-identical (full polars conformance + row-pipeline parity + adversarial graphs: dup/self-loop/cycle/isolated). ~9x faster polars [n,e,n]: 95.6->10.3 ms @1m, 855->99 ms @10m. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Extend the unconstrained 1-hop fast path to filtered nodes: MATCH (a {f})-[e]->(b) (src/dst/both filters; the dominant "filter then expand" viz crossfilter pattern) returns the edges whose endpoints pass the node filters + those endpoint nodes, skipping forward/backward/combine. For one hop the backward pass prunes nothing beyond the endpoint filters, so byte-identical (verified vs pandas: src/dst/both filters, reverse, dup/self-loop/cycle/isolated; full polars conformance + row-pipeline parity, 2858 gfql tests). Unconstrained: all edges any direction; filtered: forward/reverse (filtered-undirected falls through to the full path). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
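The one-hop shortcut can be pictured with plain Python containers. This is a sketch of the forward-direction case only; `one_hop_fast_path` and the dict/tuple data shapes are hypothetical, and dropping edges whose endpoints are missing from the node table is a simplifying assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

Node = Dict[str, object]
Edge = Tuple[int, int]
Pred = Optional[Callable[[Node], bool]]


def one_hop_fast_path(nodes: List[Node], edges: List[Edge],
                      src_ok: Pred = None, dst_ok: Pred = None):
    """Return (endpoint nodes, kept edges) for MATCH (a {f})-[e]->(b {g})."""
    by_id = {n["id"]: n for n in nodes}

    def passes(node_id: int, pred: Pred) -> bool:
        node = by_id.get(node_id)
        return node is not None and (pred is None or pred(node))

    # Keep edges whose endpoints pass the node filters (all edges if unfiltered).
    kept_edges = [(s, d) for s, d in edges if passes(s, src_ok) and passes(d, dst_ok)]
    endpoint_ids = {s for s, _ in kept_edges} | {d for _, d in kept_edges}
    # Node table order is preserved; isolated nodes never appear as endpoints.
    kept_nodes = [n for n in nodes if n["id"] in endpoint_ids]
    return kept_nodes, kept_edges


nodes = [{"id": 1, "kind": "a"}, {"id": 2, "kind": "b"},
         {"id": 3, "kind": "a"}, {"id": 4, "kind": "a"}]      # node 4 is isolated
edges = [(1, 2), (1, 2), (2, 2), (3, 1)]                      # duplicate + self-loop
all_nodes, all_edges = one_hop_fast_path(nodes, edges)
a_nodes, a_edges = one_hop_fast_path(nodes, edges, src_ok=lambda n: n["kind"] == "a")
```

With a single hop there is no later step that could invalidate an edge, which is why no backward pruning pass is needed here.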

…aster) The GPU target collects with pl.GPUEngine(executor="in-memory") instead of the default streaming engine="gpu" (DefaultSingletonEngine). GFQL results fit in device memory, the in-memory engine's regime: faster on the hop primitives (semijoin 1.33x, antijoin 2.58x, unique 1.49x @10m) and far more STABLE -- the streaming executor spiked bimodally to ~1s on the same semijoin (median ~360ms), in-memory holds ~30ms. Fixes the GPU instability seen in the pr11 measurements. Parity preserved (polars-gpu == polars, 39 tests). gfql chains aren't GPU-compute-bound (orchestration + eager fast paths dominate) so this is a stability/correctness fix for GPU-collect paths, not a chain speedup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per the #1656 author's handoff: the elif-structured single-column text fallback in _apply_result_projection_pandas looks redundant but fixes two regressions (top-level OPTIONAL-MATCH miss; OPTIONAL-WITH-reentry no-match). Mark DO NOT REMOVE so a later 'tidy' doesn't reintroduce them. Our polars structured-returns reconciliation touched this file; verified the fallback is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… OTel span placement

Adversarial review of the lower stack layers (#1648 polars engine, #1652 generic fast paths) found 2 bugs:
- BLOCKER (#1648): a chain crashed under engine=polars with SchemaError when an edge endpoint dtype differed from the node-id dtype across int<->float (e.g. a null in a source/dest column -> float64 vs int64 ids) where pandas joins fine. The hop aligns join keys; the chain fast paths + combine did not. Added _align_edge_endpoints (cast endpoints to node-id dtype for the traversal, restore output dtype to match pandas; no-op when dtypes match) wired into the single-hop fast path + multi-hop.
- (#1652): the gfql.chain OTel @otel_traced decorator had landed on the internal _try_chain_fast_path probe (inserted between the decorator and def chain) instead of the public chain(), so chain() lost its span and the span recorded the wrong fn/attrs. Moved it.

Both verified + regression-tested. 457 polars tests + 334 generic chain tests pass (4 fails are the pre-existing local libnvrtc CUDA-env issue, not these changes).

Row-order divergences the review also found (fast path returns table order vs the full machinery BFS-discovery order, for reverse/undirected without ORDER BY; Cypher-undefined, sets/values identical, no repo test depends on it) are claim-precision, not bugs; CHANGELOG wording to be tightened in the per-layer #1648/#1652 review pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…pandas drop_duplicates) RETURN ... UNION RETURN ... (distinct) crashed under engine='polars'/'polars-gpu' with AttributeError: 'DataFrame' object has no attribute 'drop_duplicates' — the union de-dup in gfql_unified._execute_compiled_query called pandas-only drop_duplicates on a polars frame. Added engine-aware Engine.df_unique (polars unique(maintain_order=True); pandas/cuDF drop_duplicates(keep='first')), matching the row/frame_ops.distinct convention, and routed the UNION DISTINCT through it. Surfaced by the cross-repo TCK conformance run (tck-gfql TEST_POLARS=1, union1). Regression-tested in test_engine_polars_cypher_conformance.py (4 UNION cases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…t to Boolean) null AND null / null OR null / NOT null crashed under engine='polars' with InvalidOperationError ('bitand'/'not' not supported for dtype null): a bare null literal lowers to a Null-dtype polars expr where &/|/~ are undefined. Cast AND/OR/NOT operands to pl.Boolean in the expr lowering so Cypher Kleene 3-valued logic evaluates (true AND null=null, false OR null=null, NOT null=null); casting a real Boolean column is a no-op, and polars Boolean &/|/~ already match Cypher Kleene. Surfaced by the TCK run (expr-boolean1/2/4). Regression-tested in test_engine_polars_cypher_conformance.py (bare-RETURN null boolean cases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
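For reference, the Kleene truth tables the cast is meant to reach can be written out in plain Python, with `None` standing in for Cypher null. The function names are illustrative and are not part of the engine.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None (unknown / null)


def kleene_and(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:   # a definite False decides the result
        return False
    if a is None or b is None:     # otherwise any unknown stays unknown
        return None
    return True


def kleene_or(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:     # a definite True decides the result
        return True
    if a is None or b is None:
        return None
    return False


def kleene_not(a: Tri) -> Tri:
    return None if a is None else (not a)
```

The key property is that null only propagates when the other operand cannot decide the result on its own, so `false AND null` is false while `true AND null` is null.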

…ot ArrowInvalid A pandas object column holding mixed Python types (e.g. int 0 + str 'xx' — legal for dynamically-typed Cypher properties) is unrepresentable in polars/Arrow: pl.from_pandas raised a cryptic 'pyarrow.lib.ArrowInvalid: Could not convert xx with type str: tried to convert to int64' from deep inside construction. Wrap the pandas->polars conversion in Engine.df_to_engine (_pl_from_pandas) to raise a clear NotImplementedError naming the offending column(s) and pointing at engine='pandas' (NO-CHEATING: no silent string-coercion, which would change comparison semantics). Surfaced by the TCK run (expr-comparison2, match-where5, with-where5). The harness tolerates honest NIE as a coverage decline; before this they crashed as failures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
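A minimal sketch of the "detect, name the column, decline" shape, using plain dict-of-lists columns rather than real frames. `find_mixed_columns` and `to_columnar_or_decline` are hypothetical helpers, and the kind classification (treating int and float as one numeric kind) is an assumption.

```python
from typing import Any, Dict, List


def _kind(value: Any) -> str:
    if isinstance(value, bool):           # check bool first: bool is an int subclass
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def find_mixed_columns(columns: Dict[str, List[Any]]) -> List[str]:
    """Names of columns holding more than one value kind (nulls ignored)."""
    mixed = []
    for name, values in columns.items():
        kinds = {_kind(v) for v in values if v is not None}
        if len(kinds) > 1:
            mixed.append(name)
    return mixed


def to_columnar_or_decline(columns: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    mixed = find_mixed_columns(columns)
    if mixed:
        # Decline loudly instead of coercing to strings, which would change
        # comparison semantics.
        raise NotImplementedError(
            "mixed-type column(s) " + ", ".join(mixed)
            + " are not representable natively; use engine='pandas'"
        )
    return columns
```

Raising with the column name turns an opaque conversion failure into something a caller can act on.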

…projection A property column holding Cypher temporal-constructor text (date({year:1910,...}), how Cypher/TCK store temporal values) leaked the raw constructor string under engine='polars' instead of the ISO form ('1910-05-06') the pandas projection produces via _normalize_temporal_constructor_series. That normalizer is not yet native, so both projection paths (engine_polars.projection final result projection + row_pipeline.select_polars WITH/RETURN) now detect temporal-constructor String columns (reusing TEMPORAL_CALL_EXPR_RE, native .str.contains scan over String cols only) and raise NotImplementedError rather than emit a wrong rendering. Surfaced by the TCK run (with-orderby1-33+, the largest wrong-answer cluster, ~33). Whole-entity RETURN a over a temporal property is unaffected (flattens + renders via render_entity_text). Regression-tested in test_engine_polars_cypher_conformance.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… +/-) a.time + duration({minutes: 6}) silently became STRING CONCATENATION under engine='polars': cypher duration({...}) translates to an ISO duration string literal ('PT6M'), and the expr lowering applied + to two strings, so an ORDER BY sorted lexicographically on the concatenated text (wrong order). The lowering now raises NotImplementedError when +/- has an ISO-duration string-literal operand (^-?P(?=[0-9T]), which doesn't misfire on ordinary strings like 'Prefix'); the pandas engine handles temporal arithmetic. Surfaced by the TCK run (with-orderby2 cluster, silent wrong-order). Regression-tested in test_engine_polars_cypher_conformance.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
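The quoted pattern is easy to check in isolation. The sketch below uses the regex exactly as given in the message; the helper names and the guard function around it are hypothetical.

```python
import re

# Pattern quoted in the commit message: optional sign, 'P', then a digit or 'T'.
ISO_DURATION_LITERAL = re.compile(r"^-?P(?=[0-9T])")


def is_iso_duration_literal(value: object) -> bool:
    return isinstance(value, str) and ISO_DURATION_LITERAL.search(value) is not None


def check_additive_operands(left: object, right: object) -> None:
    """Decline +/- when either operand is an ISO-duration string literal."""
    for operand in (left, right):
        if is_iso_duration_literal(operand):
            raise NotImplementedError(
                "temporal arithmetic is not native yet; use engine='pandas'"
            )
```

The lookahead is what keeps ordinary words starting with a capital P from matching: in 'Prefix' the character after 'P' is neither a digit nor 'T'.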

…n native lowering

filter_by_dict on engine='polars' evaluated any non-natively-lowerable predicate by converting the column to pandas (.to_pandas()), running the pandas callable, and carrying the mask back: a silent polars->pandas bridge presenting pandas semantics as polars. Removed it: unsupported predicates now raise NotImplementedError (use engine='pandas'). To keep common queries native, widened predicate_to_expr:
- AllOf (conjunction, e.g. n.val > 20 AND n.val < 90 -> AllOf[GT,LT]) lowered recursively
- IsNull/IsNA -> is_null(), NotNull/NotNA -> is_not_null()
- case-insensitive STARTS WITH / ENDS WITH via anchored (?i) regex on re.escape'd literal

Surfaced from the source-mined optimization review (pygraphistry4 opportunity #6, a flagged NO-CHEATING violation in the shipping polars lane). The old fallback test (which asserted the bridge worked) now asserts the honest NIE. TCK: no wrong-answer regression; ~39 scenarios that silently passed via the bridge now honestly decline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
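The case-insensitive STARTS WITH / ENDS WITH lowering can be sketched with Python's own `re` module. The helper names are hypothetical, and this shows only the pattern construction, not how the pattern is handed to Polars.

```python
import re


def starts_with_ci(prefix: str) -> "re.Pattern[str]":
    # (?i) makes the match case-insensitive; ^ anchors at the start;
    # re.escape keeps characters such as '.' or '(' literal.
    return re.compile("(?i)^" + re.escape(prefix))


def ends_with_ci(suffix: str) -> "re.Pattern[str]":
    # \Z anchors at the very end of the string.
    return re.compile("(?i)" + re.escape(suffix) + r"\Z")
```

Escaping matters as much as anchoring: without it, a user literal like `a.b` would also match `aXb`.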

….iloc crash) A bounded MATCH ... WITH <scalar> ... MATCH query crashed under engine='polars' with AttributeError: 'DataFrame' object has no attribute 'iloc': the engine-agnostic re-entry broadcast (cypher/reentry/execution.py) used pandas .iloc / .assign / .drop(columns=) on a polars frame. Added engine-aware helpers (polars row(i, named=True) + with_columns(pl.lit(...)) / drop / head(0)) for the scalar-row extraction + constant-column broadcast. Re-entry now completes; a downstream RETURN the polars engine can't yet render raises honest NotImplementedError, not a crash. Surfaced by the TCK run (with2-1, with4-2, expr-typeconversion2/3/4-*). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ot validation error) A multi-clause OPTIONAL MATCH needing null-row fill (some seed rows unmatched) raised GFQLValidationError ('unsupported-cypher-query ... null-row alignment could not recover matched seed identities') under engine='polars' — the null-fill alignment (matched-id meta, .iloc row slicing, per-segment concat) is pandas-centric and the polars OPTIONAL MATCH doesn't populate the _cypher_entity_projection_meta ['ids'] it needs. Guarded the polars path to raise NotImplementedError instead (the honest 'not native yet' signal the TCK harness tolerates) — pandas runs these fine. Surfaced by the TCK run (match7-7, expr-graph4-4). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

OPTIONAL MATCH (n) RETURN n with no match rendered the absent whole entity as '()' under engine='polars' instead of null — the native entity-text expr didn't nullify absent rows (whose alias marker column is null). Now wraps the rendered text with pl.when(col(alias).is_null()).then(None) (mirrors pandas _nullify_missing_alias_rows); a real property-less node still renders '()'. Surfaced by the TCK run (match7-1). Regression-tested. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… (not == cast crash) A label match MATCH (n:Label) targeting the reserved 'labels' List column (a label with no one-hot label__X column: typed-schema unknown labels, OPTIONAL MATCH to a non-existent label) crashed under engine='polars' with InvalidOperationError: cannot cast List type to String — filter_by_dict_polars lowered it to a scalar == that tried to cast the List to String. Now uses pl.col(c).list.contains(val) for List-dtype columns: correct Cypher label-membership (Label in n.labels), empty for a non-existent label (matching pandas). Surfaced by the TCK run (match7-28, firstparty-typed-schema1-3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

NaN: comparisons over a NaN computed inside polars (0.0/0.0 > 1) used polars' semantics (NaN = largest value, so NaN>1 is True), but IEEE/Python/pandas/Neo4j-Cypher evaluate any comparison against NaN as false (and != as true). The expr lowering now masks float comparisons to the IEEE answer (& ~is_nan for < > <= >= =, | is_nan for <> !=), gated by conservative float-operand inference (via a free schema contextvar) so int/string/bool comparisons are untouched and is_nan() never hits a non-float expr. Input NaN is already nan_to_null'd by pl.from_pandas, so this only affects in-query float math.

Numeric-vs-string: comparing a number to a string (n.val > 'a', 0.0/0.0 > 'a') crashed with ComputeError: cannot compare string with numeric type. Detect the mismatch in both the expression path (lower_expr) and the folded filter-predicate path (filter_by_dict_polars) and raise honest NotImplementedError, not a crash.

Surfaced by the TCK run (expr-comparison2-5-*, the 4-scenario NaN cluster). Regression-tested in test_engine_polars_cypher_conformance.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…aphic wrong-answer) Comparing cypher temporal values (time({...}) > time({...}), date < date) gave a WRONG answer under engine='polars': the cypher->gfql lowering renders the constructors to ISO strings ('10:00+01:00'), and the polars engine compared them LEXICOGRAPHICALLY — wrong across timezones/precision (pandas parses them temporally). The lowering now detects an ISO date/datetime/time string-literal operand in a comparison (specific regex; requires seconds-or-tz on bare times so ordinary '10:00' strings don't match) and raises honest NotImplementedError. Native temporal-typed comparison is the tracked proper fix. With this, the native polars engine has ZERO wrong-answers across the full Cypher TCK (3834 passed / 0 failed / 388 honest declines) — every scenario matches pandas or honestly declines. Surfaced by the TCK run (expr-temporal7). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
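A small standard-library example shows why string comparison of ISO times goes wrong across time zones. The two literals and the fixed date used to turn bare times into comparable instants are assumptions chosen for illustration.

```python
from datetime import datetime

left, right = "10:00+01:00", "09:30-02:00"

# Lexicographic: '1' sorts after '0', so the string comparison says left > right.
lexicographic = left > right


def as_instant(iso_time: str) -> datetime:
    # Attach an arbitrary fixed date so the offsets are applied when comparing.
    return datetime.fromisoformat("2000-01-01T" + iso_time)


# Temporal: 10:00+01:00 is 09:00 UTC, 09:30-02:00 is 11:30 UTC, so left < right.
temporal = as_instant(left) > as_instant(right)
```

The two answers disagree, which is the silent wrong-answer class the guard declines rather than risk.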

…oral guards

Multi-wave adversarial review of the session conformance fixes found 3 BLOCKERs (silent wrong-answers/panics, all NO-CHEATING violations) + IMPORTANTs:
- BLOCKER: NaN guard missed int/int->Float division and function results (abs/coalesce) -> polars NaN-as-largest leaked as wrong answers. Now drive the NaN + cross-type guards from the lowered expr's OUTPUT dtype (_expr_output_dtype, schema-only) instead of AST type inference, which robustly catches division/functions. Replaces the three _infer_is_* helpers (DRY).
- BLOCKER: list.contains was applied to ANY List column, so a user List property (n.tags = scalar) returned membership (wrong) vs pandas equality. Gated to the reserved labels column; other List columns decline honestly.
- BLOCKER: numeric-vs-string nested in AllOf (x>20 AND x<z) or Between bypassed the cross-type guard and PANICKED (uncatchable Rust). _is_cross_type_predicate now recurses AllOf/Between.
- IMPORTANT: Categorical/Enum columns now treated as string-like in both cross-type guards (categorical-vs-numeric was a raw ComputeError).
- IMPORTANT: all-null columns (typed String by from_pandas) crashed on arithmetic (n.val + 1); cross-type guard now covers arithmetic ops, not just comparison.
- IMPORTANT: ISO-temporal comparison guard narrowed to ORDERING of two temporal literals (was declining valid string-column-vs-date-literal compares; = and <> are lexicographically correct so not declined).
- Anchored the temporal-constructor scan regex (no false-positive on update(...)).
- Added the missing CHANGELOG entry (OPTIONAL MATCH null-fill decline) + streaming comment clarity.

Full TCK still 3834 passed / 0 wrong-answers / 387 honest declines. 457 polars tests pass. 6 new adversarial regression tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ype + DRY/docs

Multi-dimension adversarial review (correctness/robustness/quality/docs) of the native polars engine found three reachable pandas-oracle divergences, now fixed (NO-CHEATING: match pandas or decline honestly):
- BLOCKER: duplicate alias [n('a'), e(), n('a')] returned a malformed colliding-join schema (a/a_right) instead of raising; now raises GFQLValidationError E201 like pandas (node/edge aliases scoped separately, mirroring combine_steps).
- BLOCKER: integer-literal division 5/2 lowered to polars true division (2.5) but Cypher folds to int division (2), giving a silent wrong order when embedded non-monotonically (ORDER BY n.val % (10/4)); now declines (NIE). Column / int (Float on both) unaffected.
- IMPORTANT: internal start_nodes seed with a divergent id dtype (empty crossfilter -> float64 vs int64 node ids) crashed the combine join (SchemaError); now aligns the seed key (_align_seed_dtype), mirroring the hop + edge-endpoint alignment.

Quality/docs:
- Removed stale 'pandas bridge' docstrings/comments (row_pipeline, projection): the bridge was removed in the de-cheat commit; the code raises NIE.
- DRY: consolidated the cross-type/NaN dtype classifiers (numeric/int/float/stringlike), duplicated 4x, into engine_polars/dtypes.py (the guard contract).
- Aligned the lazy hop allowed_source guard textually with the eager hop (no-op: to_fixed_point is NIE'd upstream) to stop future eager/lazy drift.
- Removed two dead imports in chain.py (hop_polars, Engine).
- Documented a narrow filter_by_dict genuine-NaN residual (unreachable on the from_pandas ingestion path that nulls NaN).

+3 regression tests. 457 polars+chain tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
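The integer-division divergence is easy to reproduce with plain arithmetic. The sketch below sorts five sample values by `val % (10/4)` under both interpretations; the sample values are made up, and Python's `//` stands in for Cypher integer division, which is only a safe analogy for non-negative operands.

```python
vals = [1, 2, 3, 4, 5]

int_divisor = 10 // 4    # integer division, as Cypher folds two int literals: 2
true_divisor = 10 / 4    # true division, as a naive float lowering would do: 2.5

# Sort keys under integer division: 1, 0, 1, 0, 1 (stable sort keeps ties in order).
order_int = sorted(vals, key=lambda v: v % int_divisor)

# Sort keys under true division: 1.0, 2.0, 0.5, 1.5, 0.0.
order_true = sorted(vals, key=lambda v: v % true_divisor)
```

Here `order_int` is `[2, 4, 1, 3, 5]` while `order_true` is `[5, 3, 1, 4, 2]`: the values are the same set, but the row order silently differs, which is why the literal case declines instead of guessing.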

lmeyerov mentioned this pull request Jun 25, 2026

feat(gfql): native Polars cypher row pipeline (PR2, stacked on #1648) #1649

Merged

lmeyerov force-pushed the dev/gfql-polars-engine branch from 82fb11c to 561cf37 Compare June 26, 2026 22:11

lmeyerov changed the base branch from master to dev/gfql-opt-base June 26, 2026 22:12

This was referenced Jun 26, 2026

perf(gfql): general optimizations base (parse memoization + temporal dtype-gate) [PR0] #1652

Open

perf(gfql): dtype-gate temporal-text detection to avoid spurious stringification (#1650) #1651

Closed

lmeyerov force-pushed the dev/gfql-polars-engine branch from 561cf37 to 0f37fed Compare June 26, 2026 22:25

lmeyerov changed the title ~~feat(gfql): native vectorized Polars engine for hop/chain (PR1 of 3)~~ feat(gfql): native CPU Polars engine — traversals + cypher row pipeline Jun 26, 2026

This was referenced Jun 26, 2026

feat(gfql): Polars-GPU engine (engine='polars-gpu', cudf_polars) [PR3, stacked on #1648] #1654

Closed

feat(gfql): polars-gpu = GPU target of the lazy Polars engine (collect-once) [redo of #1654] #1655

Open

lmeyerov changed the title ~~feat(gfql): native CPU Polars engine — traversals + cypher row pipeline~~ feat(gfql): native lazy Polars engine — collect-once traversals + cypher row pipeline Jun 27, 2026

lmeyerov force-pushed the dev/gfql-polars-engine branch 3 times, most recently from bc68d6b to 35f65b5 Compare June 27, 2026 17:35

lmeyerov and others added 7 commits June 27, 2026 11:10

lmeyerov force-pushed the dev/gfql-polars-engine branch from 35f65b5 to e9f29bd Compare June 27, 2026 18:12

lmeyerov and others added 2 commits June 27, 2026 11:24

lmeyerov and others added 6 commits June 28, 2026 00:20

lmeyerov and others added 9 commits June 28, 2026 00:20

lmeyerov wants to merge 24 commits into dev/gfql-opt-base from dev/gfql-polars-engine
