feat(gfql #1650): structured flattened whole-entity returns by lmeyerov · Pull Request #1656 · graphistry/pygraphistry

lmeyerov · 2026-06-27T04:01:14Z

Summary

Closes the core of #1650. Terminal Cypher RETURN a (whole node/edge) previously emitted one column of Cypher display strings built row-wise (({id: 51, val: 51, kind: 'a'})). That string is a presentation format (it matches the cypher-shell / TCK oracle), not data — callers had to re-parse it to use it, and constructing it is O(rows).

This flattens whole-entity returns into structured {alias}.{field} columns (a.id, a.val, a.kind, ...) by default. The per-field columns already exist on the working frame before projection, so this is "stop collapsing", not "rebuild": near-free, lossless, directly usable, and it survives JSON / CSV / Parquet / Arrow serialization and plot().

Performance (dgx-spark, median-of-7, `RETURN a` vs old text form)

	pandas @10k	pandas @100k	cuDF @10k	cuDF @100k
flat (#1650)	20.5 ms	32.1 ms	19.8 ms	26.6 ms
old text form	41.9 ms	204.2 ms	53.3 ms	113.5 ms
speedup	2.0×	6.4×	2.7×	4.3×

Win grows with row count (text render is O(rows); flat is ~free).

Design

apply_result_projection(..., structured=True) emits flat columns for whole-entity returns; structured=False keeps the legacy single Cypher-display-string column. The OPTIONAL-MATCH null-fill / projection row-guard paths (which still consume a single-column entity value for row alignment) opt out via this flag and are unchanged.
A synthesized null/absent-entity row (top-level OPTIONAL-MATCH miss or OPTIONAL WITH-reentry no-match, built by _apply_empty_result_row as a single {alias: None} column) has no field columns to flatten, so it falls back to the single-column text form — rendering to None and preserving the shape the OPTIONAL / reentry machinery consumes for identity recovery and no-match detection. Real rows always carry flat fields and flatten.
Text is now presentation-only: render_entity_text(result, alias) reconstructs the Cypher display string on demand (used by the conformance/TCK driver and any caller wanting the human-readable form). The structured data path never pays the render cost.
The entity-projection meta ids snapshot (.copy()) is retained — bounded reentry recovers carried node identities from it and must not alias the live frame ([BUG] Cypher reentry path mis-handles OPTIONAL prefix MATCH on no-match fixtures #1356).

Behavior change

Callers that previously read the rendered Cypher display string from a terminal RETURN a column now receive flattened a.* columns. Documented under [Development] › Changed in CHANGELOG. No programmatic consumer of the display string was found (graphistry server / louie serialize to flat JSON/CSV/Parquet).

Tests

Whole-entity text assertions migrated to a entity_text_records shim that renders flat → text for comparison against the pre-GFQL: avoid spurious entity-text stringification of returned entities (return structured/Arrow frames) #1650 Cypher-text oracle.
Grouping / connected-optional / null_fill paths (still single-column text) keep direct text assertions.
Flat-shape + render-helper + projection-meta tests added.
Full graphistry/tests/compute/gfql/ on dgx-spark (cuDF 25.12 container): 2509 passed, 16 skipped, 15 xfailed. The only 2 failures are pre-existing container-environment artifacts unrelated to this change (image's stale baked setup.py; a cugraph test that asserts "without cugraph" in an image that has cugraph).
ruff-clean; mypy-clean for changed code.

Follow-ups (separate PRs, after this lands)

tck-gfql conformance adapter: reconstruct the Cypher display string from flattened a.* columns at the comparison hook (paired-contract).
pyg-bench probes: durable RETURN a structured-vs-text + where_rows micro-probes.
Optional polish: unify the intermediate grouping/reentry value-eval onto structured (flat everywhere). Not required for the perf win; tracked as deferred.

🤖 Generated with Claude Code

…eat queries) parse_cypher's lark parse+transform is the dominant per-call cost of small cypher queries — ~50% of a cypher call at 100k rows (PR7b layer profiling). It is a pure function of the query text returning an immutable frozen AST, so memoize it (LRU 512): repeated identical queries skip the parse, ~1.3–1.7x faster end-to-end at small/interactive sizes across pandas/polars/cudf. Safe: every cypher AST node is @DataClass(frozen=True) and compile_cypher_query does not mutate the parsed tree (verified); the non-str/empty validation guard stays outside the cache so error behavior is unchanged and errors are not cached. - split validating guard (parse_cypher) from cached body (_parse_cypher_cached) - full gfql suite green on dgx: 2875 passed, 0 failed; +2 cache regression tests NOTE: independent of the polars engine (base cypher code) — cherry-pickable to its own PR off master. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@10k

…ngification (#1650) `order_detect_temporal_mode` ran `astype(str)` + up to 6 regex `fullmatch` passes on BOTH operands of every comparison, including numeric/bool columns that can never hold temporal *text*. This fired ~160k spurious `re.fullmatch` on a 10k-row `where_rows(val > 50)` whose output is a plain table — pure waste. Gate: early-return None when `dtype.kind in {i,u,f,b,c}`. Object/string/datetime columns are unaffected, so temporal-text detection is preserved. Byte-identical output. Measured `where_rows` speedups: - pandas @10k: 48.9 -> 15.6 ms (3.1x) - cuDF @10k: 37.4 -> 8.5 ms (4.4x) - cuDF @100k: 191.0 -> 14.4 ms (13.3x, 1M edges) Tests: pandas 91 + cuDF 83 temporal/ordering/row suites pass; adds unit coverage for the gate (numeric/bool skip, object temporal still detected) and an end-to-end numeric `where_rows` filter assertion. Scope: this is the spurious-path half of #1650. Whole-entity `RETURN a` text rendering (structured/Arrow returns) is tracked separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…on scans Follows #1650/#1651 (which gated temporal-text detection). Profiling numeric RETURN/ORDER BY queries @100k showed ~40–48% of the call was STILL spurious `astype(str)` + regex on int/float/bool columns — not in temporal detection (already gated) but in the LIST-detection + projection scans: - order_detect_list_series / order_detect_stringified_list_series (ordering.py) - RowPipelineMixin._gfql_series_is_list_like (pipeline.py) - _project_property_column temporal scan (result_postprocess.py) All run `series.astype(str)` to test for list/temporal *text* on every column, including numeric/bool/complex columns that can never hold it. Gate them with a shared `_is_non_listable_dtype` (kind in i,u,f,b,c) → early return False/series. Byte-identical (the scans return False/None for these dtypes anyway); object/ string/datetime columns are untouched, preserving list/temporal detection. Result (where the projection/order path runs over numeric cols): ORDER BY and arithmetic-projection drop their ~50–100ms spurious string render to ~0. + tests: numeric/bool dtype pass-through; behavioral numeric ORDER BY (would fail if values were lexically stringified). Base cypher/row code -> PR0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@1m

…chain isin() is set-membership, so isin(s) == isin(s.unique()) — the explicit .unique()/dedup pass is redundant. Removed at _filter_edges_by_endpoint, the undirected combine masks, and the per-hop wavefront filter. Each removed .unique() is one fewer kernel launch on GPU (where launch latency, not compute, dominates small/mid traversals). Byte-identical: verified across 345 adversarial graph×query cases (dup/parallel edges, self-loops, isolated/dead-end nodes, cycles, undirected, multi-hop, fixed-point, min/max hops, names, filters, seeds) + the hop/chain test suites. cuDF 1-hop chain ~126→103 ms @1m (~18%). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@1m

A single MATCH (n) (no edge hop) — the dominant tabular/crossfilter shape (node filter, histogram, table search) — returns the filtered node table directly + empty edges, skipping the generic forward/backward/combine BFS. Helps pandas (default engine) + cuDF. Byte-identical (345-case adversarial golden + hop/chain suites, 240 tests). ~100x faster pandas node filter at 10M (204->2ms), cuDF stays on the resident frame (~0.8ms @1m). The 1-hop shape is NOT generic-fast-pathed: the full machinery's node merge upcasts int->float (pandas artifact), so a join-free generic 1-hop would change dtypes; the polars engine has its own dtype-consistent 1-hop fast path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Extend the generic node-only fast path to the simple 1-hop MATCH (a {f})-[e]->(b) (the basic graph query + "filter then expand" crossfilter): return the edges whose endpoints pass the node filters + those endpoint nodes, skipping the forward/backward/combine BFS. Gated to pandas/cuDF (dask/spark keep the full path; polars has its own). Same values + node/edge sets (345-case adversarial golden = dtype-only diffs, values identical; no test regressions vs pristine). Behavior change (intentional, more correct): the 1-hop now PRESERVES node-attr dtypes (int stays int) instead of the full machinery's spurious int->float merge upcast — making pandas/cuDF consistent with the polars engine (which already preserved int). cuDF basic graph query now ~2 semi-joins on the resident frame instead of the BFS + ~31 drop_duplicates, capturing the GPU semijoin win. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The generic _try_chain_fast_path short-circuited chain() before the policy hook dispatch, so a query that hits the fast path (node-only MATCH, single-hop) never fired prechain/postchain (and skipped per-op policy inspection) — observable behavior change vs the pre-fast-path engine. Gate the fast path on no-policy; policy-bearing queries take the full path. 64 policy tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tructured returns Squashed reconciliation of the native lazy Polars GFQL engine (was #1648's 28 commits; full history preserved at tag bak/1648) restacked onto the colleague's #1656 structured whole-entity returns + #1657 parse_expr memo. Engine: native polars hop/chain (semi/anti joins), native cypher row pipeline (select/where/order_by/group_by/unwind/projection), lazy single-hop collect-once with CPU/GPU execution targets (gfql/lazy/). NO pandas bridge — native or honest NotImplementedError (plan.md NO-CHEATING). Reconciliation with #1650 structured returns: apply_result_projection now threads `structured` to the polars path (apply_result_projection_polars). Whole-entity RETURN a flattens to {alias}.{field} columns natively (mirrors the pandas _flat_entity_field_names selection exactly), which — unlike the legacy entity-text expr — works for ANY dtype (float/temporal/nested just become columns), so polars structured == pandas structured across the board. structured=False still renders the native Cypher display string for int/string/bool single-entity nodes. _include_numeric_id_as_property is now polars-aware so id flattens identically. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per the #1656 author's handoff: the elif-structured single-column text fallback in _apply_result_projection_pandas looks redundant but fixes two regressions (top-level OPTIONAL-MATCH miss; OPTIONAL-WITH-reentry no-match). Mark DO NOT REMOVE so a later 'tidy' doesn't reintroduce them. Our polars structured-returns reconciliation touched this file; verified the fallback is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tructured returns Squashed reconciliation of the native lazy Polars GFQL engine (was #1648's 28 commits; full history preserved at tag bak/1648) restacked onto the colleague's #1656 structured whole-entity returns + #1657 parse_expr memo. Engine: native polars hop/chain (semi/anti joins), native cypher row pipeline (select/where/order_by/group_by/unwind/projection), lazy single-hop collect-once with CPU/GPU execution targets (gfql/lazy/). NO pandas bridge — native or honest NotImplementedError (plan.md NO-CHEATING). Reconciliation with #1650 structured returns: apply_result_projection now threads `structured` to the polars path (apply_result_projection_polars). Whole-entity RETURN a flattens to {alias}.{field} columns natively (mirrors the pandas _flat_entity_field_names selection exactly), which — unlike the legacy entity-text expr — works for ANY dtype (float/temporal/nested just become columns), so polars structured == pandas structured across the board. structured=False still renders the native Cypher display string for int/string/bool single-entity nodes. _include_numeric_id_as_property is now polars-aware so id flattens identically. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per the #1656 author's handoff: the elif-structured single-column text fallback in _apply_result_projection_pandas looks redundant but fixes two regressions (top-level OPTIONAL-MATCH miss; OPTIONAL-WITH-reentry no-match). Mark DO NOT REMOVE so a later 'tidy' doesn't reintroduce them. Our polars structured-returns reconciliation touched this file; verified the fallback is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tructured returns Squashed reconciliation of the native lazy Polars GFQL engine (was #1648's 28 commits; full history preserved at tag bak/1648) restacked onto the colleague's #1656 structured whole-entity returns + #1657 parse_expr memo. Engine: native polars hop/chain (semi/anti joins), native cypher row pipeline (select/where/order_by/group_by/unwind/projection), lazy single-hop collect-once with CPU/GPU execution targets (gfql/lazy/). NO pandas bridge — native or honest NotImplementedError (plan.md NO-CHEATING). Reconciliation with #1650 structured returns: apply_result_projection now threads `structured` to the polars path (apply_result_projection_polars). Whole-entity RETURN a flattens to {alias}.{field} columns natively (mirrors the pandas _flat_entity_field_names selection exactly), which — unlike the legacy entity-text expr — works for ANY dtype (float/temporal/nested just become columns), so polars structured == pandas structured across the board. structured=False still renders the native Cypher display string for int/string/bool single-entity nodes. _include_numeric_id_as_property is now polars-aware so id flattens identically. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per the #1656 author's handoff: the elif-structured single-column text fallback in _apply_result_projection_pandas looks redundant but fixes two regressions (top-level OPTIONAL-MATCH miss; OPTIONAL-WITH-reentry no-match). Mark DO NOT REMOVE so a later 'tidy' doesn't reintroduce them. Our polars structured-returns reconciliation touched this file; verified the fallback is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tructured returns Squashed reconciliation of the native lazy Polars GFQL engine (was #1648's 28 commits; full history preserved at tag bak/1648) restacked onto the colleague's #1656 structured whole-entity returns + #1657 parse_expr memo. Engine: native polars hop/chain (semi/anti joins), native cypher row pipeline (select/where/order_by/group_by/unwind/projection), lazy single-hop collect-once with CPU/GPU execution targets (gfql/lazy/). NO pandas bridge — native or honest NotImplementedError (plan.md NO-CHEATING). Reconciliation with #1650 structured returns: apply_result_projection now threads `structured` to the polars path (apply_result_projection_polars). Whole-entity RETURN a flattens to {alias}.{field} columns natively (mirrors the pandas _flat_entity_field_names selection exactly), which — unlike the legacy entity-text expr — works for ANY dtype (float/temporal/nested just become columns), so polars structured == pandas structured across the board. structured=False still renders the native Cypher display string for int/string/bool single-entity nodes. _include_numeric_id_as_property is now polars-aware so id flattens identically. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per the #1656 author's handoff: the elif-structured single-column text fallback in _apply_result_projection_pandas looks redundant but fixes two regressions (top-level OPTIONAL-MATCH miss; OPTIONAL-WITH-reentry no-match). Mark DO NOT REMOVE so a later 'tidy' doesn't reintroduce them. Our polars structured-returns reconciliation touched this file; verified the fallback is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The generic single-hop fast path accepted edges with `prune_to_endpoints=True` (a public e()/e_forward()/e_reverse() kwarg) but returned BOTH endpoints, while the flag keeps only the arrival side (dest for forward, src for reverse). So `[n(), e_forward(prune_to_endpoints=True), n()]` silently returned the wrong node/edge set (verified: path 0→1→2→3 gave [0,1,2,3] vs correct [1,2,3]). Add the flag to the fast-path edge gate so it declines and falls through to the full path, which honors it. `is_simple_single_hop()` is intentionally left unchanged (its other callers depend on the current semantics) — the gate is the correct, surgical layer. Also document the deep-immutability invariant on the Cypher AST nodes: parse_cypher results are shared by reference via lru_cache, so a future mutable field would poison cache hits. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Code-quality pass on the opt-base fast path: - Type the contract boundaries: _filter_edges_by_endpoint(edges_df: DataFrameT, nodes_df: Optional[DataFrameT], ...) -> DataFrameT; _try_chain_fast_path(g_in: Plottable, ops: List[ASTObject], engine_concrete: Engine, start_nodes: Optional[DataFrameT]). - Replace getattr(n0, "query", None) duck-typing with n0.query (ASTNode declares it; the isinstance(n0, ASTNode) guard narrows the type). - Convert Engine -> EngineAbstract once (engine_abs) for the filter_by_dict calls (its param is EngineAbstract|str), matching the existing materialize conversion. - Trim verbose comments to terse one/two-liners (fast-path docstring, .unique removal, prune gate, policy gate, temporal dtype gate, AST immutability invariant). No behavior change; mypy + ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@100

Complements the Cypher whole-query parse memo in PR #1652. Once parse_cypher is cached, profiling the residual fixed per-call compile cost showed it is dominated by parse_expr() — the GFQL row-expression parser, invoked ~4x per query compile (each RETURN/WHERE/WITH expression) and, per call, rebuilding a Lark transformer that defines a frozen dataclass via runtime exec (dataclasses._process_class / _create_fn / exec churn). parse_expr() is a pure function of the expression string (no params/schema) and returns a tree of frozen dataclasses (17 frozen node types, tuple-valued fields, immutable). So identical expressions — re-parsed on every compile, and recurring across queries (e.g. `a.val > 50`) — are memoized via lru_cache(maxsize=1024). Non-str/empty guard stays outside the cache; only successful parses are cached. No source consumer outside graphistry/compute/gfql calls parse_expr, and nothing bypasses frozen-ness (no object.__setattr__). Measured (dgx-spark, median-of-9): stacked on the query-parse memo, the fixed per-call cost of a repeated string Cypher query drops a further ~4.5 ms -> ~1.8 ms (RETURN a @100 rows: ~6.3 -> 3.6 ms) — near parity with the equivalent native chain. Touches only expr_parser.py (+ its test) — disjoint from PR #1652's parser.py, so the two apply independently in either order. Tests: 3 focused expr-cache tests (memoization identity, distinct + hit registration, invalid non-caching). Full graphistry/tests/compute/gfql/: 2512 passed, 16 skipped, 15 xfailed; only 2 unrelated in-container artifacts fail (networkx setup.py packaging; a cugraph "without cugraph" shortest-path test in an image that has cugraph). ruff + mypy clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds a regression test for the fix in 2755744 (fast path must not skip policy hooks): asserts that for the exact fast-path-eligible shapes (`[n()]` and `[n(),e(),n()]`) the prechain/postchain/postload hooks all fire when a policy is installed, and that the fast path is still taken when no policy is present. Without a guard this regression was silent (the shapes that hit the fast path are precisely the ones the policy suite exercises). Also documents the policy-gate in the fast-path CHANGELOG entry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The chain fast path (_try_chain_fast_path) shipped with only incidental coverage (existing hop/chain/golden suites); its core behaviors were never pinned directly. Adds focused, engine-parametrized locks: - test_fast_path_differential_parity_vs_full_path: fast path output == full (policy-forced BFS) path output by node/edge SET, across every accelerated shape (node-only, node-filter, predicate, fwd/rev/undirected 1-hop, src/dst/both-end filters) AND every bypass shape (hops=2, filtered-undirected, edge_match, named node). Uses an installed no-op policy as the equivalence oracle (any policy forces the full path). - test_fast_path_preserves_int_node_dtypes: hard-locks the feature promise (1-hop fast path keeps int node attrs as int); tolerant on the full path. - test_fast_path_gating_returns_none_for_ineligible: unit-level accept/decline contract for the gate (seeds, non-eager engines, edge_match, named/queried nodes, hops>1, 2-op chains all decline). - _gfql_series_is_list_like dtype-gate: direct unit coverage (numeric/bool short-circuit; real list/tuple objects still detected; stringified lists intentionally deferred to order_detect_stringified_list_series). pandas lanes run everywhere; cuDF lanes gated on TEST_CUDF=1 (dgx-spark). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The new _try_chain_fast_path helper was inserted between the @otel_traced("gfql.chain") decorator and def chain(), so the decorator silently migrated onto the fast-path helper: chain() stopped emitting its span entirely, and _chain_otel_attrs (which expects chain's signature) was invoked with the fast path's positional args — binding `gfql.validate_schema` to the start_nodes DataFrame. Only surfaces with OTEL enabled (attrs are computed lazily), so CI stayed green. Move the decorator back onto chain(). Adds test_chain_otel_span_attrs_mapped_correctly: enables otel via monkeypatch, captures the span, and asserts chain_len + that validate_schema is a bool (fails under the misplaced decorator, passes with it on chain()). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The #1650/#1651 dtype gate (numeric/bool/complex columns skip the spurious astype(str)+regex list/temporal scan) was duplicated across 4 sites with 3 different sentinel conventions ("O", None, factored). Consolidate the kind set into is_non_textual_scalar_dtype() in series_str_compat.py (the engine-poly Series-string layer these gates short-circuit) and call it from ordering.py (x2), pipeline.py, and cypher/result_postprocess.py. Byte-identical; adds a direct parametrized unit test for the helper across dtype kinds + None. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The 1-hop fast path took `edges = g._edges` wholesale and only intersected filtered endpoints with the node table for the NODE output — it never restricted the EDGES to those whose endpoints exist in the node table. The full BFS path enforces that implicitly via its edge<->node joins, so the two diverged whenever a supplied node table omits an edge endpoint (filtered/subset node frames, or nodes+edges loaded separately): nodes v=[0,1], edges (s,d)=[(0,1),(1,99)] # 99 absent [n(), e_forward(hops=1), n()] fast kept (1,99); full dropped it [n({'attr':2}), e_forward, n()] fast -> nodes[1] edges[(1,99)]; full -> EMPTY Restrict edges to both-endpoints-present before the endpoint filters, and dedup node ids on the result (the full path collapses dup rows via its merge). Now fast == full across dangling-endpoint and duplicate-id graphs; no change on well-formed graphs. Found by wave-2 differential review; tests added for both. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The wave-2 both-endpoints-present restriction used `.isin(node_ids)`, but pandas/cuDF `.isin` treat NaN as a matchable value (`NaN.isin([NaN])` is True). So a NaN node id "validated" a NaN edge endpoint and the fast path kept a dangling NaN edge that the full BFS path's joins (which never match NaN<->NaN) correctly drop. dropna() on node_ids before the isin restores convergence for forward/reverse on NaN-id graphs; well-formed and non-null-dangling graphs are unchanged. (NaN ids are malformed input; the full path's undirected branch is itself inconsistent there, so we match its directed/consistent behavior.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s, dup ids, NaN) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…d; expr AST immutability note Hardening tests for the fast-path + dtype-gate optimizations: - prune_to_endpoints: add to the gating-decline list and as bypass shapes (fwd+rev) in the fast-vs-full differential — regression guard for the fast-path prune gate. - NaN endpoints: new test_fast_path_drops_nan_endpoint_edges pins that a NaN node id does not validate a NaN edge endpoint (the .dropna() guard), matching the full BFS path. - Tighten the differential test: bypass shapes now also assert they DECLINE the fast path (previously full-vs-full, vacuous for bypass cases). - Drift guard: assert filter_by_dict._is_numeric_dtype_safe agrees with the series_str_compat._NON_TEXTUAL_SCALAR_KINDS SSOT across all numpy dtype kinds (the two keep separate copies to avoid an import cycle; they must not desync). - cuDF lane for the numeric where_rows dtype-gate path (paired-coverage convention). Also document the deep-immutability invariant on ExprNode (parse_expr lru_cache shares results by reference). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…te (#1657) Code-quality pass on the parse-cache layer: - Condense the verbose dropna/dangling-edge and dup-node fast-path comments to terse one/two-liners; same for the SSOT dtype-kind note and the ExprNode immutability invariant. - Note in the SSOT comment that filter_by_dict's mirror is kept in sync by a test (not imported) to avoid an import cycle. No behavior change; mypy + ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Terminal Cypher `RETURN a` (whole node/edge) previously emitted one column of Cypher display strings (`({id: 51, val: 51, kind: 'a'})`) built row-wise. The string is a *presentation* format (it matches the cypher-shell / TCK oracle), not data — callers had to re-parse it to use it, and constructing it is O(rows). This flattens whole-entity returns into structured `{alias}.{field}` columns (`a.id, a.val, a.kind`, ...) by default. The per-field columns already exist on the working frame before projection, so this is "stop collapsing", not "rebuild": near-free, lossless, directly usable, and it survives JSON / CSV / Parquet / Arrow serialization and `plot()`. Measured (dgx-spark, median-of-7, RETURN a vs old text form): pandas @100k 32 vs 204 ms (6.4x); cuDF @100k 27 vs 114 ms (4.3x). Win grows with row count (text render is O(rows); flat is ~free). Design: - `apply_result_projection(..., structured=True)` emits flat columns for whole-entity returns; `structured=False` keeps the legacy single Cypher-display-string column. The OPTIONAL-MATCH null-fill / projection row-guard paths (which still consume a single-column entity value for row alignment) opt out via this flag and are unchanged. - A synthesized null/absent-entity row (top-level OPTIONAL-MATCH miss or OPTIONAL WITH-reentry no-match, built by `_apply_empty_result_row` as a single `{alias: None}` column) has no field columns to flatten, so it falls back to the single-column text form — rendering to None and preserving the shape the OPTIONAL / reentry machinery consumes for identity recovery and no-match detection. Real rows always carry flat fields and flatten. - Text is now presentation-only: `render_entity_text(result, alias)` reconstructs the Cypher display string on demand (used by the conformance / TCK driver and any caller wanting the human-readable form). The structured data path never pays the render cost. - The entity-projection meta `ids` snapshot (`.copy()`) is retained — bounded reentry recovers carried node identities from it and must not alias the live frame (#1356). Tests: whole-entity text assertions migrated to a `entity_text_records` shim that renders flat -> text for comparison against the pre-#1650 Cypher-text oracle; grouping / connected-optional / null_fill paths (still single-column text) keep direct text assertions; flat-shape + render-helper + meta tests added. gfql/cypher + row suites: 1646 passed, 15 xfailed (only the unrelated in-container networkx setup.py packaging artifact fails). Cross-repo follow-ups (separate, after this lands): tck-gfql conformance adapter (structured -> text at the comparison hook) and pyg-bench probes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rm (#1650) Structured whole-entity returns (#1650) changed terminal RETURN a from the display string '(:person)' to flattened a.* columns. Two test_gfql.py tests still asserted the old display-string form (missed when the behavior landed); update them to the flattened columns. The behavior itself is correct + documented in CHANGELOG. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…policy-hook guard test) Root-cause fix for a real bug surfaced while hardening the fast-path/policy work: A chain/Cypher query over an edges-only graph (no node-id binding, `g._node is None`) takes the full traversal path — the node-only/single-hop fast path is skipped whenever an edge-id index must be synthesized (`added_edge_index`), or when a policy is attached. That path rebuilt `g_out` from `self` (the *unbound* input) to restore the original edge binding, but `self._node is None`, so the materialized node-id binding was dropped. The endpoint-reconciliation concat then synthesized a spurious `None`-named node column — a corrupt result on older pandas, and a hard `NotImplementedError` (void-block NA fill) on newer pandas (Python 3.14). Fix: carry the materialized `g._node` binding explicitly (`self.nodes(final_nodes_df, g._node)`) while still restoring the original edge binding. Full-path output now matches the fast path: correct single node column, valid `_node`, edge binding preserved. Pre-existing on master (not introduced by this PR's fast paths). Also adds a parametrized regression test pinning that BOTH chain fast-path shapes (node-only `[n()]` and single-hop `[n(),e(),n()]`) still fire `postload` under a policy AND return a valid, non-corrupt result on an edges-only graph — guarding both the policy-hook gate (2755744) and this node-binding fix. CHANGELOG: documents both. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…on; doc + tests + cuDF lane - I1 (bug): `RETURN a, a.val` emitted a duplicate `a.val` column (the whole-entity flatten shares the `{alias}.{field}` namespace with the explicit property projection). Duplicate column names break selection and silently drop data on `to_dict`/serialization. De-dup the output columns (identical data — dotted aliases are rejected), keeping first occurrence. - I2 (boundary): document that a whole entity with no flattenable field (no id binding, no props, no type — in practice only an edge with no edge-id binding) falls back to the single Cypher-display-text column (value correct, e.g. `[]`); nodes always carry an id and flatten. Pinned by a test; nodes-immune noted. - Docs: new "Whole-Entity RETURN Output Shape" section in cypher.rst (flat columns, render_entity_text helper, dedup + no-field boundary). - Tests: dup-column + no-field regression tests; cuDF lane for the edges-only/policy fast-path shapes test (chain.py is on the cuDF-pairing list). - CHANGELOG: I1 Fixed entry; structured-returns entry notes the no-field boundary; edges-only Fixed wording corrected (the fast path isn't gated on edge-id synthesis). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Code-quality pass on the structured-returns layer: - render_entity_text reads result._nodes directly (typed Plottable attr) instead of getattr duck-typing. - Condense verbose comments to terse one/two-liners: structured/absent emission branch, the dedup rationale, the temporal dtype gate, the OPTIONAL opt-out note, and the edges-only node-binding rebuild. No behavior change; mypy + ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lmeyerov and others added 3 commits June 26, 2026 14:48

lmeyerov mentioned this pull request Jun 27, 2026

perf(gfql): memoize row-expression parser (parse_expr) #1657

Open

lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from a27ddf4 to f5f325a Compare June 27, 2026 15:11

lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from f5f325a to d6903ae Compare June 27, 2026 16:35

lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from d6903ae to e946275 Compare June 27, 2026 17:35

lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from e946275 to 12db9a3 Compare June 27, 2026 18:12

lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from 12db9a3 to eca4217 Compare June 28, 2026 20:00

fix(gfql): preserve policy hooks in chain fast path

c0a61e8

lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from c40b69c to 7d77ce7 Compare June 28, 2026 20:43

lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from 7d77ce7 to bd75bf9 Compare June 28, 2026 20:52

lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from bd75bf9 to a83a59b Compare June 28, 2026 21:59

lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from a83a59b to 231526a Compare June 28, 2026 23:19

lmeyerov and others added 3 commits June 28, 2026 17:08

lmeyerov and others added 13 commits June 28, 2026 17:08

docs(changelog): note fast-path correctness guarantees (dangling edge…

481e3ab

…s, dup ids, NaN) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from 231526a to b8c169e Compare June 29, 2026 00:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(gfql #1650): structured flattened whole-entity returns#1656

feat(gfql #1650): structured flattened whole-entity returns#1656
lmeyerov wants to merge 25 commits into
masterfrom
dev/gfql-1650-structured-returns

lmeyerov commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

lmeyerov commented Jun 27, 2026

Summary

Performance (dgx-spark, median-of-7, RETURN a vs old text form)

Design

Behavior change

Tests

Follow-ups (separate PRs, after this lands)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Performance (dgx-spark, median-of-7, `RETURN a` vs old text form)