fix: pair (schema, table) as composite in getTotalRowCount by veksen · Pull Request #126 · Query-Doctor/analyzer

veksen · 2026-04-21T16:25:27Z

Summary

getTotalRowCount deduplicated schemas and tables into separate arrays and paired them positionally via unnest($1, $2). For the common case of one schema with N tables, Postgres padded the shorter schema array with NULL, dropping every table past the first from the JOIN — so SUM(reltuples) came from a single table.
In production this caused a 300K-row table to report as 77 rows (the first table in the list was a small audit table), pushing decideStatsStrategy through the 10k-row default instead of dumpSourceStats and silently degrading optimization for the whole source.
Fix: match composite (nspname, relname) pairs from position-aligned non-deduplicated arrays via = ANY(SELECT s, t FROM unnest(...)), so every requested table contributes exactly once.

Test plan

Five testcontainer integration tests in src/sync/pg-connector.test.ts, each guarding a specific incorrect implementation shape. Every test asserts the exact expected sum (toBe(...)) — reltuples after ANALYZE on uniformly-inserted rows is exact at these sizes (all tables below the sampling threshold).

single schema, multi-table — direct repro of the Slack bug. Verified to fail on pre-fix code with expected 77 to be 300077.
uneven table counts across schemas — 3 tables (1K/2K/4K) across 2 schemas, where the pre-fix dedup shrinks schemaNames below tableNames and NULL-pads the tail. Verified to fail on pre-fix code with expected 1000 to be 7000.
request-set scoping — a 500K unwanted table in the same schema must not leak into the sum.
composite pair vs. cross-product — guards against a cartesian re-implementation (n.nspname = ANY($1) AND c.relname = ANY($2)). With public.{x,y} and reporting.{x,y} all present but only (public.x, reporting.y) requested, a cartesian impl would wrongly count public.y and reporting.x. Verified: cartesian variant yields 130300 instead of 300.
duplicate input — passing the same (schema, table) pair twice must not double-count. Guards against a JOIN unnest(...) re-implementation that multiplies each catalog row by the number of matching input rows. Verified: JOIN variant yields 10000 instead of 5000.
npx tsc --noEmit clean.
All 7 tests pass in src/sync/pg-connector.test.ts.

Out of scope (separate finding)

PgIdentifier.toString() applies quote_ident semantics, so a mixed-case table name like "Foo" produces the quoted form "Foo" with double quotes. pg_class.relname stores the raw unquoted name (Foo), so getTotalRowCount will miss any mixed-case or reserved-word table. This is pre-existing behavior, not a regression from this fix — worth a separate ticket if it matters in practice.

🤖 Generated with Claude Code

The previous query deduplicated schemas and tables into separate arrays and then paired them via unnest($1, $2). When one array was shorter (common case: one schema, many tables), Postgres padded the shorter array with NULL, dropping every table past the first from the JOIN. This silently returned the reltuples of a single table (often a small audit/config one), causing decideStatsStrategy to fall through to the 10k-row default for sources that should have used real source stats. Fix: match composite (nspname, relname) pairs from two position-aligned (non-deduplicated) arrays, so every requested table contributes to the sum. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions

Query Doctor Analysis

View full run details

2 queries analyzed

2 pre-existing issues

SELECT "guests"."id", "guests"."session_id", "guests"."username", "guests"."avatar_path", "guests"."color", "guests"."side", "guests"."audio_recording_path", "guests"."audio_recording_public", "gue...
index assets(event_id, inserted_at desc)
cost 31,003,449 → 1,493 (100% reduction)
SELECT * FROM guest_ip_addresses WHERE ip_address = '127.0.0.1';
index guest_ip_addresses(ip_address)
cost 154,402 → 8 (100% reduction)

_{Using assumed statistics (10000000 rows/table). For better results, sync production stats.}

The previous multi-schema test paired one schema to one table, which the pre-fix dedup happened to get right (both arrays length 2 → aligned unnest). Replaces it with an uneven shape — two tables in public plus one in reporting — where the dedup shrinks the schema array below the table array, triggering the NULL-padding bug. Also drops the empty-list test, which only exercised the length==0 guard clause. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds two more getTotalRowCount tests beyond the dedup-pairing bug: 1. Composite-pair match vs. cross-product: with public.{x,y} and reporting.{x,y} all present and only (public.x, reporting.y) requested, a cartesian implementation (n.nspname = ANY($1) AND c.relname = ANY($2)) would wrongly count public.y and reporting.x. Verified to catch this shape — the cartesian variant yields 130300 instead of the expected 300. 2. Duplicate input: passing the same (schema, table) pair twice must not double-count. A JOIN-based fix against an unnest'd set would multiply each catalog row by the number of matching input pairs. Verified to catch this shape — JOIN yields 10000 instead of 5000. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

reltuples after ANALYZE on uniformly-inserted rows is exact when the table is below the sampling threshold (~30K pages), which covers every test table here. Exact sums are self-documenting (they spell out the expected arithmetic) and catch additional regressions — e.g. a wrong COALESCE default, miscounted column, or reltuples-vs-relpages mixup — that the looser `>=` / `<` bounds silently tolerated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot reviewed Apr 21, 2026

View reviewed changes

veksen and others added 3 commits April 21, 2026 20:38

veksen requested a review from Xetera April 21, 2026 17:06

Xetera approved these changes Apr 21, 2026

View reviewed changes

veksen merged commit 686a6b5 into main Apr 21, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: pair (schema, table) as composite in getTotalRowCount#126

fix: pair (schema, table) as composite in getTotalRowCount#126
veksen merged 4 commits intomainfrom
veksen/fix-get-total-row-count

veksen commented Apr 21, 2026 •

edited

Loading

Uh oh!

github-actions Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

veksen commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Out of scope (separate finding)

Uh oh!

github-actions Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Query Doctor Analysis

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

veksen commented Apr 21, 2026 •

edited

Loading

github-actions Bot left a comment •

edited

Loading