Skip to content

fix: pair (schema, table) as composite in getTotalRowCount#126

Merged
veksen merged 4 commits intomainfrom
veksen/fix-get-total-row-count
Apr 21, 2026
Merged

fix: pair (schema, table) as composite in getTotalRowCount#126
veksen merged 4 commits intomainfrom
veksen/fix-get-total-row-count

Conversation

@veksen
Copy link
Copy Markdown
Member

@veksen veksen commented Apr 21, 2026

Summary

  • getTotalRowCount deduplicated schemas and tables into separate arrays and paired them positionally via unnest($1, $2). For the common case of one schema with N tables, Postgres padded the shorter schema array with NULL, dropping every table past the first from the JOIN — so SUM(reltuples) came from a single table.
  • In production this caused a 300K-row table to report as 77 rows (the first table in the list was a small audit table), pushing decideStatsStrategy through the 10k-row default instead of dumpSourceStats and silently degrading optimization for the whole source.
  • Fix: match composite (nspname, relname) pairs from position-aligned non-deduplicated arrays via = ANY(SELECT s, t FROM unnest(...)), so every requested table contributes exactly once.

Test plan

Five testcontainer integration tests in src/sync/pg-connector.test.ts, each guarding a specific incorrect implementation shape. Every test asserts the exact expected sum (toBe(...)) — reltuples after ANALYZE on uniformly-inserted rows is exact at these sizes (all tables below the sampling threshold).

  • single schema, multi-table — direct repro of the Slack bug. Verified to fail on pre-fix code with expected 77 to be 300077.

  • uneven table counts across schemas — 3 tables (1K/2K/4K) across 2 schemas, where the pre-fix dedup shrinks schemaNames below tableNames and NULL-pads the tail. Verified to fail on pre-fix code with expected 1000 to be 7000.

  • request-set scoping — a 500K unwanted table in the same schema must not leak into the sum.

  • composite pair vs. cross-product — guards against a cartesian re-implementation (n.nspname = ANY($1) AND c.relname = ANY($2)). With public.{x,y} and reporting.{x,y} all present but only (public.x, reporting.y) requested, a cartesian impl would wrongly count public.y and reporting.x. Verified: cartesian variant yields 130300 instead of 300.

  • duplicate input — passing the same (schema, table) pair twice must not double-count. Guards against a JOIN unnest(...) re-implementation that multiplies each catalog row by the number of matching input rows. Verified: JOIN variant yields 10000 instead of 5000.

  • npx tsc --noEmit clean.

  • All 7 tests pass in src/sync/pg-connector.test.ts.

Out of scope (separate finding)

PgIdentifier.toString() applies quote_ident semantics, so a mixed-case table name like "Foo" produces the quoted form "Foo" with double quotes. pg_class.relname stores the raw unquoted name (Foo), so getTotalRowCount will miss any mixed-case or reserved-word table. This is pre-existing behavior, not a regression from this fix — worth a separate ticket if it matters in practice.

🤖 Generated with Claude Code

The previous query deduplicated schemas and tables into separate arrays
and then paired them via unnest($1, $2). When one array was shorter
(common case: one schema, many tables), Postgres padded the shorter
array with NULL, dropping every table past the first from the JOIN.
This silently returned the reltuples of a single table (often a small
audit/config one), causing decideStatsStrategy to fall through to the
10k-row default for sources that should have used real source stats.

Fix: match composite (nspname, relname) pairs from two position-aligned
(non-deduplicated) arrays, so every requested table contributes to the
sum.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query Doctor Analysis

View full run details

2 queries analyzed

2 pre-existing issues

Using assumed statistics (10000000 rows/table). For better results, sync production stats.

veksen and others added 3 commits April 21, 2026 20:38
The previous multi-schema test paired one schema to one table, which
the pre-fix dedup happened to get right (both arrays length 2 → aligned
unnest). Replaces it with an uneven shape — two tables in public plus
one in reporting — where the dedup shrinks the schema array below the
table array, triggering the NULL-padding bug. Also drops the empty-list
test, which only exercised the length==0 guard clause.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two more getTotalRowCount tests beyond the dedup-pairing bug:

1. Composite-pair match vs. cross-product: with public.{x,y} and
   reporting.{x,y} all present and only (public.x, reporting.y)
   requested, a cartesian implementation (n.nspname = ANY($1) AND
   c.relname = ANY($2)) would wrongly count public.y and reporting.x.
   Verified to catch this shape — the cartesian variant yields 130300
   instead of the expected 300.

2. Duplicate input: passing the same (schema, table) pair twice must
   not double-count. A JOIN-based fix against an unnest'd set would
   multiply each catalog row by the number of matching input pairs.
   Verified to catch this shape — JOIN yields 10000 instead of 5000.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reltuples after ANALYZE on uniformly-inserted rows is exact when the
table is below the sampling threshold (~30K pages), which covers every
test table here. Exact sums are self-documenting (they spell out the
expected arithmetic) and catch additional regressions — e.g. a wrong
COALESCE default, miscounted column, or reltuples-vs-relpages mixup —
that the looser `>=` / `<` bounds silently tolerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@veksen veksen requested a review from Xetera April 21, 2026 17:06
@veksen veksen merged commit 686a6b5 into main Apr 21, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants