fix: pair (schema, table) as composite in getTotalRowCount#126
Merged
Conversation
The previous query deduplicated schemas and tables into separate arrays and then paired them via unnest($1, $2). When one array was shorter (common case: one schema, many tables), Postgres padded the shorter array with NULL, dropping every table past the first from the JOIN. This silently returned the reltuples of a single table (often a small audit/config one), causing decideStatsStrategy to fall through to the 10k-row default for sources that should have used real source stats. Fix: match composite (nspname, relname) pairs from two position-aligned (non-deduplicated) arrays, so every requested table contributes to the sum. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Query Doctor Analysis
2 queries analyzed
2 pre-existing issues
SELECT "guests"."id", "guests"."session_id", "guests"."username", "guests"."avatar_path", "guests"."color", "guests"."side", "guests"."audio_recording_path", "guests"."audio_recording_public", "gue...
indexassets(event_id, inserted_at desc)
cost 31,003,449 → 1,493 (100% reduction)SELECT * FROM guest_ip_addresses WHERE ip_address = '127.0.0.1';
indexguest_ip_addresses(ip_address)
cost 154,402 → 8 (100% reduction)
Using assumed statistics (10000000 rows/table). For better results, sync production stats.
The previous multi-schema test paired one schema to one table, which the pre-fix dedup happened to get right (both arrays length 2 → aligned unnest). Replaces it with an uneven shape — two tables in public plus one in reporting — where the dedup shrinks the schema array below the table array, triggering the NULL-padding bug. Also drops the empty-list test, which only exercised the length==0 guard clause. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two more getTotalRowCount tests beyond the dedup-pairing bug:
1. Composite-pair match vs. cross-product: with public.{x,y} and
reporting.{x,y} all present and only (public.x, reporting.y)
requested, a cartesian implementation (n.nspname = ANY($1) AND
c.relname = ANY($2)) would wrongly count public.y and reporting.x.
Verified to catch this shape — the cartesian variant yields 130300
instead of the expected 300.
2. Duplicate input: passing the same (schema, table) pair twice must
not double-count. A JOIN-based fix against an unnest'd set would
multiply each catalog row by the number of matching input pairs.
Verified to catch this shape — JOIN yields 10000 instead of 5000.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reltuples after ANALYZE on uniformly-inserted rows is exact when the table is below the sampling threshold (~30K pages), which covers every test table here. Exact sums are self-documenting (they spell out the expected arithmetic) and catch additional regressions — e.g. a wrong COALESCE default, miscounted column, or reltuples-vs-relpages mixup — that the looser `>=` / `<` bounds silently tolerated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Xetera
approved these changes
Apr 21, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
getTotalRowCountdeduplicated schemas and tables into separate arrays and paired them positionally viaunnest($1, $2). For the common case of one schema with N tables, Postgres padded the shorter schema array with NULL, dropping every table past the first from the JOIN — soSUM(reltuples)came from a single table.decideStatsStrategythrough the 10k-row default instead ofdumpSourceStatsand silently degrading optimization for the whole source.(nspname, relname)pairs from position-aligned non-deduplicated arrays via= ANY(SELECT s, t FROM unnest(...)), so every requested table contributes exactly once.Test plan
Five testcontainer integration tests in
src/sync/pg-connector.test.ts, each guarding a specific incorrect implementation shape. Every test asserts the exact expected sum (toBe(...)) —reltuplesafterANALYZEon uniformly-inserted rows is exact at these sizes (all tables below the sampling threshold).single schema, multi-table — direct repro of the Slack bug. Verified to fail on pre-fix code with
expected 77 to be 300077.uneven table counts across schemas — 3 tables (1K/2K/4K) across 2 schemas, where the pre-fix dedup shrinks
schemaNamesbelowtableNamesand NULL-pads the tail. Verified to fail on pre-fix code withexpected 1000 to be 7000.request-set scoping — a 500K
unwantedtable in the same schema must not leak into the sum.composite pair vs. cross-product — guards against a cartesian re-implementation (
n.nspname = ANY($1) AND c.relname = ANY($2)). Withpublic.{x,y}andreporting.{x,y}all present but only(public.x, reporting.y)requested, a cartesian impl would wrongly countpublic.yandreporting.x. Verified: cartesian variant yields 130300 instead of 300.duplicate input — passing the same
(schema, table)pair twice must not double-count. Guards against aJOIN unnest(...)re-implementation that multiplies each catalog row by the number of matching input rows. Verified: JOIN variant yields 10000 instead of 5000.npx tsc --noEmitclean.All 7 tests pass in
src/sync/pg-connector.test.ts.Out of scope (separate finding)
PgIdentifier.toString()appliesquote_identsemantics, so a mixed-case table name like"Foo"produces the quoted form"Foo"with double quotes.pg_class.relnamestores the raw unquoted name (Foo), sogetTotalRowCountwill miss any mixed-case or reserved-word table. This is pre-existing behavior, not a regression from this fix — worth a separate ticket if it matters in practice.🤖 Generated with Claude Code