
Add ignore param to createDocuments for silent duplicate handling #852

Open
premtsd-code wants to merge 4 commits into main from csv-import-upsert-v2

Conversation

@premtsd-code
Contributor

@premtsd-code premtsd-code commented Apr 8, 2026

Summary

  • Adds bool $ignore = false parameter to createDocuments across all adapters for silent duplicate handling
  • Pre-filters duplicates inside the transaction/session in both SQL and MongoDB adapters to prevent race conditions, $modified overcount, and phantom onNext callbacks
  • Deduplicates intra-batch documents by ID (tenant-aware) to prevent ACL drift
  • Optimizes upsertDocuments: batch-fetch existing docs with find instead of per-row getDocument calls (~2x speedup)
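The tenant-aware intra-batch deduplication described above (first occurrence wins) can be sketched as follows. The PR itself is PHP; this is a minimal Python sketch of the same logic, and the function name, dict shape, and "tenant:id" key format are illustrative assumptions, not the actual implementation.

```python
def dedupe_batch(documents, tenant_per_document=False):
    """Keep only the first occurrence of each document ID in a batch.

    When tenants are tracked per document, the dedup key combines
    tenant and ID so that equal IDs under different tenants both survive.
    """
    seen = set()
    result = []
    for doc in documents:
        # Composite "tenant:id" key when tenant-per-document mode is on.
        key = f"{doc.get('tenant')}:{doc['id']}" if tenant_per_document else doc['id']
        if key in seen:
            continue  # later duplicates are dropped, preventing ACL drift
        seen.add(key)
        result.append(doc)
    return result
```

Dropping later occurrences (rather than letting them overwrite the first) is what prevents a second copy of a document from silently replacing the permissions of the first.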

Adapter Implementation

• MariaDB/MySQL: INSERT IGNORE INTO + pre-filter existing UIDs in transaction
• PostgreSQL: ON CONFLICT DO NOTHING + pre-filter existing UIDs in transaction
• SQLite: INSERT OR IGNORE INTO + pre-filter existing UIDs in transaction
• MongoDB: pre-filter existing UIDs within session, then insertMany with session
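The per-adapter SQL shapes in the table above can be sketched as a small dispatch. The statement forms follow standard MySQL-family, PostgreSQL, and SQLite syntax, but the function below is illustrative only; it is not the PR's actual getInsertKeyword/getInsertSuffix hooks.

```python
def ignore_insert_sql(adapter: str, table: str, columns: list[str]) -> str:
    """Build a duplicate-tolerant INSERT statement for the given adapter."""
    cols = ", ".join(columns)
    params = ", ".join(["?"] * len(columns))
    if adapter in ("mariadb", "mysql"):
        # MySQL family: the IGNORE keyword downgrades duplicate-key errors to warnings
        return f"INSERT IGNORE INTO {table} ({cols}) VALUES ({params})"
    if adapter == "postgres":
        # PostgreSQL: conflict clause is appended after the VALUES list
        return f"INSERT INTO {table} ({cols}) VALUES ({params}) ON CONFLICT DO NOTHING"
    if adapter == "sqlite":
        # SQLite: OR IGNORE is a conflict-resolution clause on the INSERT itself
        return f"INSERT OR IGNORE INTO {table} ({cols}) VALUES ({params})"
    raise ValueError(f"unknown adapter: {adapter}")
```

Note the placement difference: MySQL and SQLite modify the INSERT keyword itself, while PostgreSQL appends a suffix, which is why the PR needs separate keyword and suffix hooks.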

Test plan

  • Mixed batch (existing + new docs) with ignore: true — only new docs inserted
  • All-duplicate batch with ignore: true — no rows inserted, count = 0
  • Intra-batch duplicate IDs — first occurrence wins, no ACL drift
  • onNext callback assertions — fires only for actually inserted docs
  • All 39 tests pass across 13 adapter configurations
  • Benchmark: upsertDocuments ~2x faster with batch-fetch optimization

Summary by CodeRabbit

  • New Features

    • Optional "ignore duplicates" mode for document creation that silently skips existing entries and returns only newly inserted counts
    • Automatic intra-batch deduplication (first occurrence wins) to prevent duplicate writes and ACL drift
    • Batch-aware pre-checks to skip inserts when items already exist; mirror/storage layers honor ignore mode
    • Relationship creation deferred to avoid applying relations for skipped documents
  • Tests

    • Added end-to-end tests for mixed, intra-batch, and all-duplicate scenarios with ignore enabled

@coderabbitai
Contributor

coderabbitai bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds an optional bool $ignore = false across Database.createDocuments, adapters, and mirror to skip duplicate inserts: implements intra-batch deduplication, pre-fetch filtering of existing IDs, adapter-specific INSERT/IGNORE/ON CONFLICT hooks, deferred relationship writes for newly inserted docs, and e2e tests for ignore-mode behavior.

Changes

• Core adapter interface (src/Database/Adapter.php): abstract createDocuments signature extended with bool $ignore = false and docblock updated.
• Adapter delegation / pool (src/Database/Adapter/Pool.php): createDocuments(..., bool $ignore = false) signature added; delegation forwards the new arg via delegate(__FUNCTION__, \func_get_args()).
• Mongo adapter (src/Database/Adapter/Mongo.php): createDocuments(..., bool $ignore = false) now pre-queries existing _uid (tenant-aware) and filters out existing docs before insertMany when $ignore is true; preserves transaction context.
• SQL adapters & helpers (src/Database/Adapter/SQL.php, src/Database/Adapter/SQLite.php, src/Database/Adapter/Postgres.php): SQL insert flow updated to accept $ignore; new protected hooks (getInsertKeyword, getInsertSuffix, getInsertPermissionsSuffix) provide adapter-specific duplicate-handling SQL (e.g., INSERT OR IGNORE, ON CONFLICT DO NOTHING); post-insert reconciliation added when needed.
• Database orchestration (src/Database/Database.php): createDocuments(..., bool $ignore = false) implements intra-batch dedup (id or tenant:id), chunked pre-fetch of existing IDs, skips relationship creation for pre-existing keys, defers relationship writes for actually-inserted docs, filters chunks before adapter calls, and passes $ignore to adapters. upsertDocumentsWithIncrease() refactored to batch-find existing docs.
• Mirroring (src/Database/Mirror.php): createDocuments(..., bool $ignore = false) added and forwarded to source/destination (including the preserve-dates path).
• Tests (tests/e2e/Adapter/Scopes/DocumentTests.php): added 3 e2e tests verifying createDocuments(..., ignore: true) for mixed new/duplicate, intra-batch duplicates (first occurrence wins), and all-duplicate batches (zero inserts, no emissions).

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Database
    participant Adapter
    participant Storage as DB

    rect rgba(200,100,150,0.5)
    Note over Client,Storage: Existing flow (ignore: false)
    Client->>Database: createDocuments(docs, ignore: false)
    Database->>Database: prepare documents, no pre-filter
    Database->>Adapter: createDocuments(docs, ignore: false)
    Adapter->>DB: INSERT INTO ...
    DB-->>Adapter: ✗ duplicate error (may throw)
    Adapter-->>Database: throws DuplicateException
    Database-->>Client: exception propagated
    end
sequenceDiagram
    participant Client
    participant Database
    participant Adapter
    participant Storage as DB

    rect rgba(100,180,120,0.5)
    Note over Client,Storage: New flow (ignore: true)
    Client->>Database: createDocuments(docs, ignore: true)
    Database->>Database: intra-batch dedup by id / tenant:id
    Database->>Adapter: find(existing ids) [chunked]
    Adapter->>DB: SELECT _uid WHERE _uid IN (...)
    DB-->>Adapter: existing ids
    Adapter-->>Database: $preExistingIds
    Database->>Database: filter out pre-existing docs, strip relationships
    Database->>Adapter: createDocuments(filtered_docs, ignore: true)
    Adapter->>DB: INSERT IGNORE / ON CONFLICT DO NOTHING
    DB-->>Adapter: ✓ success (duplicates skipped)
    Adapter-->>Database: inserted rows info
    Database->>Database: create relationships for actually inserted docs
    Database-->>Client: count (new inserts only)
    end
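The pre-fetch step in the ignore-mode flow above (look up existing IDs in chunks, then drop matching documents before calling the adapter) can be sketched as below. The function name, dict shape, and chunk size are illustrative assumptions; fetch_existing_ids stands in for the adapter's SELECT _uid WHERE _uid IN (...) query.

```python
def filter_new_documents(documents, fetch_existing_ids, chunk_size=1000):
    """Drop documents whose IDs already exist, querying in chunks.

    fetch_existing_ids(ids) is expected to return the subset of the
    given ids that are already stored.
    """
    ids = [doc['id'] for doc in documents]
    existing = set()
    # Chunk the ID lookup so very large batches don't exceed query limits.
    for start in range(0, len(ids), chunk_size):
        existing.update(fetch_existing_ids(ids[start:start + chunk_size]))
    return [doc for doc in documents if doc['id'] not in existing]
```

Per the diagram, this snapshot is an optimization layered on top of the adapter's INSERT IGNORE / ON CONFLICT DO NOTHING clause, which remains the authoritative duplicate check inside the transaction.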

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • fogelito
  • abnegate
  • ArnabChatterjee20k

Poem

🐇 I hopped through rows and checked each id,
I nudged the duplicates gently to the side,
With ignore set true, only newcomers stay,
The first one keeps the carrot — the rest hop away.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

• Docstring Coverage: ⚠️ Warning. Docstring coverage is 69.23%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

• Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
• Title Check: ✅ Passed. The title accurately describes the main change: adding an ignore parameter to createDocuments for handling duplicate documents silently, which is the core objective of this pull request.

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR adds a bool $ignore = false parameter to createDocuments across all adapters, enabling silent duplicate handling via INSERT IGNORE / ON CONFLICT DO NOTHING (SQL) and a pre-filter find pass inside the session (MongoDB), plus intra-batch deduplication and deferred relationship creation in Database.php. The implementation is well-structured, and the new tests cover the core happy paths. However, prior review threads raised several open concerns: the MongoDB adapter still does not set ordered: false on insertMany, so race-condition slip-throughs can surface a BulkWriteException instead of being silently skipped, and the SQL rowCount reconciliation is vulnerable to timestamp collisions under concurrent writes. Those threads should be resolved before merge.

Confidence Score: 4/5

Safe to merge for most workloads; prior-thread P1 concerns around MongoDB race conditions and SQL timestamp collision reconciliation should be addressed first

New findings are P2 only, but prior threads raised unresolved P1 concerns: (1) MongoDB insertMany without ordered: false still throws on race-condition duplicates instead of skipping them, and (2) the _createdAt timestamp comparison in the SQL reconciliation path can produce phantom documents when two processes write at the same millisecond

src/Database/Adapter/Mongo.php (missing ordered: false), src/Database/Adapter/SQL.php (timestamp reconciliation), src/Database/Database.php (orphaned-relationship race condition)

Important Files Changed

• src/Database/Database.php: adds intra-batch dedup, pre-fetch of existing IDs, and deferred relationship creation for $ignore mode; race-condition and orphaned-relationship concerns flagged in prior threads are not fully resolved.
• src/Database/Adapter/SQL.php: adds a pre-filter SELECT inside the transaction, INSERT IGNORE/ON CONFLICT keywords, and a rowCount reconciliation path; timestamp-collision and phantom-document concerns from prior threads remain open.
• src/Database/Adapter/Mongo.php: pre-filters duplicates via a find + getMore loop before insertMany; still lacks ordered: false, so a race-condition slip-through throws instead of silently skipping (flagged in a prior thread).
• src/Database/Adapter/Postgres.php: clean overrides for getInsertKeyword, getInsertSuffix, and getInsertPermissionsSuffix using ON CONFLICT … DO NOTHING with correct conflict targets.
• src/Database/Adapter/SQLite.php: single override of getInsertKeyword to emit INSERT OR IGNORE INTO; all other behavior inherited from the MariaDB/SQL base.
• src/Database/Mirror.php: passes $ignore to both source and destination createDocuments; the destination call intentionally omits $onNext/$onError to avoid double-firing the caller's callback.
• src/Database/Adapter/Pool.php: updated createDocuments signature delegates transparently via func_get_args(); no issues.
• src/Database/Adapter.php: abstract method signature updated with bool $ignore = false and a matching docblock; straightforward change.
• tests/e2e/Adapter/Scopes/DocumentTests.php: new tests cover mixed-batch, intra-batch duplicate, and all-duplicate scenarios with ignore: true, including onNext callback assertions.

Reviews (12): Last reviewed commit: "Revert "Fix Mongo adapter ignore mode: p..."

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
tests/e2e/Adapter/Scopes/DocumentTests.php (1)

7815-7818: Prefer __FUNCTION__ for collection naming consistency.

Using __FUNCTION__ here (as in adjacent tests) improves isolation and avoids accidental name reuse across reruns.

♻️ Suggested tweak
-        $col = 'createDocsIgnoreIntraBatch';
+        $col = __FUNCTION__;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 7815 - 7818, The
collection name is hard-coded as 'createDocsIgnoreIntraBatch' which can collide
across test runs; change the $col assignment to use the current function name
(replace the literal with __FUNCTION__) so the calls to
$database->createCollection($col) and $database->createAttribute($col, 'name',
Database::VAR_STRING, 128, true) use a unique, test-local collection name
consistent with adjacent tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 1501-1524: The current dedupe prefetch uses only _uid and a single
tenant filter, causing cross-tenant collisions for shared tables; update the
logic that builds existingUids (and the prefetch loop around
$this->client->find/$this->client->getMore) to be tenant-aware by using a
composite key (e.g., "$tenant:$uid") when populating $existingUids, include the
tenant in the $filters returned by $this->getTenantFilters($collection->getId())
per chunk, and prefetch in per-tenant chunks using array_chunk($uidValues,
max(1, $this->maxQueryValues)) (use the chunk size when calling getMore instead
of count($uidValues)); then when filtering $records check the composite key
($record['_tenant'] . ':' . ($record['_uid'] ?? '')) against $existingUids so
duplicate detection respects tenant boundaries.
- Around line 1504-1516: The prefetch block calls $this->client->find() and
$this->client->getMore() without exception mapping; wrap each driver call in a
try/catch that catches \Throwable and rethrows the normalized exception via
processException($e) (same pattern used elsewhere in this class), i.e. surround
the $this->client->find(...) call and the $this->client->getMore(...) call with
try { ... } catch (\Throwable $e) { throw $this->processException($e); } so any
raw driver errors from the $response/$more handling are mapped consistently.

In `@src/Database/Database.php`:
- Around line 7210-7255: The upsertDocumentsWithIncrease() flow builds
$existingDocs keyed as tenant:id when $upsertTenantPerDocument is true, but
later duplicate-input detection still uses only $document->getId(), causing
false DuplicateException across tenants; update the duplicate-input check and
any intermediate maps (e.g., $seenIds/$inputKeys) to use the same composite
lookup key (use the existing $lookupKey logic: $tenant . ':' .
$document->getId() when $upsertTenantPerDocument is true) so the tenant-aware
key is reused consistently across the whole upsert path.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c9f1924e-3ed0-43a8-8859-6e6c299aadb5

📥 Commits

Reviewing files that changed from the base of the PR and between 72ee161 and b9c7f4d.

📒 Files selected for processing (9)
  • src/Database/Adapter.php
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/Pool.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Database.php
  • src/Database/Mirror.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
src/Database/Database.php (2)

5682-5726: ⚠️ Potential issue | 🟠 Major

Don’t let the outer prefetch snapshot suppress writes.

The adapters already do the authoritative duplicate filtering inside the transaction/session, but this code still drops $chunk entries from a pre-transaction find(). If a matching row is deleted after that snapshot but before the insert transaction starts, ignore: true will incorrectly skip a document that should now insert, and $modified / $onNext will be low.

Also applies to: 5805-5815

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5682 - 5726, The current prefetch
logic builds $preExistingIds (when $ignore is true, in the tenantPerDocument and
non-tenant branches using find()/authorization->skip/etc.) and later uses that
snapshot to skip writes, which can wrongly suppress inserts if a matching row
was deleted after the prefetch; remove or stop using this prefetch-based
suppression. Instead, eliminate reliance on $preExistingIds to skip document
inserts/relationship writes (and the related checks that reference it around the
relationship-resolution code and at the later block noted ~5805-5815), and
either rely on the adapters’ authoritative duplicate filtering inside the
transaction/session or re-check existence inside the transaction. Keep the
prefetch only if it’s used for optimizations that do not affect whether an
insert occurs, or remove the prefetch entirely if it’s unused for
non-authoritative optimizations.

5788-5790: ⚠️ Potential issue | 🟠 Major

Use the same tenant-aware key for deferred relationship writes.

In tenant-per-document mode, two inserted documents can legitimately share the same $id. This map is keyed only by $id, so the later payload overwrites the earlier one, and unset() after the first insert prevents the second tenant’s relationship write entirely.

♻️ Proposed fix
                 if (!empty($relationshipData)) {
-                    $deferredRelationships[$document->getId()] = $relationshipData;
+                    $relationshipKey = $tenantPerDocument
+                        ? $document->getTenant() . ':' . $document->getId()
+                        : $document->getId();
+                    $deferredRelationships[$relationshipKey] = $relationshipData;
                 }
@@
                 foreach ($batch as $insertedDoc) {
-                    $docId = $insertedDoc->getId();
-                    if (\array_key_exists($docId, $deferredRelationships)) {
+                    $relationshipKey = $tenantPerDocument
+                        ? $insertedDoc->getTenant() . ':' . $insertedDoc->getId()
+                        : $insertedDoc->getId();
+                    if (\array_key_exists($relationshipKey, $deferredRelationships)) {
                         $relDoc = clone $insertedDoc;
-                        foreach ($deferredRelationships[$docId] as $key => $value) {
+                        foreach ($deferredRelationships[$relationshipKey] as $key => $value) {
                             $relDoc->setAttribute($key, $value);
                         }
                         $this->silent(fn () => $this->createDocumentRelationships($collection, $relDoc));
-                        unset($deferredRelationships[$docId]);
+                        unset($deferredRelationships[$relationshipKey]);
                     }
                 }

Also applies to: 5823-5832

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5788 - 5790, The deferred
relationship map ($deferredRelationships) is keyed only by $document->getId(),
which causes collisions in tenant-per-document mode; change the key to be
tenant-aware (e.g., combine tenant id and document id or use the document's
tenant identifier method) wherever entries are set and removed so each
tenant/document pair is unique—update the code that assigns
$deferredRelationships[$document->getId()] = $relationshipData; and the matching
unset() logic (and the same pattern around the related block at lines referenced
5823-5832) to use the composite tenant-aware key instead of just getId().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2482-2527: The pre-filter SELECT that checks for existing _uid
rows (the block building $sql, preparing via $this->getPDO()->prepare, binding
values and calling $stmt->execute()/fetchAll()) can throw a raw PDOException
because it runs outside the method's main try/catch; wrap the entire
prepare/execute/fetch block in the adapter's processException(...) wrapper so
exceptions are mapped consistently (i.e., call $this->processException(...)
around the closure that prepares, binds, executes and fetches, returning $rows),
keeping the logic inside createDocuments() and referencing the existing $sql,
$binds, $stmt and $rows variables.
- Around line 2665-2713: The reconciliation currently keys timestamps only by
_uid and uses a single :_vfy_tenant bind when $this->sharedTables is true, which
corrupts mixed-tenant batches; update the verification to be tenant-aware by
binding the tenant per document and keying by a composite tenant+uid.
Concretely, in the block that builds $verifyPlaceholders/$verifyBinds and
$verifySql (the code using $verifyPlaceholders, $verifyBinds, $verifySql and
$verifyStmt), include the tenant value for each document (use the document's
tenant when tenant-per-document applies), change the WHERE to match pairs (e.g.
use row-value tuples or explicit (_uid = :uX AND _tenant = :tX) clauses), and
then build $actualTimestamps keyed by "$tenant|$uid" (and compare against
$expectedTimestamps keyed the same way) so the subsequent loop that populates
$insertedDocs/$insertedUids only matches rows with the correct tenant+uid.
- Around line 2650-2652: Replace the use of "INSERT IGNORE" (returned by
getInsertKeyword) with a plain "INSERT" and remove any IGNORE-specific behavior
in getInsertSuffix so that constraint violations are not globally suppressed;
instead let the DB raise errors and update the insert/execute path that catches
SQL exceptions to inspect the database error/constraint name (e.g., index or key
name such as _uid) and map errors to DuplicateException only when the violated
constraint corresponds to the _uid duplicate, otherwise map to UniqueException
(or other specific exceptions) for PRIMARY/UNIQUE/NOT NULL/truncation
violations—use the existing functions getInsertKeyword and getInsertSuffix to
locate changes and adjust the exception-mapping in the code that performs the
insert and error handling.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d2ab610b-bb64-4dcb-979c-f3880d9e99df

📥 Commits

Reviewing files that changed from the base of the PR and between 20846d3 and 242c9a8.

📒 Files selected for processing (3)
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/SQL.php
  • src/Database/Database.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Database/Adapter/Mongo.php

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
src/Database/Adapter/SQL.php (3)

2482-2527: ⚠️ Potential issue | 🟠 Major

Wrap the ignore-mode pre-filter in processException() handling.

This query runs before the method’s main try/catch, so SQL or connection failures here still escape as raw PDOException while the rest of createDocuments() is adapter-mapped.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2482 - 2527, The pre-filter block
in createDocuments() that prepares and executes the SELECT (using getSQLTable(),
getPDO(), getPDOType(), preparing $sql and executing $stmt) should be wrapped
inside a call to processException() so any PDOException is adapter-mapped
instead of escaping; move the existing logic that builds $placeholders/$binds,
$tenantFilter, $sql, $stmt->bindValue(), $stmt->execute(), $rows =
$stmt->fetchAll() and $stmt->closeCursor() into a closure passed to
processException(), and return/assign the result from that closure (or rethrow
via processException) so behavior matches the rest of createDocuments().

2782-2785: ⚠️ Potential issue | 🟠 Major

Don’t use INSERT IGNORE as the base implementation for ignore=true.

For adapters that inherit this default, ignore=true stops meaning “skip duplicate document IDs” and starts suppressing a broader class of insert failures on both the document and permission insert paths. The base hook should stay plain INSERT INTO, and adapter-specific duplicate handling should be expressed with targeted conflict clauses or constraint-aware exception mapping.

Does MariaDB/MySQL `INSERT IGNORE` suppress constraint violations beyond duplicate-key conflicts, such as PRIMARY KEY, other UNIQUE, NOT NULL, truncation, or conversion errors?

Based on learnings, in src/Database/Adapter/MariaDB.php only duplicate _uid violations should throw DuplicateException; other unique or PRIMARY collisions should still throw UniqueException.
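The reviewer's suggested alternative (a plain INSERT plus constraint-aware error mapping) could look roughly like the sketch below. The SQLSTATE value and the message-parsing pattern are assumptions about MySQL-family driver behavior, not code from the PR.

```python
import re

def classify_integrity_error(sqlstate: str, message: str) -> str:
    """Map a driver integrity error to the exception class the review suggests.

    Only a duplicate on the _uid unique index counts as a skippable
    duplicate document; every other violation stays a distinct error.
    """
    if sqlstate != "23000":
        return "other"  # not an integrity-constraint violation
    # MySQL-family duplicate-key messages name the violated index, e.g.
    # "Duplicate entry 'doc1' for key '_uid'" (format assumed here).
    match = re.search(r"for key '([^']+)'", message)
    if match and match.group(1).endswith("_uid"):
        return "duplicate"  # map to DuplicateException: safe to skip in ignore mode
    return "unique"         # map to UniqueException: some other unique/primary index
```

The point of the suggestion is that IGNORE suppresses every violation class at once, whereas this style of mapping only treats the _uid conflict as ignorable.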

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2782 - 2785, The base method
getInsertKeyword in class SQL should not return 'INSERT IGNORE INTO' for
ignore=true; change it to always return the plain 'INSERT INTO' and move any
ignore semantics into adapter-specific code (e.g., Adapter/MariaDB.php) by using
targeted conflict clauses or by mapping constraint/SQLSTATE errors to the
correct exceptions; specifically, in MariaDB adapter ensure only duplicate
violations on the document _uid map to DuplicateException while other
UNIQUE/PRIMARY/constraint failures map to UniqueException (and do not rely on
INSERT IGNORE in SQL::getInsertKeyword).

2664-2728: ⚠️ Potential issue | 🔴 Critical

Keep expectedTimestamps tenant-aware too.

$actualTimestamps is keyed by tenant:uid now, but $expectedTimestamps is still keyed only by _uid. In tenantPerDocument mode, two documents with the same _uid in different tenants can overwrite each other here, causing an inserted document to be dropped from $documents and miss its permission insert.

💡 Proposed fix
-                $expectedTimestamps = [];
-                foreach ($documents as $doc) {
-                    $expectedTimestamps[$doc->getId()] = $doc->getCreatedAt();
-                }
+                $expectedTimestamps = [];
+                foreach ($documents as $doc) {
+                    $key = ($this->sharedTables && $this->tenantPerDocument)
+                        ? $doc->getTenant() . ':' . $doc->getId()
+                        : $doc->getId();
+                    $expectedTimestamps[$key] = $doc->getCreatedAt();
+                }
@@
-                    if (isset($actualTimestamps[$key]) && $actualTimestamps[$key] === $expectedTimestamps[$doc->getId()]) {
+                    if (isset($actualTimestamps[$key]) && $actualTimestamps[$key] === $expectedTimestamps[$key]) {
                         $insertedDocs[] = $doc;
                     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2664 - 2728, The
expectedTimestamps map is not tenant-aware while actualTimestamps is, so when
sharedTables && tenantPerDocument you must key expectedTimestamps the same way
as actualTimestamps (tenant + ':' + uid) during the initial foreach over
$documents and also use that key when comparing later (instead of
$expectedTimestamps[$doc->getId()]); update the loop that builds
$expectedTimestamps and the comparison that checks equality to construct/look up
keys as ($doc->getTenant() . ':' . $doc->getId()) when $this->sharedTables &&
$this->tenantPerDocument so entries don't collide across tenants.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2482-2527: The pre-filter block in createDocuments() that prepares
and executes the SELECT (using getSQLTable(), getPDO(), getPDOType(), preparing
$sql and executing $stmt) should be wrapped inside a call to processException()
so any PDOException is adapter-mapped instead of escaping; move the existing
logic that builds $placeholders/$binds, $tenantFilter, $sql, $stmt->bindValue(),
$stmt->execute(), $rows = $stmt->fetchAll() and $stmt->closeCursor() into a
closure passed to processException(), and return/assign the result from that
closure (or rethrow via processException) so behavior matches the rest of
createDocuments().
- Around line 2782-2785: The base method getInsertKeyword in class SQL should
not return 'INSERT IGNORE INTO' for ignore=true; change it to always return the
plain 'INSERT INTO' and move any ignore semantics into adapter-specific code
(e.g., Adapter/MariaDB.php) by using targeted conflict clauses or by mapping
constraint/SQLSTATE errors to the correct exceptions; specifically, in MariaDB
adapter ensure only duplicate violations on the document _uid map to
DuplicateException while other UNIQUE/PRIMARY/constraint failures map to
UniqueException (and do not rely on INSERT IGNORE in SQL::getInsertKeyword).
- Around line 2664-2728: The expectedTimestamps map is not tenant-aware while
actualTimestamps is, so when sharedTables && tenantPerDocument you must key
expectedTimestamps the same way as actualTimestamps (tenant + ':' + uid) during
the initial foreach over $documents and also use that key when comparing later
(instead of $expectedTimestamps[$doc->getId()]); update the loop that builds
$expectedTimestamps and the comparison that checks equality to construct/look up
keys as ($doc->getTenant() . ':' . $doc->getId()) when $this->sharedTables &&
$this->tenantPerDocument so entries don't collide across tenants.
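The tenant-aware keying described above can be sketched as a small standalone helper (the function name is hypothetical, and it assumes tenant IDs never contain `:`):

```php
<?php
// Hypothetical helper mirroring the fix described above: when shared tables
// with per-document tenants are enabled, key maps by "tenant:uid" so two
// documents with the same ID under different tenants do not collide.
function timestampKey(string $uid, ?int $tenant, bool $sharedTables, bool $tenantPerDocument): string
{
    if ($sharedTables && $tenantPerDocument) {
        return $tenant . ':' . $uid;
    }
    return $uid;
}

// Building the map and looking it up must use the same keying,
// otherwise same-ID documents under different tenants overwrite each other:
$expected = [];
$expected[timestampKey('doc1', 42, true, true)] = '2026-04-08T10:00:00Z';
$expected[timestampKey('doc1', 99, true, true)] = '2026-04-08T10:00:01Z';
```

With composite keys, both entries survive instead of the second overwriting the first.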

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 892638bd-ac20-414b-9059-7f7e20295028

📥 Commits

Reviewing files that changed from the base of the PR and between 242c9a8 and 455da70.

📒 Files selected for processing (2)
  • src/Database/Adapter/SQL.php
  • src/Database/Database.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Database/Database.php

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2676-2682: The verification query is binding composite keys from
$expectedTimestamps into $verifyBinds/placeholders and then comparing them
against the _uid column, but when tenantPerDocument is enabled _uid was set to
the raw document ID ($attributes['_uid'] = $document->getId()), so the WHERE
_uid IN (...) must use the document IDs not the composite keys. Fix by
extracting the document ID portion from each composite key before creating $ph
and adding to $verifyBinds (or if keys are already plain IDs, leave unchanged);
update the loop that builds $verifyPlaceholders/$verifyBinds (the foreach over
array_values(array_unique(array_keys($expectedTimestamps)))) to derive $uid =
<documentIdFromComposite($uid)> and bind that instead so the verification query
matches rows by actual _uid values.
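The suggested extraction can be sketched as a pure helper (the name `documentIdFromComposite` is hypothetical; it assumes composite keys have the form `tenant:uid` and plain keys contain no `:`):

```php
<?php
// Derive the raw document ID from a possibly-composite "tenant:uid" key,
// so the WHERE _uid IN (...) verification query binds actual _uid values
// rather than composite keys that can never match a row.
function documentIdFromComposite(string $key): string
{
    $pos = strpos($key, ':');
    return $pos === false ? $key : substr($key, $pos + 1);
}

$verifyBinds = [];
foreach (['42:doc1', 'doc2'] as $key) {
    $verifyBinds[] = documentIdFromComposite($key);
}
```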

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 43af96a2-d479-4354-86c9-7d68abad7b56

📥 Commits

Reviewing files that changed from the base of the PR and between 455da70 and deedfda.

📒 Files selected for processing (1)
  • src/Database/Adapter/SQL.php

Adapters:
- MariaDB/MySQL: INSERT IGNORE INTO
- PostgreSQL: ON CONFLICT DO NOTHING
- SQLite: INSERT OR IGNORE INTO
- MongoDB: session-scoped pre-filter before insertMany
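A sketch of how the dialect-specific keyword could be selected (the function shape is an assumption for illustration; the PR implements this per adapter class rather than via one switch):

```php
<?php
// Dialect-specific "insert, ignoring duplicate keys" keyword. Note that on
// some engines these clauses suppress constraint violations beyond
// duplicate keys, so they are paired with a pre-filter in the PR.
function getInsertKeyword(string $adapter, bool $ignore): string
{
    if (!$ignore) {
        return 'INSERT INTO';
    }
    return match ($adapter) {
        'mariadb', 'mysql' => 'INSERT IGNORE INTO',
        'postgres' => 'INSERT INTO', // paired with an ON CONFLICT DO NOTHING suffix
        'sqlite' => 'INSERT OR IGNORE INTO',
        default => 'INSERT INTO',
    };
}
```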

Database.php:
- Intra-batch dedup by ID (tenant-aware, first occurrence wins)
- Pre-fetch existing IDs to skip known duplicates
- Deferred relationship creation for ignore mode (no orphans)
- Race-condition reconciliation via _createdAt timestamp verification
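The intra-batch dedup step can be illustrated with a standalone sketch (function name and array shapes are hypothetical; the real code operates on Document objects):

```php
<?php
// First occurrence wins: later entries with the same (tenant, id) pair are
// dropped before they reach the adapter, so their permissions never
// overwrite the first document's permissions ("ACL drift").
function dedupeBatch(array $docs, bool $tenantAware): array
{
    $seen = [];
    $out = [];
    foreach ($docs as $doc) {
        $key = $tenantAware ? (($doc['tenant'] ?? '') . ':' . $doc['id']) : $doc['id'];
        if (isset($seen[$key])) {
            continue; // duplicate within the batch: skip it, first wins
        }
        $seen[$key] = true;
        $out[] = $doc;
    }
    return $out;
}

$batch = [
    ['id' => 'a', 'tenant' => 1, 'perms' => ['read:any']],
    ['id' => 'a', 'tenant' => 1, 'perms' => []],           // dropped: same tenant + id
    ['id' => 'a', 'tenant' => 2, 'perms' => ['read:any']], // kept: different tenant
];
$deduped = dedupeBatch($batch, true);
```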

upsertDocuments:
- Batch-fetch existing docs with find() instead of per-row getDocument()
- Tenant-aware composite keys for seenIds duplicate check
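The batch-fetch optimization replaces N round trips with one IN-list query. A hedged sketch against an in-memory stand-in (the `$store` array plays the role of the database; the real code issues one `find()` call):

```php
<?php
// Instead of one getDocument() call per incoming row (N queries), fetch all
// existing documents for the batch IDs in a single "WHERE _uid IN (...)"
// lookup, then probe the keyed result per row to pick insert vs. update.
function batchFetchExisting(array $store, array $ids): array
{
    // single round trip; result stays keyed by _uid for O(1) probes
    return array_intersect_key($store, array_flip($ids));
}

$store = ['a' => ['v' => 1], 'b' => ['v' => 2]];
$existing = batchFetchExisting($store, ['a', 'c']);
// incoming row 'a' takes the update path, 'c' takes the insert path
```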

All paths are tenant-per-document aware with null-safe array_filter.
- testCreateDocumentsIgnoreDuplicates: mixed batch with onNext assertions
- testCreateDocumentsIgnoreIntraBatchDuplicates: first-wins, no ACL drift
- testCreateDocumentsIgnoreAllDuplicates: zero inserts, empty onNext
@premtsd-code premtsd-code force-pushed the csv-import-upsert-v2 branch from f2b08dc to bee42d4 Compare April 9, 2026 10:28
Comment on lines 2651 to 2661
@@ -2583,13 +2660,111 @@ public function createDocuments(Document $collection, array $documents): array

$this->execute($stmt);

P1 INSERT IGNORE INTO suppresses all errors, not only duplicates

MySQL/MariaDB's INSERT IGNORE INTO converts every constraint violation — including NOT NULL violations, FK violations, and value truncation — into a silent warning. If a document is dropped for a reason other than a duplicate-key conflict, the adapter returns it in $documents as if it were inserted (no rowCount discrepancy), onNext fires, and $modified is incremented for a row that was never written.

ON CONFLICT DO NOTHING (Postgres) and INSERT OR IGNORE INTO (SQLite) share this characteristic for constraint violations. Consider logging PDOStatement::errorInfo() after the INSERT when $ignore is true so non-duplicate failures are at least observable.
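One way to make such non-duplicate drops observable is to classify the driver error codes after the statement runs. A sketch for MySQL/MariaDB (checking warnings needs a live connection, so only the pure classification helpers are shown; code values are from the MySQL error-code list):

```php
<?php
// MySQL/MariaDB driver error codes (errorInfo()[1]) that INSERT IGNORE
// downgrades to warnings. Only 1062 (ER_DUP_ENTRY) is a true duplicate;
// the rest indicate rows silently dropped for other reasons and should
// at least be logged when $ignore is true.
function isDuplicateOnly(int $driverCode): bool
{
    return $driverCode === 1062; // ER_DUP_ENTRY
}

function classifyIgnoredWarning(int $driverCode): string
{
    return match ($driverCode) {
        1062 => 'duplicate-key (expected under ignore mode)',
        1048 => 'NOT NULL violation (row dropped, not a duplicate!)',
        1452 => 'foreign-key violation (row dropped, not a duplicate!)',
        1265 => 'value truncated',
        default => 'other constraint warning',
    };
}
```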

Contributor Author


StructureValidator will handle the required NOT NULL columns

…x _uid mapping

- Pass ignoreDuplicates option to insertMany for race-condition safety
- Extract _uid from raw array before replaceChars (excluded from transformation)
- Map inserted records back to original Document objects by _uid
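The _uid mapping step from this commit can be sketched generically (function name and array shapes are assumptions; the real code maps back to Document objects):

```php
<?php
// After insertMany with duplicates ignored, map the records the server
// actually accepted back to the original documents by _uid, so onNext
// fires only for genuinely inserted documents.
function mapInsertedByUid(array $originals, array $insertedUids): array
{
    $wanted = array_flip($insertedUids);
    $result = [];
    foreach ($originals as $doc) {
        if (isset($wanted[$doc['_uid']])) {
            $result[] = $doc; // preserve the original object and order
        }
    }
    return $result;
}

$originals = [
    ['_uid' => 'a', 'title' => 'first'],
    ['_uid' => 'b', 'title' => 'second'],
];
$inserted = mapInsertedByUid($originals, ['b']);
```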
