Skip to content

Fix: detect PII / encoded payloads split across sibling text nodes (#203)#210

Merged
twschiller merged 2 commits into
mainfrom
fix/cross-node-inline-redact
Jun 7, 2026
Merged

Fix: detect PII / encoded payloads split across sibling text nodes (#203)#210
twschiller merged 2 commits into
mainfrom
fix/cross-node-inline-redact

Conversation

@twschiller
Copy link
Copy Markdown
Contributor

Summary

Approach

  • collectTextNodesWithInlineGroups (new) tags each text node with an inline-formatting-context id. Group id bumps on entering/leaving a block-level element, on <br>, and on shadow-root boundaries. The existing collectTextNodesShadowPiercing becomes a thin wrapper that drops the group tag — no callers of the non-group walker change.
  • walkTextNodeGroupsChunked (new sibling of walkTextNodesChunked) threads the group ids through the same chunked-yield driver.
  • defineInlineTextRedactRule switches to the new walker. Each chunk is bucketed by group; single-node buckets keep the existing single-node fast path; multi-node buckets concatenate values, run collectMatches against the concatenation, and route each match to either replaceMatchesInTextNode (within-node) or replaceMatchAcrossTextNodes (new).
  • replaceMatchAcrossTextNodes scrubs the carrier — first node keeps its prefix, last node keeps its suffix, interior nodes have their nodeValue blanked. Wrapping inline elements stay in the DOM (per the "scrub, don't detach framework-owned nodes" rule). Reveal restores the matched characters as a single text node at the placeholder's position.
  • Per-node minLength no longer filters at the walker level (it would drop legitimate cross-node candidates whose individual nodes sit below the floor); the group's concatenated length is checked instead, preserving the cheap early-out for prose nodes.

Demo site

  • New "3a. Alternate payment" block on Checkout.tsx exercises five span-split shapes (card, SSN, phone, JWT, base64 payload) so the before/after diff visualizes the closed gap.

Test plan

  • bun run check (extension)
  • bun run typecheck (extension)
  • bun run test (extension — 1716 tests, all green; 16 new cross-node tests across pii / secrets / encoded-payload / placeholder helper + 4 new property tests)
  • bun run check (demo-site)
  • pre-commit run --all-files
  • Spot-check demo site in browser: card / SSN / phone / JWT / encoded payload on the new section 3a should each render as […hidden] chips, and reveal-on-click should restore the original characters.

Notes / accepted trade-offs

  • Reveal collapses the matched-text split into one text node — the original per-span breakdown isn't reconstructed. Visual characters are the same; wrapper elements stay.
  • Cross-chunk-boundary cross-node matches are not detected (>100 text nodes in a single inline-formatting context). Vanishingly rare with the default chunkSize=100.
  • Late-inserted siblings that complete an existing cross-node match aren't picked up by the incremental watcher (its scan root is the new subtree, not the joined context). Same limitation as the pre-existing per-node case missing late additions to existing text nodes.

🤖 Generated with Claude Code

)

The inline-text-redact factory (pii-redact / secrets-redact /
encoded-payload-redact) used to scan one text node at a time, so a card
number rendered as `<span>4111</span> <span>1111</span> ...` slipped
past every regex — the bypass tracked as Critical-3 in the #203 audit.

Group text nodes by inline-formatting context during the walk (block
elements, `<br>`, and shadow roots start a new group) and run
collectMatches against the concatenated group value. Matches that
straddle a node boundary land via a new `replaceMatchAcrossTextNodes`
helper that scrubs the carrier text but leaves wrapping inline elements
in place — same "scrub, don't detach framework-owned nodes" pattern the
rest of the codebase already follows.

Closes #203 item 3 (PII / Critical) and item 7 (encoded-payload
split-across-whitespace / High) with one shared change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agent-browser-shield-demo-site Ready Ready Preview, Comment Jun 7, 2026 6:33pm

Request Review

Copy link
Copy Markdown

@unblocked unblocked Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found.

About Unblocked

Unblocked has been set up to automatically review your team's pull requests to identify genuine bugs and issues.

📖 Documentation — Learn more in our docs.

💬 Ask questions — Mention @unblocked to request a review or summary, or ask follow-up questions.

👍 Give feedback — React to comments with 👍 or 👎 to help us improve.

⚙️ Customize — Adjust settings in your preferences.

Comment thread extension/src/lib/inline-text-redact.ts Outdated
Comment on lines +216 to +230
for (const match of matches.toReversed()) {
const start = locateStart(layout, match.start);
const end = locateEnd(layout, match.end);
if (start.index === end.index) {
const node = layout.nodes[start.index];
if (!node) {
continue;
}
replaceMatchesInTextNode(
node,
[{ start: start.offset, end: end.offset, label: match.label }],
ruleId,
);
continue;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When two or more non-overlapping matches both resolve to the same start.index === end.index (i.e., both sit entirely within the same text node in a multi-node bucket), replaceMatchesInTextNode is called once per match with a single-element array. The first call (rightmost, due to toReversed()) detaches the original text node from the DOM via parentNode.replaceChild(fragment, textNode). Every subsequent call for the same node finds textNode.parentNode === null, so the ?.replaceChild is a no-op and the match is silently dropped — leaking the sensitive data it was supposed to redact.

Example: <p><span>SSN: 123-45-6789 Card: 4111 1111 1111 1111</span><span> end</span></p> — the card match is processed first and detaches node 0; the SSN match then fails silently.

Fix: group all single-node matches by their node index and pass each group as a batch to replaceMatchesInTextNode, which already handles multiple matches correctly in a single call. Something like:

const singleNodeMatches = new Map<number, InlineMatch[]>();
const crossNodeMatches: typeof matches = [];

for (const match of matches) {
  const start = locateStart(layout, match.start);
  const end = locateEnd(layout, match.end);
  if (start.index === end.index) {
    let arr = singleNodeMatches.get(start.index);
    if (!arr) { arr = []; singleNodeMatches.set(start.index, arr); }
    arr.push({ start: start.offset, end: end.offset, label: match.label });
  } else {
    crossNodeMatches.push(match);
  }
}

// Apply cross-node matches last-to-first (existing logic)
for (const match of crossNodeMatches.toReversed()) { ... }

// Apply batched single-node matches (one replaceMatchesInTextNode call per node)
for (const [index, nodeMatches] of singleNodeMatches) {
  const node = layout.nodes[index];
  if (node) {
    replaceMatchesInTextNode(node, nodeMatches, ruleId);
  }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 201f44c — confirmed the bug reproduces (SSN + card in one span of a two-span bucket: only the rightward match landed; the SSN leaked) and that the same root cause covered a related shape (cross-node phone + single-node card sharing a node, where the phone's lastNode truncation silently no-op'd because card had already detached it, leaving 4567 visible).

The sequential-application model is fundamentally racy here — any operation that detaches a text node breaks every later operation targeting it, and any operation that truncates a boundary node shifts every later offset in that node. Rather than try to order around it, I batched: new replaceMatchesAcrossTextNodes (plural) snapshots every node's original nodeValue up front, plans each affected node's content from that snapshot, and applies one replaceChild per node. Each text node is mutated exactly once, so no offsets ever go stale.

Regression tests cover both shapes (your example + the cross-node variant). Full suite is green (1718 tests).

— Claude Code, on behalf of @twschiller

…tion

unblocked[bot] flagged that sequential per-match application in
processBucket detached text nodes mid-loop, so a second single-node
match in the same node silently dropped. Same root cause covers a
related case where a cross-node match shares its lastNode with a
single-node match — sequential application either dropped the match
or shifted offsets so the wrong characters were redacted.

Replace the per-match loop with a plural `replaceMatchesAcrossTextNodes`
that snapshots every original `nodeValue` up front, plans each affected
node's new content from that snapshot, then applies one replaceChild
(or value-blank for fully-consumed interior nodes) per node. Each text
node is mutated exactly once, so no offsets ever go stale.

Regression tests cover both shapes:
- two single-node matches sharing a text node in a multi-node bucket
- cross-node match + single-node match sharing a node

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@twschiller twschiller merged commit 0da74ef into main Jun 7, 2026
7 checks passed
@twschiller twschiller deleted the fix/cross-node-inline-redact branch June 7, 2026 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Audit: rule-bypass findings (red-team pass)

1 participant