Fix: extend encoded-payload-redact with text-cipher encodings (#203) #216

Merged: twschiller merged 4 commits into main from fix/encoded-payload-extra-encodings on Jun 7, 2026

@twschiller (Contributor) commented Jun 7, 2026

Summary

  • Extends encoded-payload-redact beyond byte encodings (base64 / hex / percent) to cover six text ciphers: ROT13, Atbash, reverse, leetspeak, NATO phonetic, and Morse.
  • Text ciphers can't use the printable-ASCII-ratio qualifier (the encoded form is already printable), so detection is gated by a distinct-common-English-word count on the decoded output. Substitution ciphers additionally skip candidates whose source text is already English.
  • NATO runs that decode to a sequential alphabet (ABCDE…) are intentionally left alone — instructional content, not a payload.
  • Adds 13 example tests + 5 property tests covering each new cipher: positive paths, length/substitution-floor guards, alphabet-drill carve-out, ASCII-art Morse, and a no-false-fire property on plain English prose. Test sources contain only ciphertext / symbolic runs — benign filler is encoded at test time so adversarial phrasing never appears in plaintext.
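The word-count gate described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `COMMON_WORDS`, `MIN_DISTINCT_WORDS`, `distinctCommonWords`, and `looksLikeRot13Payload` are hypothetical names, and the word list and threshold are assumed values chosen only for the example.

```typescript
// Hypothetical stand-in for the project's common-English-word list.
const COMMON_WORDS = new Set([
  "the", "and", "that", "have", "with", "this", "from",
  "please", "send", "you", "for", "are",
]);

// Assumed threshold; the PR does not state the real value.
const MIN_DISTINCT_WORDS = 4;

// ROT13: rotate each ASCII letter 13 places, preserving case.
function rot13(text: string): string {
  return text.replace(/[a-z]/gi, (c) => {
    const base = c <= "Z" ? 65 : 97;
    return String.fromCharCode(((c.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// Count how many *distinct* common words appear in the text.
function distinctCommonWords(text: string): number {
  const seen = new Set<string>();
  for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    if (COMMON_WORDS.has(word)) {
      seen.add(word);
    }
  }
  return seen.size;
}

// A candidate qualifies only if the source is not already English
// and the decoded output contains enough distinct common words.
function looksLikeRot13Payload(candidate: string): boolean {
  if (distinctCommonWords(candidate) >= MIN_DISTINCT_WORDS) {
    return false; // source text is already English; skip it
  }
  return distinctCommonWords(rot13(candidate)) >= MIN_DISTINCT_WORDS;
}
```

Counting distinct words rather than total hits means a run that repeats one short word many times does not clear the gate on its own.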

Part of #203.

Test plan

  • bun run check in extension/
  • bun run test in extension/encoded-payload-redact.test.ts (31 cases) and encoded-payload-redact.property.test.ts (12 cases) all pass
  • Manually verify a ROT13 / Morse / NATO snippet on a sample page gets the click-to-reveal placeholder
  • Manually verify ordinary English prose containing digits or scattered Morse-like punctuation is not redacted
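For the NATO check, the alphabet-drill carve-out from the summary could look something like the sketch below. The helper names (`decodeNato`, `isAlphabetDrill`, `shouldRedactNatoRun`) are hypothetical, treating "sequential alphabet" as "a prefix of a..z" is an assumption, and the common-word gate that the PR applies to decoded text is left out to keep the example short.

```typescript
// Standard NATO spelling alphabet, in order a..z.
const NATO_WORDS = [
  "alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel",
  "india", "juliett", "kilo", "lima", "mike", "november", "oscar", "papa",
  "quebec", "romeo", "sierra", "tango", "uniform", "victor", "whiskey",
  "x-ray", "yankee", "zulu",
];

const NATO_LETTER = new Map<string, string>();
NATO_WORDS.forEach((word, i) => NATO_LETTER.set(word, String.fromCharCode(97 + i)));
// Common variant spellings (an assumption about what a detector would accept).
NATO_LETTER.set("alpha", "a");
NATO_LETTER.set("juliet", "j");
NATO_LETTER.set("xray", "x");

// Decode a whitespace-separated run of NATO words; null if any token is unknown.
function decodeNato(run: string): string | null {
  const tokens = run.toLowerCase().split(/\s+/).filter(Boolean);
  if (tokens.length === 0) {
    return null;
  }
  let decoded = "";
  for (const token of tokens) {
    const letter = NATO_LETTER.get(token);
    if (letter === undefined) {
      return null;
    }
    decoded += letter;
  }
  return decoded;
}

// Assumed definition of an alphabet drill: the decoded letters are a prefix of a..z.
function isAlphabetDrill(decoded: string): boolean {
  return decoded.length > 0 && "abcdefghijklmnopqrstuvwxyz".startsWith(decoded);
}

// Redact only runs that decode cleanly and are not an alphabet drill.
function shouldRedactNatoRun(run: string): boolean {
  const decoded = decodeNato(run);
  return decoded !== null && !isAlphabetDrill(decoded);
}
```

With this shape, "Alfa Bravo Charlie Delta Echo" is left alone while "Hotel Echo Lima Papa" is flagged.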

🤖 Generated with Claude Code

Add detection for ROT13, Atbash, reverse, leetspeak, NATO phonetic,
and Morse alongside the existing base64 / hex / percent matchers.
Text-cipher decodes are gated by a distinct common-English-word count
since the encoded form is itself printable; substitution ciphers
additionally skip candidates whose source is already English.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot commented Jun 7, 2026

Project: agent-browser-shield-demo-site
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Jun 7, 2026 9:22pm

@unblocked unblocked Bot left a comment

3 issues found.

Comment on lines +737 to +751
function collectReverse(text: string, matches: InlineMatch[]): void {
  for (const m of text.matchAll(TEXT_CIPHER_CANDIDATE)) {
    const candidate = m[0];
    if (alreadyEnglish(candidate)) {
      continue;
    }
    if (tryCipherDecode(candidate, reverseText) !== null) {
      matches.push({
        start: m.index,
        end: m.index + candidate.length,
        label: "[encoded payload hidden]",
      });
    }
  }
}

collectReverse is structurally identical to calling collectSubstitutionCipher(text, reverseText, matches). If collectSubstitutionCipher is later modified (e.g., an extra guard is added), this copy won't be updated in lock-step.

The call site at line 808 can simply become:

collectSubstitutionCipher(text, reverseText, matches);

and this function can be deleted.

Author reply (@twschiller, Contributor):

Fixed in ed4e025 — folded collectReverse into the new collectSubstitutionCiphers (single pass over TEXT_CIPHER_CANDIDATE driving a [rot13, atbash, reverseText] decoder list), and dropped the standalone wrapper. — Claude Code, on behalf of @twschiller

Comment on lines +384 to +389
function deleet(text: string): string {
  return text.replaceAll(
    new RegExp(LEET_SUBSTITUTION_CHAR_CLASS, "g"),
    (c) => LEET_MAP[c] ?? c,
  );
}

deleet (and countLeetSubstitutions at line 487) construct a fresh RegExp from the static LEET_SUBSTITUTION_CHAR_CLASS string on every invocation. These sit inside the per-candidate inner loop of collectLeet, so on a page with many 40+-char text groups each call allocates two throwaway regex objects.

Pre-compile once at module level:

const LEET_SUBSTITUTION_RE = new RegExp(LEET_SUBSTITUTION_CHAR_CLASS, "g");

Then use LEET_SUBSTITUTION_RE in both helpers (reset lastIndex or use matchAll / spread to avoid stale-state issues with the shared g-flag regex).

Author reply (@twschiller, Contributor):

Fixed in ed4e025 — pre-compiled LEET_SUBSTITUTION_RE at module level and routed both deleet (via replaceAll) and countLeetSubstitutions (via match) through it. Both call sites use the shared instance safely — String.prototype.replaceAll and String.prototype.match with a g-flag regex manage lastIndex internally, so no manual reset is needed. — Claude Code, on behalf of @twschiller
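To illustrate the `lastIndex` point, here is a small sketch of a shared g-flag regex used through `replaceAll` and `match`, next to the stateful `test` call that causes the stale-state problem the review warns about. The `LEET_MAP` entries and the character class are placeholder values, since the PR does not show the real ones, and `staleTestDemo` is a hypothetical helper added only for the comparison.

```typescript
// Placeholder mapping and character class; the real values are not shown in the PR.
const LEET_MAP: Record<string, string> = {
  "0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t",
};
const LEET_SUBSTITUTION_RE = /[013457]/g;

// replaceAll with a global regex resets lastIndex to 0 before scanning.
function deleet(text: string): string {
  return text.replaceAll(LEET_SUBSTITUTION_RE, (c) => LEET_MAP[c] ?? c);
}

// match with a global regex also resets lastIndex to 0 before scanning.
function countLeetSubstitutions(text: string): number {
  return text.match(LEET_SUBSTITUTION_RE)?.length ?? 0;
}

// By contrast, test() (like exec()) resumes from lastIndex on a global regex.
function staleTestDemo(): [boolean, boolean] {
  const re = /3/g;
  const first = re.test("l33t"); // matches at index 1, leaves lastIndex at 2
  const second = re.test("3"); // search starts at index 2, past the end, so it fails
  return [first, second];
}
```

The shared instance is only a hazard if a later change routes it through `test` or `exec`; in that case, resetting `lastIndex` or using a non-global copy would be needed.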

Comment on lines +806 to +811
collectSubstitutionCipher(text, rot13, matches);
collectSubstitutionCipher(text, atbash, matches);
collectReverse(text, matches);
collectLeet(text, matches);
collectNato(text, matches);
collectMorse(text, matches);

collectSubstitutionCipher is called for rot13 and atbash, and collectReverse runs the same TEXT_CIPHER_CANDIDATE regex a third time. Each call also re-runs alreadyEnglish (itself a regex + set lookup) on the same set of candidates.

Consider collecting TEXT_CIPHER_CANDIDATE matches once and testing all three decoders inside a single loop. This would cut the regex work and the alreadyEnglish checks to one-third on every text group, which matters now that MIN_TEXT_LENGTH is 20 (previously 120) and far more groups enter collectMatches.

Author reply (@twschiller, Contributor):

Fixed in ed4e025: collectSubstitutionCiphers now walks TEXT_CIPHER_CANDIDATE once, runs alreadyEnglish once per candidate, and tries each decoder in SUBSTITUTION_DECODERS ([rot13, atbash, reverseText]) until one succeeds. This cuts the regex and qualifier work to one-third per text group, which matters now that MIN_TEXT_LENGTH is 20. (Claude Code, on behalf of @twschiller)
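A rough sketch of what the single-pass shape might look like is below. The function and constant names come from the thread, but every body here is an assumption: the candidate regex, the word list, the threshold, and the `alreadyEnglish` / `tryCipherDecode` internals are simplified stand-ins, not the merged code.

```typescript
interface InlineMatch {
  start: number;
  end: number;
  label: string;
}
type Decoder = (text: string) => string;

// Simplified stand-ins; the real regex, word list, and threshold are not shown in the PR.
const TEXT_CIPHER_CANDIDATE = /[A-Za-z][A-Za-z ,.'!?-]{19,}/g;
const COMMON_WORDS = new Set(["the", "and", "that", "with", "this", "from", "please", "send"]);
const MIN_DISTINCT_WORDS = 3;

function distinctCommonWords(text: string): number {
  const seen = new Set<string>();
  for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    if (COMMON_WORDS.has(word)) seen.add(word);
  }
  return seen.size;
}

const alreadyEnglish = (text: string): boolean =>
  distinctCommonWords(text) >= MIN_DISTINCT_WORDS;

// Returns the decoded text if it passes the word-count gate, else null.
function tryCipherDecode(candidate: string, decode: Decoder): string | null {
  const decoded = decode(candidate);
  return distinctCommonWords(decoded) >= MIN_DISTINCT_WORDS ? decoded : null;
}

// Shift an ASCII letter within its own case; non-letters never reach this helper.
function mapLetters(text: string, shift: (offset: number) => number): string {
  return text.replace(/[a-z]/gi, (c) => {
    const base = c <= "Z" ? 65 : 97;
    return String.fromCharCode(base + shift(c.charCodeAt(0) - base));
  });
}
const rot13: Decoder = (t) => mapLetters(t, (o) => (o + 13) % 26);
const atbash: Decoder = (t) => mapLetters(t, (o) => 25 - o);
const reverseText: Decoder = (t) => t.split("").reverse().join("");

const SUBSTITUTION_DECODERS: Decoder[] = [rot13, atbash, reverseText];

// One regex pass and one alreadyEnglish check per candidate; decoders are
// tried in order and the loop stops at the first one that succeeds.
function collectSubstitutionCiphers(text: string, matches: InlineMatch[]): void {
  for (const m of text.matchAll(TEXT_CIPHER_CANDIDATE)) {
    const candidate = m[0];
    if (alreadyEnglish(candidate)) continue;
    if (SUBSTITUTION_DECODERS.some((d) => tryCipherDecode(candidate, d) !== null)) {
      const start = m.index ?? 0;
      matches.push({ start, end: start + candidate.length, label: "[encoded payload hidden]" });
    }
  }
}
```

Using `some` keeps the early exit implicit: once one decoder yields English, the remaining decoders are not run for that candidate.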

…edact

Adds 13 example tests and 5 property tests for the text-cipher
detection paths. Source files contain only ciphertext or symbolic
runs — benign English filler is encoded at test time so adversarial
phrasing never appears in plaintext.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t regex

Address review feedback on #216:

- Collapse rot13/atbash/reverse passes into one TEXT_CIPHER_CANDIDATE
  iteration with a decoder list; cuts regex + alreadyEnglish work to
  one-third per text group.
- Pre-compile LEET_SUBSTITUTION_RE at module level so deleet and
  countLeetSubstitutions stop allocating a regex per call inside the
  inner candidate loop.
- Drop the now-duplicate collectReverse wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@twschiller twschiller merged commit 8603b73 into main Jun 7, 2026
7 checks passed
@twschiller twschiller deleted the fix/encoded-payload-extra-encodings branch June 7, 2026 21:24