Fix: extend encoded-payload-redact with text-cipher encodings (#203) #216
Add detection for ROT13, Atbash, reverse, leetspeak, NATO phonetic, and Morse alongside the existing base64 / hex / percent matchers. Text-cipher decodes are gated by a count of distinct common-English words, since the encoded form is itself printable; substitution ciphers additionally skip candidates whose source is already English.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
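The gating described in this commit can be sketched roughly as follows. This is a minimal sketch, not the PR's code: the bodies of rot13, atbash, reverseText, alreadyEnglish, and tryCipherDecode are guesses shaped after the names in the diff, and COMMON_WORDS and MIN_DISTINCT_WORDS are hypothetical stand-ins for the real word list and threshold. Counting distinct words rather than total hits keeps a run like "the the the" from passing the gate.

```typescript
// Minimal sketch; COMMON_WORDS and MIN_DISTINCT_WORDS are hypothetical
// stand-ins, and the helper bodies are guesses based on names in the diff.

// Map each ASCII letter through a 0..25 index function, preserving case.
function mapLetters(text: string, f: (i: number) => number): string {
  return text.replace(/[a-z]/gi, (c) => {
    const base = c <= "Z" ? 65 : 97; // 'A' or 'a'
    return String.fromCharCode(f(c.charCodeAt(0) - base) + base);
  });
}

const rot13 = (text: string): string => mapLetters(text, (i) => (i + 13) % 26);
const atbash = (text: string): string => mapLetters(text, (i) => 25 - i);
// Reverse by code point so surrogate pairs stay intact.
const reverseText = (text: string): string => Array.from(text).reverse().join("");

// Hypothetical common-word list and threshold.
const COMMON_WORDS = new Set([
  "the", "and", "you", "that", "with", "this", "for", "are", "have", "not",
  "was", "but", "what", "from", "they", "will", "one", "all", "would", "there",
]);
const MIN_DISTINCT_WORDS = 3;

// Distinct, not total, hits: "the the the" counts as one word.
function countDistinctEnglishWords(text: string): number {
  const words = text.toLowerCase().match(/[a-z]+/g) ?? [];
  return new Set(words.filter((w) => COMMON_WORDS.has(w))).size;
}

// Substitution ciphers skip candidates that already read as English.
function alreadyEnglish(candidate: string): boolean {
  return countDistinctEnglishWords(candidate) >= MIN_DISTINCT_WORDS;
}

// Returns the decoded text if it clears the English gate, else null.
function tryCipherDecode(candidate: string, decode: (s: string) => string): string | null {
  const decoded = decode(candidate);
  return countDistinctEnglishWords(decoded) >= MIN_DISTINCT_WORDS ? decoded : null;
}
```

Because the gate only inspects decoded output, the same tryCipherDecode works unchanged for any decoder with a string-to-string signature.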
3 issues found.
```typescript
function collectReverse(text: string, matches: InlineMatch[]): void {
  for (const m of text.matchAll(TEXT_CIPHER_CANDIDATE)) {
    const candidate = m[0];
    if (alreadyEnglish(candidate)) {
      continue;
    }
    if (tryCipherDecode(candidate, reverseText) !== null) {
      matches.push({
        start: m.index,
        end: m.index + candidate.length,
        label: "[encoded payload hidden]",
      });
    }
  }
}
```
collectReverse is structurally identical to calling collectSubstitutionCipher(text, reverseText, matches). If collectSubstitutionCipher is later modified (e.g., an extra guard is added), this copy won't be updated in lock-step.
The call site at line 808 can simply become:

```typescript
collectSubstitutionCipher(text, reverseText, matches);
```

and this function can be deleted.
Fixed in ed4e025 — folded collectReverse into the new collectSubstitutionCiphers (single pass over TEXT_CIPHER_CANDIDATE driving a [rot13, atbash, reverseText] decoder list), and dropped the standalone wrapper. — Claude Code, on behalf of @twschiller
```typescript
function deleet(text: string): string {
  return text.replaceAll(
    new RegExp(LEET_SUBSTITUTION_CHAR_CLASS, "g"),
    (c) => LEET_MAP[c] ?? c,
  );
}
```
deleet (and countLeetSubstitutions at line 487) construct a fresh RegExp from the static LEET_SUBSTITUTION_CHAR_CLASS string on every invocation. These sit inside the per-candidate inner loop of collectLeet, so on a page with many 40+-char text groups each call allocates two throwaway regex objects.
Pre-compile once at module level:

```typescript
const LEET_SUBSTITUTION_RE = new RegExp(LEET_SUBSTITUTION_CHAR_CLASS, "g");
```

Then use LEET_SUBSTITUTION_RE in both helpers (reset lastIndex, or use matchAll / spread, to avoid stale-state issues with the shared g-flag regex).
Fixed in ed4e025 — pre-compiled LEET_SUBSTITUTION_RE at module level and routed both deleet (via replaceAll) and countLeetSubstitutions (via match) through it. Both call sites use the shared instance safely — String.prototype.replaceAll and String.prototype.match with a g-flag regex manage lastIndex internally, so no manual reset is needed. — Claude Code, on behalf of @twschiller
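A sketch of the shared-regex pattern this reply describes, under assumptions: LEET_MAP and LEET_SUBSTITUTION_CHAR_CLASS below are hypothetical tables, not the PR's. What it illustrates is that String.prototype.replaceAll and String.prototype.match, given a regex with the g flag, both reset lastIndex to 0 before scanning and leave it at 0 when they finish, so one module-level instance can be reused across calls.

```typescript
// Hypothetical leet table and character class; the PR's versions may differ.
const LEET_MAP: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s",
};
const LEET_SUBSTITUTION_CHAR_CLASS = "[013457@$]";

// Compiled once at module load and shared by both helpers.
const LEET_SUBSTITUTION_RE = new RegExp(LEET_SUBSTITUTION_CHAR_CLASS, "g");

function deleet(text: string): string {
  // replaceAll requires the g flag; it scans from index 0 and the final
  // failed match resets lastIndex to 0, so no state leaks between calls.
  return text.replaceAll(LEET_SUBSTITUTION_RE, (c) => LEET_MAP[c] ?? c);
}

function countLeetSubstitutions(text: string): number {
  // match() with a g-flag regex also resets lastIndex before scanning
  // and returns null (not []) when nothing matches.
  return text.match(LEET_SUBSTITUTION_RE)?.length ?? 0;
}
```

The same guarantee does not hold for RegExp.prototype.exec or test, which resume from lastIndex on a global regex; a future caller using either on the shared instance would need to reset lastIndex first.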
```typescript
collectSubstitutionCipher(text, rot13, matches);
collectSubstitutionCipher(text, atbash, matches);
collectReverse(text, matches);
collectLeet(text, matches);
collectNato(text, matches);
collectMorse(text, matches);
```
collectSubstitutionCipher is called for rot13 and atbash, and collectReverse runs the same TEXT_CIPHER_CANDIDATE regex a third time. Each call also re-runs alreadyEnglish (itself a regex + set lookup) on the same set of candidates.
Consider collecting TEXT_CIPHER_CANDIDATE matches once and testing all three decoders inside a single loop. This would cut the regex work and the alreadyEnglish checks to one-third on every text group, which matters now that MIN_TEXT_LENGTH is 20 (previously 120) and far more groups enter collectMatches.
Fixed in ed4e025 — collectSubstitutionCiphers now walks TEXT_CIPHER_CANDIDATE once, runs alreadyEnglish once per candidate, and tries each decoder in SUBSTITUTION_DECODERS ([rot13, atbash, reverseText]) until one succeeds. Cuts the regex + qualifier work to one third per text group, which matters now that MIN_TEXT_LENGTH is 20. — Claude Code, on behalf of @twschiller
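The single-pass shape described in this reply might look like the sketch below. Only the names collectSubstitutionCiphers, SUBSTITUTION_DECODERS, TEXT_CIPHER_CANDIDATE, and InlineMatch come from the thread; the candidate pattern, word list, threshold, and helper bodies are assumptions. Array.prototype.some stops at the first decoder whose output clears the English gate, which matches the "until one succeeds" behavior.

```typescript
type InlineMatch = { start: number; end: number; label: string };
type Decoder = (text: string) => string;

// Hypothetical candidate pattern: a letter followed by 19+ letters,
// spaces, or light punctuation (20 chars, matching MIN_TEXT_LENGTH).
const TEXT_CIPHER_CANDIDATE = /[A-Za-z][A-Za-z ,.'!?]{19,}/g;

// Hypothetical common-word list and threshold.
const COMMON_WORDS = new Set([
  "the", "and", "you", "that", "with", "this", "for", "are", "have", "not", "from", "they",
]);
const MIN_DISTINCT_WORDS = 3;

const distinctEnglish = (s: string): number =>
  new Set((s.toLowerCase().match(/[a-z]+/g) ?? []).filter((w) => COMMON_WORDS.has(w))).size;

// Map each ASCII letter through a 0..25 index function, preserving case.
const mapLetters = (s: string, f: (i: number) => number): string =>
  s.replace(/[a-z]/gi, (c) => {
    const base = c <= "Z" ? 65 : 97;
    return String.fromCharCode(f(c.charCodeAt(0) - base) + base);
  });

const rot13: Decoder = (s) => mapLetters(s, (i) => (i + 13) % 26);
const atbash: Decoder = (s) => mapLetters(s, (i) => 25 - i);
const reverseText: Decoder = (s) => Array.from(s).reverse().join("");

const SUBSTITUTION_DECODERS: readonly Decoder[] = [rot13, atbash, reverseText];

function collectSubstitutionCiphers(text: string, matches: InlineMatch[]): void {
  // One regex walk per text group instead of one per decoder.
  for (const m of text.matchAll(TEXT_CIPHER_CANDIDATE)) {
    const candidate = m[0];
    const start = m.index ?? 0;
    // The "already English" qualifier runs once per candidate.
    if (distinctEnglish(candidate) >= MIN_DISTINCT_WORDS) continue;
    // some() short-circuits on the first decoder that yields English.
    const decodes = SUBSTITUTION_DECODERS.some(
      (decode) => distinctEnglish(decode(candidate)) >= MIN_DISTINCT_WORDS,
    );
    if (decodes) {
      matches.push({ start, end: start + candidate.length, label: "[encoded payload hidden]" });
    }
  }
}
```

Adding a new letter-level cipher then means appending one function to SUBSTITUTION_DECODERS rather than adding another pass over the text.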
…edact
Adds 13 example tests and 5 property tests for the text-cipher detection paths. Source files contain only ciphertext or symbolic runs; benign English filler is encoded at test time so adversarial phrasing never appears in plaintext.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
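One way to exercise the decoders without external tooling is sketched below, under stated assumptions: the involution properties (applying rot13, atbash, or reverseText twice returns the input) are illustrative and not necessarily among the PR's property tests, and the seeded mulberry32 generator stands in for a property-testing library. The FILLER fixture mirrors the idea of keeping only benign plaintext in source and deriving ciphertext at test time.

```typescript
type Decoder = (s: string) => string;

// Map each ASCII letter through a 0..25 index function, preserving case.
const mapLetters = (s: string, f: (i: number) => number): string =>
  s.replace(/[a-z]/gi, (c) => {
    const base = c <= "Z" ? 65 : 97;
    return String.fromCharCode(f(c.charCodeAt(0) - base) + base);
  });

const rot13: Decoder = (s) => mapLetters(s, (i) => (i + 13) % 26);
const atbash: Decoder = (s) => mapLetters(s, (i) => 25 - i);
const reverseText: Decoder = (s) => Array.from(s).reverse().join("");

// mulberry32: small seeded PRNG so any failing input is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Random printable-ASCII string (codes 32..126) of length 0..maxLen.
function randomAscii(rand: () => number, maxLen: number): string {
  const len = Math.floor(rand() * (maxLen + 1));
  let s = "";
  for (let i = 0; i < len; i++) s += String.fromCharCode(32 + Math.floor(rand() * 95));
  return s;
}

// Returns the first input where decode(decode(x)) !== x, or null if none.
function findInvolutionFailure(decode: Decoder, runs: number, seed = 42): string | null {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const input = randomAscii(rand, 64);
    if (decode(decode(input)) !== input) return input;
  }
  return null;
}

// Benign filler lives in source as plaintext; ciphertext is derived here.
const FILLER = "the quick brown fox jumps over the lazy dog";
const DECODERS: readonly Decoder[] = [rot13, atbash, reverseText];
const fixtures = DECODERS.map((encode) => encode(FILLER));
```

All three ciphers are their own inverse, so the same function serves as encoder in the fixture and as decoder in the detector under test.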
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t regex
Address review feedback on #216:
- Collapse the rot13/atbash/reverse passes into one TEXT_CIPHER_CANDIDATE iteration with a decoder list; this cuts regex + alreadyEnglish work to one-third per text group.
- Pre-compile LEET_SUBSTITUTION_RE at module level so deleet and countLeetSubstitutions stop allocating a regex per call inside the inner candidate loop.
- Drop the now-duplicate collectReverse wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Extends encoded-payload-redact beyond byte encodings (base64 / hex / percent) to cover six text ciphers: ROT13, Atbash, reverse, leetspeak, NATO phonetic, and Morse.
- ABCDE…) are intentionally left alone; they are instructional content, not a payload.

Part of #203.
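NATO phonetic and Morse differ from the letter-substitution ciphers in that every token must come from a fixed table, so a run either decodes completely or not at all. A minimal sketch under assumptions: the "/" word separator, whitespace between tokens, and the alias spellings (alpha, juliet, x-ray) are choices made here, not taken from collectNato or collectMorse, and the decoded text would presumably still pass through the common-English-word gate described above.

```typescript
// Hypothetical tables; collectNato / collectMorse in the PR may differ.
// A Map avoids accidental hits on Object.prototype keys like "toString".
const MORSE = new Map<string, string>(Object.entries({
  ".-": "a", "-...": "b", "-.-.": "c", "-..": "d", ".": "e", "..-.": "f",
  "--.": "g", "....": "h", "..": "i", ".---": "j", "-.-": "k", ".-..": "l",
  "--": "m", "-.": "n", "---": "o", ".--.": "p", "--.-": "q", ".-.": "r",
  "...": "s", "-": "t", "..-": "u", "...-": "v", ".--": "w", "-..-": "x",
  "-.--": "y", "--..": "z",
}));

const NATO_WORDS = [
  "alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel",
  "india", "juliett", "kilo", "lima", "mike", "november", "oscar", "papa",
  "quebec", "romeo", "sierra", "tango", "uniform", "victor", "whiskey",
  "xray", "yankee", "zulu",
];
const NATO = new Map<string, string>([
  ...NATO_WORDS.map((w): [string, string] => [w, w.charAt(0)]),
  ["alpha", "a"], ["juliet", "j"], ["x-ray", "x"], // common alternate spellings
]);

// Words are split on "/", tokens on whitespace. Returns null as soon as a
// token is not in the table, so mixed prose never counts as a cipher run.
function decodeTokens(text: string, table: ReadonlyMap<string, string>): string | null {
  const words: string[] = [];
  for (const word of text.trim().toLowerCase().split(/\s*\/\s*/)) {
    let letters = "";
    for (const token of word.split(/\s+/)) {
      if (token === "") continue;
      const letter = table.get(token);
      if (letter === undefined) return null;
      letters += letter;
    }
    words.push(letters);
  }
  return words.join(" ");
}

const decodeMorse = (text: string): string | null => decodeTokens(text, MORSE);
const decodeNato = (text: string): string | null => decodeTokens(text, NATO);
```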
Test plan
- bun run check in extension/
- bun run test in extension/: encoded-payload-redact.test.ts (31 cases) and encoded-payload-redact.property.test.ts (12 cases) all pass

🤖 Generated with Claude Code