Rule proposal: bot-cloaking-annotate — detect/annotate AI-targeted cloaking

### Category
New defense rule (speculative / research-direction)

### What problem does this solve?
Every other rule in the project assumes the page the agent reads is the same page a human reviewer would read. *AI-targeted cloaking* breaks that assumption: a site identifies an incoming request as agent traffic (UA, IP range, automation framework signatures) and serves a different, attacker-controlled version (with embedded injection or fabricated "facts") while humans get a benign page.

Recent work establishes this as an active threat against agentic browsers:

- Caspi & Tugendhaft (2025), *A Whole New World: Creating a Parallel-Poisoned Web Only AI-Agents Can See.* https://arxiv.org/abs/2509.00124
- SPLX.ai (Oct 2025), *OpenAI Atlas browser falls for AI-targeted Cloaking Attack.* https://splx.ai/blog/ai-targeted-cloaking-openai-atlas — published PoC against a shipping agentic browser.
- The Hacker News writeup (Oct 2025). https://thehackernews.com/2025/10/new-ai-targeted-cloaking-attack-tricks.html — frames it as a credibility-poisoning vector for AI-search citations.
- *LLM-Driven Adaptive Crawling to Unveil Cloaked Phishing* (arXiv:2508.02035). https://arxiv.org/abs/2508.02035 (PhishParrot): defense-side detection by fingerprint adaptation.

Search-engine cloaking has a long lineage (Wu & Davison, AIRWeb 2005; Chellapilla & Maykov, AIRWeb 2007); AI-targeted cloaking is the same primitive aimed at a new class of crawler.

### Proposed solution
This is structurally hard from a content script alone. Any request the script makes leaves from the same browser, IP, and cookie jar as the agent, so it can't trivially ask "what would Chrome see?" with a clean comparison request. Three escalating options:

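1. **Fingerprint surface flag.** If the page reads bot-flag globals (`navigator.webdriver`, common automation telltales, agent UA substrings), annotate that the operator is *capable* of distinguishing agents. Observing those reads needs a hook in the page's main world, since content scripts run in an isolated world; server-side checks on the `User-Agent` request header or IP range are invisible to it. This doesn't prove cloaking happened; it flags capability.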
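2. **Out-of-band comparison fetch.** Fire a `fetch(location.href, { headers: { 'User-Agent': '<chrome UA>' } })` from the content script (same-origin), parse the response with `DOMParser`, and compare its text content against the live page's text; annotate when divergence exceeds a threshold. Diffing the raw response HTML against `document.documentElement.outerHTML` would be noisy, since the live DOM reflects script execution. Two caveats: Chromium has long ignored `User-Agent` overrides on `fetch`, and the request still carries the agent's IP and cookies, so a server keying on either will serve the comparator the same agent-flavored page.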
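3. **Background-script proxy fetch with normalized headers.** Move (2) into the MV3 background service worker, using `declarativeNetRequest` header rules to normalize the comparison request's headers and `webRequest` to observe response metadata (content type, length) for divergence. Heavier infra, but it avoids the content script's same-origin and header awkwardness.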
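A minimal sketch of option (1), assuming the hook runs in the page's main world before any page script (for example a content script declared with `"world": "MAIN"` at `document_start`; relaying the log back to the isolated-world rule is not shown). `instrumentGetter`, `capabilityFlag`, and `AUTOMATION_PROBES` are hypothetical names. The wrapper walks the prototype chain because `webdriver` is an accessor on `Navigator.prototype`, not an own property of `navigator`.

```typescript
// Hypothetical option-(1) probe: record reads of automation-telltale globals.
// Must run in the page's MAIN world before page scripts; an isolated-world
// content script cannot observe property reads made by page scripts.

type ProbeLog = Map<string, number>;

// Probes whose reads count as a capability signal (hypothetical list).
// navigator.userAgent is deliberately absent: almost every page reads it.
const AUTOMATION_PROBES: string[] = ["navigator.webdriver"];

// Replace `prop` (found on `target` or its prototype chain) with a counting
// getter that still returns the original value. Returns false if the property
// is missing or non-configurable. A writable data property becomes read-only.
function instrumentGetter(
  target: object,
  prop: string,
  label: string,
  log: ProbeLog,
): boolean {
  let owner: object | null = target;
  let desc: PropertyDescriptor | undefined;
  while (owner !== null) {
    desc = Object.getOwnPropertyDescriptor(owner, prop);
    if (desc !== undefined) break;
    owner = Object.getPrototypeOf(owner);
  }
  if (owner === null || desc === undefined || !desc.configurable) return false;

  const found: PropertyDescriptor = desc;
  const readOriginal: (self: unknown) => unknown = found.get
    ? (self) => found.get!.call(self)
    : () => found.value;

  Object.defineProperty(owner, prop, {
    configurable: true,
    enumerable: found.enumerable ?? false,
    get(this: unknown): unknown {
      log.set(label, (log.get(label) ?? 0) + 1); // count the read
      return readOriginal(this); // page still sees the unmodified value
    },
  });
  return true;
}

// Capability annotation: names what was observed, never "cloaking".
function capabilityFlag(log: ProbeLog): string | null {
  const hits = AUTOMATION_PROBES.filter((p) => (log.get(p) ?? 0) > 0);
  if (hits.length === 0) return null;
  return `This site can distinguish agent traffic (page read: ${hits.join(", ")}).`;
}
```

In the page this would be called as `instrumentGetter(navigator, "webdriver", "navigator.webdriver", log)`. Keeping `navigator.userAgent` out of the probe list is a design choice to hold the false-positive rate down; a page that merely logs its UA should not earn a capability flag.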

Note on the repo convention that "defenses against prompt injection / cross-origin trickery should strip the content": cloaking is a case where content-script-side stripping is structurally infeasible, because by the time the rule sees the DOM, the cloaked content *is* the page. Annotation is the best a content script can offer; full mitigation belongs upstream (request layer, vendor, or the agent itself).

### Alternatives considered
- **Defer entirely to the agent's anti-cloaking layer** (model vendor, framework). Reasonable — the threat is real but the in-DOM defense surface is narrow.
- **Browser-side header normalization.** Strip distinguishing headers/UA so the agent looks like Chrome. Out of scope for an extension focused on defensive content rewriting, and it devolves into an arms race with JS-side fingerprinting.

### Controlling false positives
This is the highest-FP-risk rule in the proposal set. Almost every defense-side signal has a benign explanation, so the rule must be conservative.

- **Precise annotation phrasing.** Distinguish between *"this site can distinguish agent traffic"* (option 1 — observable, low confidence) and *"this site served different content under agent fingerprint vs. browser fingerprint"* (option 2/3 — measurable, higher confidence). Never use the unqualified word "cloaking" in the annotation; the rule names a *capability* or *measured divergence*, not an intent.
- **High divergence threshold for options 2/3.** Normal A/B tests and per-request personalization commonly produce 5–15% text diff. The threshold for an annotation should be well above the noise floor — propose >40% Jaccard distance on stemmed token sets, or significant divergence in the prompt-injection pattern set hit count (i.e., the agent-flavored response has injection-shaped strings the Chrome-flavored response doesn't).
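- **Geofencing exclusion.** Pages whose comparison fetch crosses a region boundary (different `Set-Cookie` region, different currency, different language detection) are expected to diverge. Detect via response headers and skip. `Set-Cookie` is not exposed to `fetch()` callers, so the cookie signal needs the request-layer access of option 3.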
- **Legitimate bot-detection allowlist.** Sites that fingerprint for anti-fraud (banks, payment processors, e-commerce checkout flows) have legitimate reasons to distinguish agent traffic. Allowlist these origins from the option-1 capability flag: annotating "this bank's login page can detect agents" is true but unhelpful.
- **CSP / network-level failure ≠ cloaking.** Option 2 in particular: if the comparison fetch returns 403, 451, or a captcha challenge, that's a security control, not cloaking. Treat as "indeterminate" not "cloaked".
- **Don't fire on cross-origin frames.** The parent page can't determine what a frame's origin would serve to a different client; out of scope.
- **Default-off, experimental.** Same posture as `schema-trust-sanitize`, `cross-origin-frame-redact`, `trust-badge-annotate` — ship as experimental candidate until per-host telemetry shows the FP rate is manageable.
- **Per-host allow/deny lists from the start.** Curate (similar to `roach-motel-annotate`'s site list) for hosts where the rule has known signal vs. known noise.
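The measurement-side checks above (divergence threshold, geofencing skip, indeterminate fetch results, neutral phrasing) can be combined into one classifier for options (2) and (3). This is a minimal sketch under stated assumptions, not a proposed implementation: it assumes visible text has already been extracted from both the live page and the comparison response, and `Snapshot`, `classify`, `REGION_HEADERS`, and `INJECTION_PATTERNS` are hypothetical names. The stemmer is naive suffix stripping standing in for a real stemmer, the two regexes are placeholders for the project's prompt-injection pattern set, and a captcha page served with a 200 status would need separate detection that is not shown.

```typescript
// Hypothetical option-(2)/(3) classifier. Inputs are text already extracted
// from the live page (agent-flavored) and the comparison fetch
// (browser-flavored); extraction itself is not shown.

interface Snapshot {
  status: number;
  headers: Record<string, string>; // lower-cased header names (assumption)
  text: string;
}

type Verdict =
  | { kind: "indeterminate"; reason: string }
  | { kind: "skip"; reason: string }
  | { kind: "no-divergence"; distance: number }
  | { kind: "divergent"; distance: number; annotation: string };

const DISTANCE_THRESHOLD = 0.4; // ">40% Jaccard distance" from the proposal
const REGION_HEADERS: string[] = ["content-language"]; // hypothetical geofence signals
const INJECTION_PATTERNS: RegExp[] = [
  // hypothetical stand-ins for the project's injection pattern set
  /ignore (all |any )?(previous|prior) instructions/gi,
  /disregard (the )?(above|system prompt)/gi,
];

// Naive suffix stripping standing in for a real stemmer such as Porter's.
function stem(token: string): string {
  return token.length > 4 ? token.replace(/(ing|ed|es|s)$/, "") : token;
}

function tokenSet(text: string): Set<string> {
  const tokens: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const out = new Set<string>();
  tokens.forEach((t) => out.add(stem(t)));
  return out;
}

// Jaccard distance: 1 - |A intersect B| / |A union B|; two empty sets are identical.
function jaccardDistance(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : 1 - shared / union;
}

function injectionHits(text: string): number {
  return INJECTION_PATTERNS.reduce((n, re) => n + (text.match(re)?.length ?? 0), 0);
}

function classify(agent: Snapshot, browser: Snapshot): Verdict {
  // Non-2xx comparison results (e.g. 403, 451) are security controls, not evidence.
  if (browser.status < 200 || browser.status >= 300) {
    return { kind: "indeterminate", reason: `comparison fetch returned ${browser.status}` };
  }
  // Geofenced responses are expected to diverge; skip rather than compare.
  for (const h of REGION_HEADERS) {
    if ((agent.headers[h] ?? "") !== (browser.headers[h] ?? "")) {
      return { kind: "skip", reason: `region signal differs: ${h}` };
    }
  }
  const distance = jaccardDistance(tokenSet(agent.text), tokenSet(browser.text));
  // Injection-shaped strings present only in the agent-flavored response.
  const extraInjection = injectionHits(agent.text) - injectionHits(browser.text);
  if (distance > DISTANCE_THRESHOLD || extraInjection > 0) {
    return {
      kind: "divergent",
      distance,
      annotation:
        "This site served different content under an agent fingerprint " +
        "than under a browser fingerprint.",
    };
  }
  return { kind: "no-divergence", distance };
}
```

The ordering is deliberate: indeterminate and geofenced comparisons return before any distance is computed, so a blocked or region-shifted fetch can never produce a "served different content" annotation.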

### Open questions / risks
- **Probably out of scope for a pure content-script rule.** v1 is at most option (1); (2) adds network requests and header workarounds beyond a DOM rule, (3) requires background-script work, and either may not pay rent.
- **Defense vs. offense asymmetry.** Detection from inside the agent's own browsing context is structurally weak — the attacker can flag the comparator request too.

Tagged Impact H / Complexity H.

Rule proposal: bot-cloaking-annotate — detect/annotate AI-targeted cloaking #122
