Rule proposal: bot-cloaking-annotate — detect/annotate AI-targeted cloaking #122

@twschiller

Category

New defense rule (speculative / research-direction)

What problem does this solve?

Every other rule in the project assumes the page the agent reads is the same page a human reviewer would read. AI-targeted cloaking breaks that assumption: a site identifies an incoming request as agent traffic (UA, IP range, automation framework signatures) and serves a different, attacker-controlled version (with embedded injection or fabricated "facts") while humans get a benign page.

Recent work establishes this as an active threat against agentic browsers:

Search-engine cloaking has a long lineage (Wu & Davison, AIRWeb 2005; Chellapilla & Maykov, AIRWeb 2007); AI-targeted cloaking is the same primitive aimed at a new class of crawler.

Proposed solution

This is structurally hard from a content script alone — the script can't trivially make a "what would Chrome see?" comparison request. Three escalating options:

  1. Fingerprint surface flag. If the page reads bot-flag globals (navigator.webdriver, common automation telltales, agent UA substrings), annotate that the operator is capable of distinguishing agents. Doesn't prove cloaking happened; flags capability.
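  2. Out-of-band comparison fetch. Fire a fetch(location.href, { headers: { 'User-Agent': '<chrome UA>' } }) from the content script (same-origin), diff against document.documentElement.outerHTML, annotate when text-content divergence exceeds a threshold. Caveat: this assumes the browser honors a script-set User-Agent header, which is not guaranteed, and the fetched markup is pre-script while outerHTML reflects the live DOM, so some divergence is expected even without cloaking.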
  3. Background-script proxy fetch with normalized headers. Move (2) to the MV3 background using declarativeNetRequest / webRequest to detect content-type and length divergence. Heavier infra, avoids same-origin awkwardness.
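
A minimal sketch of what option (1) could look like, assuming a static scan of inline script text is an acceptable first approximation. The pattern list, findBotFlagReads, and capabilityAnnotation are hypothetical names for illustration, not existing project code, and a text scan only shows that detection code is present, not that it ran.

```javascript
// Illustrative (assumed) patterns for bot-flag reads; a real rule would curate this list.
const BOT_FLAG_PATTERNS = [
  { name: "navigator.webdriver", re: /navigator\s*\.\s*webdriver\b/ },
  {
    name: "automation global",
    re: /\b(?:__nightmare|_phantom|callPhantom|__selenium_unwrapped|domAutomation)\b/,
  },
  {
    name: "agent UA substring",
    re: /(?:HeadlessChrome|GPTBot|ClaudeBot|PerplexityBot)/,
  },
];

// Returns the sorted names of every pattern found in any of the script sources.
function findBotFlagReads(scriptSources) {
  const hits = new Set();
  for (const src of scriptSources) {
    for (const { name, re } of BOT_FLAG_PATTERNS) {
      if (re.test(src)) hits.add(name);
    }
  }
  return [...hits].sort();
}

// Builds a capability-only annotation; returns null when nothing was found.
// The wording names a capability, never an intent.
function capabilityAnnotation(hits) {
  if (hits.length === 0) return null;
  return (
    "This page's scripts reference agent-detection signals (" +
    hits.join(", ") +
    "); the site can distinguish agent traffic."
  );
}
```

In a content script this might be fed with [...document.scripts].map((s) => s.textContent ?? ""). External scripts have empty textContent, so this sketch would miss fingerprinting that lives in a separately loaded bundle.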

Note re repo convention "defenses against prompt injection / cross-origin trickery should strip the content": cloaking is one where content-script-side stripping is structurally infeasible — by the time the rule sees the DOM, the cloaked content is the page. Annotation is the best a content script can offer; full mitigation belongs upstream (request layer, vendor, or the agent itself).

Alternatives considered

  • Defer entirely to the agent's anti-cloaking layer (model vendor, framework). Reasonable — the threat is real but the in-DOM defense surface is narrow.
  • Browser-side header normalization. Strip distinguishing headers/UA so the agent looks like Chrome. Out of scope for an extension that targets defensive content rewriting — arms-races with JS-side fingerprinting.

Controlling false positives

This is the highest-FP-risk rule in the proposal set. Almost every defense-side signal has a benign explanation, so the rule must be conservative.

  • Precise annotation phrasing. Distinguish between "this site can distinguish agent traffic" (option 1 — observable, low confidence) and "this site served different content under agent fingerprint vs. browser fingerprint" (option 2/3 — measurable, higher confidence). Never use the unqualified word "cloaking" in the annotation; the rule names a capability or measured divergence, not an intent.
  • High divergence threshold for options 2/3. Normal A/B tests and per-request personalization commonly produce 5–15% text diff. The threshold for an annotation should be well above the noise floor — propose >40% Jaccard distance on stemmed token sets, or significant divergence in the prompt-injection pattern set hit count (i.e., the agent-flavored response has injection-shaped strings the Chrome-flavored response doesn't).
  • Geofencing exclusion. Pages whose comparison fetch crosses a region boundary (different Set-Cookie region, different currency, different language detection) are expected to diverge. Detect via response headers and skip.
  • Legitimate bot-detection allowlist. Sites that fingerprint for anti-fraud (banks, payment processors, e-commerce checkout flows) reasonably distinguish agent traffic. Exempt these origins from the option-1 capability flag: annotating "this bank's login page can detect agents" is true but unhelpful.
  • CSP / network-level failure ≠ cloaking. Option 2 in particular: if the comparison fetch returns 403, 451, or a captcha challenge, that's a security control, not cloaking. Treat as "indeterminate" not "cloaked".
  • Don't fire on cross-origin frames. The parent page can't determine what a frame's origin would serve to a different client; out of scope.
  • Default-off, experimental. Same posture as schema-trust-sanitize, cross-origin-frame-redact, trust-badge-annotate — ship as experimental candidate until per-host telemetry shows the FP rate is manageable.
  • Per-host allow/deny lists from the start. Curate (similar to roach-motel-annotate's site list) for hosts where the rule has known signal vs. known noise.
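
Several of the controls above can be combined into one decision function. The following is a rough sketch under stated assumptions: the 0.4 threshold mirrors the >40% Jaccard proposal, stemming is omitted (tokens are only lowercased), captcha detection is left out, and classifyComparison, tokenSet, and the skipHosts parameter are hypothetical names rather than existing project code.

```javascript
// Assumed from the proposal: annotate only above 40% Jaccard distance.
const DIVERGENCE_THRESHOLD = 0.4;
// Statuses treated as security controls, not as evidence of divergence.
const INDETERMINATE_STATUSES = new Set([403, 451]);

// Lowercased word tokens. The proposal calls for stemmed tokens;
// stemming is omitted here to keep the sketch dependency-free.
function tokenSet(text) {
  return new Set(text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? []);
}

// Jaccard distance = 1 - |A ∩ B| / |A ∪ B|; two empty sets count as identical.
function jaccardDistance(a, b) {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  for (const token of a) {
    if (b.has(token)) intersection++;
  }
  const union = a.size + b.size - intersection;
  return 1 - intersection / union;
}

// Classifies one agent-vs-browser comparison for a host.
// skipHosts is a hypothetical per-host list of known-noisy origins.
function classifyComparison(sample, skipHosts = new Set()) {
  const { host, status, agentText, browserText } = sample;
  if (skipHosts.has(host)) return { verdict: "skipped" };
  if (INDETERMINATE_STATUSES.has(status)) return { verdict: "indeterminate" };

  const distance = jaccardDistance(tokenSet(agentText), tokenSet(browserText));
  if (distance > DIVERGENCE_THRESHOLD) {
    return {
      verdict: "divergent",
      distance,
      annotation:
        "This site served different content under agent fingerprint vs. browser fingerprint.",
    };
  }
  return { verdict: "similar", distance };
}
```

Checking the skip list and the status code before computing any distance keeps blocked or noisy comparisons from ever reaching the threshold test, so they cannot produce an annotation.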

Open questions / risks

  • Probably out of scope for a pure content-script rule. v1 is at most option (1); (2) and (3) require background-script work and may not pay rent.
  • Defense vs. offense asymmetry. Detection from inside the agent's own browsing context is structurally weak — the attacker can flag the comparator request too.

Tagged Impact H / Complexity H.

Metadata

Labels: enhancement (New feature or request); rule-proposal (Proposed new defense rule, pending triage/citation review)
