Fix Anthropic 400 on structured output: strip unsupported JSON-schema range keywords #52

Open
amsclark wants to merge 1 commit into Lazarus-AI:main from amsclark:fix/anthropic-structured-output-schema

@amsclark

Fixes #50.

Problem

Anthropic's structured-output JSON-schema validator rejects numeric-range keywords (minimum/maximum/exclusiveMinimum/exclusiveMaximum/multipleOf) on integers:

400 invalid_request_error: For 'integer' type, properties maximum, minimum are not supported

_json_spec_from_model serializes a Pydantic model's schema verbatim, and RankedFileScore uses Field(ge=1, le=5) for its surface/influence scores — which Pydantic emits as {"type":"integer","minimum":1,"maximum":5}. So the sourcehunt ranker 400s on every call, leaving 0 files ranked and 0 hunted against the Anthropic adapter.
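The schema shape described above can be reproduced locally without calling Anthropic. This is a minimal sketch assuming Pydantic v2 (`model_json_schema`); `ScoreSketch` is a hypothetical stand-in for RankedFileScore, and its field names are guessed from the "surface/influence" wording rather than taken from the codebase.

```python
from pydantic import BaseModel, Field


class ScoreSketch(BaseModel):
    # Hypothetical stand-in for RankedFileScore; field names are assumed.
    surface: int = Field(ge=1, le=5)
    influence: int = Field(ge=1, le=5)


schema = ScoreSketch.model_json_schema()
surface_schema = schema["properties"]["surface"]

# ge/le are serialized as the JSON-schema keywords the validator rejects.
print(surface_schema["type"], surface_schema["minimum"], surface_schema["maximum"])
```

Any integer field declared with `ge`/`le` (or `gt`/`lt`/`multiple_of`) should trip the same validator error once its schema is sent verbatim.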

Fix

Recursively strip those unsupported keywords from the generated schema in _json_spec_from_model before handing it to genai-pyo3. The bounds are advisory for the model anyway (the ranker prompt already states the 1–5 range), and callers can clamp post-parse if strictness is wanted.
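A recursive strip of this kind might look roughly like the sketch below. It is not the code from this PR's diff: `strip_range_keywords`, `RANGE_KEYWORDS`, and `SCHEMA_MAPS` are hypothetical names, and the handling of `properties`/`$defs` (so that a field that happens to be named `minimum` is not dropped) is an assumption about what a careful version would want.

```python
from typing import Any

# Keywords this PR reports as rejected by Anthropic's validator.
RANGE_KEYWORDS = frozenset(
    {"minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum", "multipleOf"}
)

# Under these keys the dict keys are user-chosen names, not schema keywords.
SCHEMA_MAPS = frozenset({"properties", "$defs", "definitions", "patternProperties"})


def strip_range_keywords(node: Any) -> Any:
    """Return a copy of a JSON-schema fragment without numeric-range keywords."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key in RANGE_KEYWORDS:
                continue  # drop the unsupported keyword entirely
            if key in SCHEMA_MAPS and isinstance(value, dict):
                # Keep every name; only recurse into each sub-schema.
                cleaned[key] = {
                    name: strip_range_keywords(sub) for name, sub in value.items()
                }
            else:
                cleaned[key] = strip_range_keywords(value)
        return cleaned
    if isinstance(node, list):
        return [strip_range_keywords(item) for item in node]
    return node  # scalars pass through unchanged


example = {
    "type": "object",
    "properties": {
        "surface": {"type": "integer", "minimum": 1, "maximum": 5},
        "scores": {"type": "array", "items": {"type": "integer", "multipleOf": 1}},
    },
}
stripped = strip_range_keywords(example)
```

The function returns a new structure rather than mutating its input, so the original Pydantic schema stays available if a caller wants to read the bounds back and clamp parsed values afterwards.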

Verification

  • New unit tests in tests/test_native_schema_compat.py (top-level + nested/array bounds stripped, non-range structure preserved) — no network.
  • clearwing sourcehunt --depth standard on a small planted-vuln C repo now ranks and hunts correctly (previously: "Ranker LLM call failed", 0 findings; after: both planted bugs found, one with an ASan crash reproduction).

… range keywords

Anthropic's structured-output validator rejects numeric-range keywords
(minimum/maximum/exclusiveMinimum/exclusiveMaximum/multipleOf) on integer
types, returning HTTP 400 "For 'integer' type, properties maximum, minimum
are not supported". Pydantic emits these from Field(ge=, le=), so the
sourcehunt ranker's RankedFileScore (surface/influence scored 1..5) 400s on
every call, leaving 0 files ranked and 0 hunted.

Recursively strip those keywords from the generated schema in
_json_spec_from_model before handing it to genai-pyo3. The bounds are
advisory for the model anyway (the prompt states the 1-5 range).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ropoctl

ropoctl commented May 29, 2026

Collaborator

Thanks for tracking this down @amsclark — the root-cause writeup is spot on. I went with a slightly different fix in #54: instead of stripping the range keywords from the generated schema, express the score as a Literal[1,2,3,4,5] type alias, which Pydantic serializes to {"type":"integer","enum":[1,2,3,4,5]}. Anthropic's validator supports enum and enforces it via constrained decoding, so the 1..5 bound stays guaranteed rather than advisory, and there's no schema post-processing to carry. RankedFileScore is the only model with numeric bounds, so that covers the whole codebase.
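For comparison, the type-level approach described above could be sketched as follows, assuming Pydantic v2. The alias name `Score` and the model `ScoreSketch` are hypothetical; the actual names used in #54 are not shown in this thread.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

# Hypothetical alias; enumerates the allowed values instead of bounding a range.
Score = Literal[1, 2, 3, 4, 5]


class ScoreSketch(BaseModel):
    # Hypothetical stand-in for RankedFileScore; field names are assumed.
    surface: Score
    influence: Score


surface_schema = ScoreSketch.model_json_schema()["properties"]["surface"]

# The schema carries an enum and no minimum/maximum keywords.
print(surface_schema["enum"], "minimum" in surface_schema)

# Out-of-range values are still rejected when the response is parsed.
try:
    ScoreSketch(surface=6, influence=1)
    rejected = False
except ValidationError:
    rejected = True
```

Because the constraint lives in the type, it is checked both in the schema sent to the provider and again at parse time, with no post-processing step on the schema.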

Could you confirm #54 resolves the ranker 400 on your end (the sourcehunt --depth standard repro from the issue)? If it does, I think it supersedes this — but happy to hear if you see a case the type-level approach misses (e.g. bounds coming from a model you don't control, where the generic strip would still be needed).

Development

Successfully merging this pull request may close these issues.

[bug] Structured-output schema sends unsupported minimum/maximum keywords — Anthropic 400 kills the ranker
