Sanitize surrogates and non-UTF-8 bytes in pydantic data converter #1449
Closed
pydantic_core's Rust serializer crashes on strings with Unicode surrogate characters and bytes with non-UTF-8 content. This adds a fallback that sanitizes the value and retries, preserving all existing serializer behavior including exclude_unset.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
pydantic_core's Rust to_json() serializer crashes when values contain:
- strings with Unicode surrogate characters (for example, text decoded with errors='surrogateescape')
- bytes with non-UTF-8 content: bytes are serialized via UTF-8 decode, so binary data like b'\x89PNG...' crashes

This was discovered in a real workload where a sandbox exec activity captured stdout from a command that read binary files (PNG headers, fonts, etc.).
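As an illustration of why these two kinds of value are a problem for any strict UTF-8 serializer, here is a minimal standard-library sketch. It does not call pydantic_core; the helper names `is_utf8_encodable` and `is_utf8_decodable` are hypothetical and exist only for this example.

```python
# Bytes that start like a PNG header; 0x89 is not a valid UTF-8 start byte.
raw = b"\x89PNG\r\n"

# Decoding with surrogateescape never fails: each undecodable byte is mapped
# to a lone surrogate code point (0x89 becomes U+DC89).
text = raw.decode("utf-8", errors="surrogateescape")


def is_utf8_encodable(value: str) -> bool:
    """Return True if the string can be encoded as strict UTF-8."""
    try:
        value.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def is_utf8_decodable(value: bytes) -> bool:
    """Return True if the bytes can be decoded as strict UTF-8."""
    try:
        value.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(repr(text))               # shows the lone surrogate at the start
print(is_utf8_encodable(text))  # the surrogate cannot be written as UTF-8
print(is_utf8_decodable(raw))   # the raw bytes cannot be read as UTF-8
```

Both checks fail for the same underlying reason: lone surrogates and arbitrary binary bytes have no strict UTF-8 representation, so a serializer that insists on valid UTF-8 has to reject them.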
Fix
Adds a _sanitize_for_json() fallback in PydanticJSONPlainPayloadConverter.to_payload(): when serialization fails, the value is sanitized and serialization is retried, preserving exclude_unset and all existing behavior.
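The sanitize-and-retry pattern can be sketched with the standard library alone. This is not the PR's implementation: `strict_to_json` is a hypothetical stand-in for a strict UTF-8 serializer, `sanitize_for_json` and `to_json_with_fallback` are illustrative names, and the replacement policy (`?` for surrogates, U+FFFD for undecodable bytes) is an assumption.

```python
import json
from typing import Any


def _default(obj: Any) -> Any:
    # Stand-in behavior: bytes are serialized via strict UTF-8 decode.
    if isinstance(obj, bytes):
        return obj.decode("utf-8")
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


def strict_to_json(value: Any) -> bytes:
    """Hypothetical strict serializer: raises UnicodeError on bad input."""
    text = json.dumps(value, ensure_ascii=False, default=_default)
    return text.encode("utf-8")  # fails on lone surrogates


def sanitize_for_json(value: Any) -> Any:
    """Recursively replace content that has no strict UTF-8 form."""
    if isinstance(value, str):
        # Lone surrogates cannot be encoded; "replace" turns each into "?".
        return value.encode("utf-8", errors="replace").decode("utf-8")
    if isinstance(value, bytes):
        # Undecodable bytes become U+FFFD (the replacement character).
        return value.decode("utf-8", errors="replace")
    if isinstance(value, dict):
        return {sanitize_for_json(k): sanitize_for_json(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize_for_json(item) for item in value]
    return value


def to_json_with_fallback(value: Any) -> bytes:
    """Try the normal path first; sanitize and retry only on failure."""
    try:
        return strict_to_json(value)
    except UnicodeError:
        return strict_to_json(sanitize_for_json(value))


payload = {"stdout": "ok\udc89", "blob": b"\x89PNG"}
result = json.loads(to_json_with_fallback(payload))
print(result)
```

Because the sanitizer runs only after the first attempt raises, values that already serialize cleanly go through the unmodified path, which is the property the description relies on when it says existing behavior is preserved.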
Test plan
Tests cover exclude_unset preservation, Pydantic models, dataclasses, and nested structures.

🤖 Generated with Claude Code