fix(llm): sanitize tool_call arguments and cap tool result size for provider validators#468
nap-liu wants to merge 7 commits into dataelement:main
Conversation
added 7 commits on April 23, 2026 16:25
Pure-Python helper that parses malformed tool_call.arguments strings (trailing commas, unescaped control chars) and produces a canonical JSON string guaranteed to round-trip through json.loads. Will be used by caller._process_tool_call to prevent malformed JSON from leaking into subsequent DashScope requests.
Two silent-data-loss paths found in code review:
1. Regex-based trailing-comma strip would eat commas inside string values
({"a": "hello,}", ...} -> {"a": "hello}", ...}). Replaced with a
string-aware walker that tracks in_string and escape state.
2. Non-dict top-level parses (array, scalar, null) were silently coerced to
{} with method="clean", making the data-loss event invisible to
observability. Added explicit method="non_dict_coerced" so the caller
can log-and-alert when this happens.
Tests: 3 new (string-content comma, escaped quote, non-dict coercion).
Total: 12 passed.
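The string-aware walker described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: the helper name and the (canonical_json, method) return shape follow the commit description, but the internals here are assumed. It tracks in_string/escape state so commas inside string values survive, escapes raw control characters inside strings, strips trailing commas before closers, and reports non-dict top-level parses explicitly instead of silently coercing.

```python
import json

def canonicalize_tool_arguments(raw: str) -> tuple[str, str]:
    """Sketch: repair malformed tool_call arguments into canonical JSON.

    Returns (canonical_json, method) where method is one of
    "passthrough", "clean", or "non_dict_coerced" (hypothetical labels
    matching the commit description).
    """
    try:
        parsed = json.loads(raw)
        method = "passthrough"
    except json.JSONDecodeError:
        out, in_string, escape = [], False, False
        for ch in raw:
            if in_string:
                if escape:
                    out.append(ch)
                    escape = False
                elif ch == "\\":
                    out.append(ch)
                    escape = True
                elif ch == '"':
                    out.append(ch)
                    in_string = False
                elif ord(ch) < 0x20:
                    # Escape raw control chars (e.g. "\n" -> "\\n").
                    out.append(json.dumps(ch)[1:-1])
                else:
                    out.append(ch)
                continue
            if ch == '"':
                in_string = True
                out.append(ch)
            elif ch in "]}":
                # Strip a trailing comma before a closer -- only outside
                # string literals, so values like "hello,}" are untouched.
                while out and out[-1].isspace():
                    out.pop()
                if out and out[-1] == ",":
                    out.pop()
                out.append(ch)
            else:
                out.append(ch)
        parsed = json.loads("".join(out))
        method = "clean"
    if not isinstance(parsed, dict):
        # Surface the data-loss event instead of hiding it under "clean".
        return "{}", "non_dict_coerced"
    return json.dumps(parsed, ensure_ascii=False), method
```

Because the repair only ever fires after json.loads has already rejected the input, well-formed arguments pass through byte-for-byte semantically (re-serialized canonically).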
Large tool results (e.g. execute_code stdout) accumulate across rounds and can push conversation context over Qwen3.5-plus's 983k char input limit. shape_tool_result keeps head + tail with an explicit truncation marker so the LLM knows truncation happened.
Two silent-corruption edge cases found in code review:
1. Negative max_chars (e.g. -5) produced nonsense via Python's negative
   slice semantics — overlapping slices plus a lying marker claiming more
   chars dropped than the input had.
2. max_chars=0 returned only the marker, violating the contract that
   output stays within max_chars + small marker overhead.
Both now return an empty string, with was_truncated accurately reflecting
whether any content was dropped. Tests added for both cases, plus renamed
test_marker_reports_dropped_byte_count → ..._char_count since len(str)
counts code points, not UTF-8 bytes.
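A minimal sketch of the head+tail shaping with both guards above. The function name and (shaped_text, was_truncated) return shape follow the commit message; the 2/3-head split and marker wording are assumptions for illustration.

```python
def shape_tool_result(text: str, max_chars: int) -> tuple[str, bool]:
    """Sketch: keep head + tail of an oversized tool result.

    Returns (shaped_text, was_truncated).
    """
    # Guard: non-positive budgets return empty output rather than
    # nonsense from negative slices or a marker-only string.
    if max_chars <= 0:
        return "", len(text) > 0
    if len(text) <= max_chars:
        return text, False
    head = max_chars * 2 // 3          # bias toward the head (assumed split)
    tail = max_chars - head
    dropped = len(text) - max_chars
    marker = f"\n...[truncated {dropped} chars]...\n"
    return text[:head] + marker + text[-tail:], True
```

The explicit marker is what lets the LLM reason about the gap instead of treating the spliced text as contiguous output.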
_process_tool_call now uses canonicalize_tool_arguments to repair malformed JSON from Qwen streaming (trailing commas, unescaped control chars) and writes the canonical form back to tc.function.arguments so that subsequent LLM rounds receive a history entry that passes DashScope's JSON validator. Prevents HTTP 400 'function.arguments must be in JSON format'. Also caps single tool result content at 20k chars via head+tail shaping. Prevents single-run context blowup that caused HTTP 400 'Range of input length should be [1, 983616]'. Both paths log a structured warning when repair or truncation fires, for observability.
Code review found call_agent_llm_with_tools._try_model has an identical tool loop to _process_tool_call, used by background services (scheduler, task executor, heartbeat). It still ran the naive json.loads path without canonicalization or shaping, leaving the same two 400 errors unfixed on the background-service code path. Extracted shared helpers _canonicalize_tc_arguments and _shape_tool_content_for_context at module level so both callers use the same normalization + logging. Also:
- Hoisted json_recovery and tool_result_shaping imports to top of file
- Promoted TOOL_RESULT_MAX_CHARS to a module constant
- Unified log fields (orig_len/new_len) and added dropped= on truncation
- Tightened test assertion from <50_000 to <=TOOL_RESULT_MAX_CHARS+500
- Added 3 direct tests for the helpers
Three small items surfaced by final integrated review:
1. _try_model declared -> tuple[str, bool, bool] but the no-tool-calls
   happy-path early return produced a 2-tuple. The caller unpacks 3 values,
   so the background-services path would raise ValueError on any immediate
   non-tool answer. Pre-existing; folded in here since this PR already
   edits the function.
2. Added a one-line NB comment above both assistant-with-tool-calls message
   appends documenting that tc['function'] must remain shared by reference
   with _canonicalize_tc_arguments's in-place write — a future deepcopy
   refactor would silently carry pre-repair arguments into history.
3. Removed dead 'import re' and '_UNESCAPED_CONTROL_RE' from
   json_recovery.py — unused after the string-aware walker rewrite.
Addresses #401 — partial. Implements the output-sanitization (输出清洗) pillar; the retry and fallback (降级) pillars are intentionally out of scope and left for a follow-up.
Summary
Two classes of HTTP 400 failures surface intermittently when agents use tool calls against providers that validate function.arguments as strict JSON (DashScope/Qwen observed here; the issue references MiniMax with the same root cause):

1. "function.arguments must be in JSON format" — model streaming occasionally emits tool_call arguments with trailing commas or unescaped control characters. The current _process_tool_call calls json.loads with a silent fallback to {}, but the raw malformed string is still written back into conversation history, so every subsequent round re-submits it and fails the validator.
2. "Range of input length should be [1, 983616]" — large tool results (e.g. execute_code stdout, long query responses) accumulate across rounds and push the full conversation past the model's input cap.

Fix
- json_recovery.py (new): string-aware JSON repair that strips trailing commas and unescapes control chars without corrupting data inside string literals. Returns a canonical form guaranteed to round-trip through json.loads.
- tool_result_shaping.py (new): head+tail truncation with an explicit marker so the LLM knows content was dropped.
- caller.py: _process_tool_call and _try_model (called by call_agent_llm_with_tools) both canonicalize arguments back into tc["function"]["arguments"] and shape results to TOOL_RESULT_MAX_CHARS (20k). Shared helpers _canonicalize_tc_arguments / _shape_tool_content_for_context avoid duplication. The pre-existing tuple-arity bug in _try_model's early return is also fixed.

Tests
- test_llm_json_recovery.py (+118): trailing-comma strip, non-dict coercion, unicode handling, edge cases
- test_llm_tool_result_shaping.py (+80): head/tail composition, truncation marker accuracy, zero/negative max_chars guards
- test_llm_caller_integration.py (+140): in-place canonicalization via tool_call fixtures, shaping at the caller boundary, logging contract

Notes
- Structured content (list[dict]) passes shaping untouched — base64 image payloads are preserved.
- _canonicalize_tc_arguments mutates tc["function"] in place by design so the repaired args reach history; a comment locks this invariant in case of a future deepcopy refactor.
- TOOL_RESULT_MAX_CHARS = 20_000 is a conservative constant for this PR; making it per-agent configurable is a natural follow-up.

Commits
Kept granular (7 commits) for easier review. Happy to squash on merge if preferred.