Python: Normalize OpenAI function-call arguments at parse time to prevent unicode escape corruption #4831
0x7c13 wants to merge 1 commit into microsoft:main
Conversation
@microsoft-github-policy-service agree
So I had a quick look, and some things caught my eye:
I'm still not sure what the actual issue is, so please first create an issue clearly describing how these arguments cause failures for OpenAI, and then we can determine the best way to fix it.
Thanks for the review @eavanvalkenburg. After digging deeper into the escaping mechanics, I need to correct my original analysis. I traced the full pipeline byte-by-byte and confirmed that `normalize_function_call_arguments` calls `json.loads` on the exact same string that `parse_arguments` would otherwise parse:

```python
import json
from agent_framework import Content

args_str = '{"old_string": "\\u2192"}'  # what the SDK gives us

# Without normalize: parse_arguments does json.loads
c1 = Content.from_function_call(call_id="a", name="f", arguments=args_str)
r1 = c1.parse_arguments()

# With normalize: json.loads at creation, parse_arguments returns as-is
normalized = json.loads(args_str)  # same json.loads, same result
c2 = Content.from_function_call(call_id="b", name="f", arguments=normalized)
r2 = c2.parse_arguments()

assert r1 == r2  # always True: normalize doesn't change the value
```

Your points are all valid. I'll close this PR.
Python: Normalize OpenAI function-call arguments at parse time to prevent unicode escape corruption
Problem
When an LLM-powered agent edits source files containing Python/JavaScript unicode escape sequences like `\u2192`, the OpenAI code path corrupts these sequences due to double JSON parsing.

Root cause
The Anthropic and OpenAI backends handle function-call arguments differently:

- Anthropic delivers `content_block.input` as a parsed dict. Stored directly; `parse_arguments()` returns it as-is. 1 JSON parse total.
- OpenAI delivers `tool.function.arguments` as a raw JSON string. Stored as a string, then `parse_arguments()` calls `json.loads()` again. 2 JSON parses total.

The second `json.loads()` re-interprets `\uXXXX` sequences as JSON unicode escapes, corrupting the original intent: the same model output that works correctly on Anthropic produces a corrupted value on OpenAI. The `\u2192` (literal 6-char Python escape) becomes `→` (a single Unicode character), causing `edit_file` to either fail to match or write incorrect content.

Impact
This affects any tool that reads/writes source code containing `\uXXXX` escape sequences (Python, JavaScript, Java, C#, JSON). In practice, agents enter retry loops (10+ failed `edit_file` attempts observed) trying different escaping levels, wasting tokens and often ultimately writing corrupted code.

What changed
- Added a `normalize_function_call_arguments()` helper in `_types.py` that eagerly parses JSON-string arguments into dicts at the provider-parsing layer
- Applied it in `OpenAIChatClient._parse_tool_calls_from_openai()` and at three non-streaming parse sites in `OpenAIResponsesClient`
- Updated `_prepare_content_for_openai()` in the responses client to re-serialize dict arguments back to JSON strings when sending to the API (the chat client already handled this at line 704)

Streaming deltas (`response.function_call_arguments.delta`) are intentionally not normalized since they contain partial JSON fragments.

Validation
```shell
uv run python -m pytest packages/core/tests/openai/test_openai_chat_client.py \
    packages/core/tests/openai/test_openai_responses_client.py \
    -m "not integration" -q
```

All 183 tests pass.
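For reference, the double-decode corruption described under Root cause can be reproduced in isolation with only the standard library (the `old_string` key is illustrative):

```python
import json

# Wire form: the model intends the literal 6-character string \u2192,
# which JSON encodes with an escaped backslash.
raw = '{"old_string": "\\\\u2192"}'

# First decode: the backslash survives, giving the intended literal.
once = json.loads(raw)["old_string"]
print(once, len(once))    # \u2192 6

# Second decode: \u2192 is now treated as a JSON unicode escape
# and collapses into a single character.
twice = json.loads(f'"{once}"')
print(twice, len(twice))  # → 1
```

This is exactly the divergence between the two paths: one decode preserves the escape sequence, two decodes collapse it.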
Before / After comparison
The fix makes the OpenAI path behave identically to the Anthropic path: arguments are parsed once and stored as a dict.
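A minimal standalone sketch of the idea behind the helper (hypothetical; the real implementation lives in `_types.py` and its exact signature may differ):

```python
import json
from typing import Any

def normalize_function_call_arguments(arguments: Any) -> dict[str, Any]:
    """Eagerly parse JSON-string arguments into a dict; pass dicts through."""
    if isinstance(arguments, str):
        # Single json.loads at the provider-parsing layer, mirroring how the
        # Anthropic path receives content_block.input already parsed.
        return json.loads(arguments) if arguments.strip() else {}
    return arguments

# One parse at creation; the backslash in the escape sequence stays intact.
args = normalize_function_call_arguments('{"old_string": "\\\\u2192"}')
print(args)              # {'old_string': '\\u2192'}
print(json.dumps(args))  # re-serializes losslessly for sending back to the API
```

The `json.dumps` round-trip at the end illustrates why the responses client must re-serialize dict arguments before sending them to the API.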
`parse_arguments()` returns the dict directly without a second `json.loads()` call.

Related