
Python: Normalize OpenAI function-call arguments at parse time to prevent uni… #4831

Closed
0x7c13 wants to merge 1 commit into microsoft:main from 0x7c13:dev/0x7c13/unicode_corruption_fix

Conversation


@0x7c13 0x7c13 commented Mar 22, 2026

Python: Normalize OpenAI function-call arguments at parse time to prevent unicode escape corruption

Problem

When an LLM-powered agent edits source files containing Python/JavaScript unicode escape sequences like \u2192, the OpenAI code path corrupts these sequences due to double JSON parsing.

Root cause

The Anthropic and OpenAI backends handle function-call arguments differently:

  • Anthropic: Returns content_block.input as a parsed dict. Stored directly — parse_arguments() returns it as-is. 1 JSON parse total.
  • OpenAI: Returns tool.function.arguments as a raw JSON string. Stored as a string, then parse_arguments() calls json.loads() again. 2 JSON parses total.

The second json.loads() re-interprets \uXXXX sequences as JSON unicode escapes, corrupting the original intent:

# A source file contains the Python escape: \u2192
# The model correctly generates \\u2192 in its JSON arguments

# Anthropic path (1 parse):
content_block.input = {"old_string": "\\u2192"}  # SDK parsed → \u2192 ✓

# OpenAI path (2 parses):
tool.function.arguments = '{"old_string": "\\u2192"}'  # stored as string
json.loads(arguments)  # → {"old_string": "→"}: \u2192 interpreted as a JSON unicode escape ✗

The same model output that works correctly on Anthropic produces a corrupted value on OpenAI: the literal 6-char Python escape \u2192 becomes → (a single Unicode character), causing edit_file to either fail to match or write incorrect content.
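The escape-consumption step can be reproduced in isolation with the standard json module (illustrative values, not captured pipeline output):

```python
import json

# Once the stored arguments string contains the 6 characters \u2192,
# a json.loads pass consumes them as a JSON unicode escape.
arguments = '{"old_string": "\\u2192"}'  # contains the literal 6-char escape
value = json.loads(arguments)["old_string"]

assert value == "\u2192" and len(value) == 1  # U+2192, a single arrow character
```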

Impact

This affects any tool that reads/writes source code containing \uXXXX escape sequences (Python, JavaScript, Java, C#, JSON). In practice, agents enter retry loops (10+ failed edit_file attempts observed) trying different escaping levels, wasting tokens and often ultimately writing corrupted code.

What changed

  • Added normalize_function_call_arguments() helper in _types.py that eagerly parses JSON-string arguments into dicts at the provider-parsing layer
  • Applied normalization in OpenAIChatClient._parse_tool_calls_from_openai() and three non-streaming parse sites in OpenAIResponsesClient
  • Updated _prepare_content_for_openai() in the responses client to re-serialize dict arguments back to JSON strings when sending to the API (the chat client already handled this at line 704)
  • Updated 2 test assertions that expected raw string arguments to expect parsed dicts
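A minimal sketch of what such a helper might look like. This is a hypothetical reimplementation for illustration; the actual code lives in _types.py and may differ:

```python
import json
from collections.abc import Mapping
from typing import Any


def normalize_function_call_arguments(arguments: Any) -> Any:
    """Hypothetical sketch: eagerly parse JSON-string arguments into a dict.

    The real helper lives in agent_framework/_types.py and may differ.
    """
    if arguments is None or isinstance(arguments, Mapping):
        return arguments  # already parsed (Anthropic-style): pass through
    try:
        parsed = json.loads(arguments)
    except (TypeError, ValueError):
        return arguments  # not a valid JSON document: leave untouched
    # Only adopt the parsed form when it is an object; other JSON values
    # (arrays, scalars) are left as the original string.
    return parsed if isinstance(parsed, Mapping) else arguments
```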

Streaming deltas (response.function_call_arguments.delta) are intentionally not normalized since they contain partial JSON fragments.
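The reason for that carve-out can be shown with a toy fragment (illustrative, not captured API output): a partial delta is not a complete JSON document, so eager parsing would fail mid-stream.

```python
import json

# Each streaming delta carries only a fragment of the arguments JSON,
# so eager parsing must wait until the fragments are accumulated.
fragment = '{"old_str'
try:
    json.loads(fragment)
    parseable = True
except json.JSONDecodeError:
    parseable = False  # expected: the fragment is not valid JSON on its own
```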

Validation

uv run python -m pytest packages/core/tests/openai/test_openai_chat_client.py \
  packages/core/tests/openai/test_openai_responses_client.py \
  -m "not integration" -q

All 183 tests pass.

Before / After comparison

from agent_framework._types import normalize_function_call_arguments
import json

# Model generates \\u2192 in its JSON output — the correct escaping for literal \u2192
args = '{"old_string": "\\\\u2192"}'

# BEFORE: stored as a string, parsed later by parse_arguments()
json.loads(args)["old_string"]   # → '\u2192' (the 6-char literal escape)

# AFTER: normalized once at parse time, parse_arguments() returns the dict directly
normalize_function_call_arguments(args)["old_string"]  # → '\u2192' (same parse, same value)
# parse_arguments() then sees a Mapping and returns it — no second json.loads

The fix makes the OpenAI path behave identically to the Anthropic path: arguments are parsed once and stored as a dict. parse_arguments() returns the dict directly without a second json.loads() call.

Related

@github-actions github-actions bot changed the title Normalize OpenAI function-call arguments at parse time to prevent uni… Python: Normalize OpenAI function-call arguments at parse time to prevent uni… Mar 22, 2026

0x7c13 commented Mar 22, 2026

@microsoft-github-policy-service agree

@eavanvalkenburg (Member) commented:

So I had a quick look, and some things caught my eye:

  1. this introduces different behavior between streaming and non-streaming, which I don't particularly like
  2. this leads to re-dumping values on each call, adding latency to the regular path (because we have to do json.dumps(content.arguments) instead of passing content.arguments through for OpenAI)
  3. the default behavior across providers is to return arguments as strings; Anthropic is the exception, and that case is already handled by the logic in parse_arguments
  4. I don't see tests that actually demonstrate that this is a fix

So I'm still not sure what the actual issue is. Please first create an issue clearly describing how arguments cause failures for OpenAI, and then we can determine the best way to fix it.
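Point 2 above can be sketched concretely (assumed flow, not the actual client code):

```python
import json

# Once arguments are normalized to a dict at parse time, every request
# sent back to the API pays an extra json.dumps that the string
# pass-through path did not need.
arguments = '{"old_string": "\\u2192"}'

# string pass-through (status quo): forward the provider's string as-is
payload_before = arguments

# normalized path: parse on receipt, re-serialize on every send
as_dict = json.loads(arguments)
payload_after = json.dumps(as_dict)

# Both payloads carry the same data; the second path just does extra work.
```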


0x7c13 commented Mar 23, 2026

Thanks for the review @eavanvalkenburg. After digging deeper into the escaping mechanics, I need to correct my original analysis — the unicode escape corruption narrative in the PR description is wrong.

I traced the full pipeline byte-by-byte and confirmed that normalize_function_call_arguments calls json.loads on the exact same string that
parse_arguments() would — the parsed values are identical regardless of when the parse happens. Both Anthropic and OpenAI produce the same
result for the same model output:

from agent_framework import Content
import json

args_str = '{"old_string": "\\u2192"}'  # what the SDK gives us

# Without normalize: parse_arguments does json.loads
c1 = Content.from_function_call(call_id="a", name="f", arguments=args_str)
r1 = c1.parse_arguments()["old_string"]  # → '\u2192' (correct)

# With normalize: json.loads at creation, parse_arguments returns as-is
normalized = json.loads(args_str)  # same json.loads, same result
c2 = Content.from_function_call(call_id="b", name="f", arguments=normalized)
r2 = c2.parse_arguments()["old_string"]  # → '\u2192' (same)

assert r1 == r2  # always True — normalize doesn't change the value
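The full two-level decode can also be checked end-to-end with plain json, building the wire payload with json.dumps to avoid hand-counting backslashes (illustrative payload shape, not an actual API capture):

```python
import json

# The model's arguments JSON encodes the literal 6-char escape \u2192 as \\u2192,
# and the API response embeds that arguments JSON as a string value.
inner_json = json.dumps({"old_string": "\\u2192"})     # the model's arguments JSON
response_body = json.dumps({"arguments": inner_json})  # arguments embedded as a string

outer = json.loads(response_body)["arguments"]  # parse 1: SDK decodes the response
result = json.loads(outer)["old_string"]        # parse 2: parse_arguments()

assert result == "\\u2192" and len(result) == 6  # the literal escape survives both
```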

Your points are all valid. I'll close this PR.

@0x7c13 0x7c13 closed this Mar 23, 2026
