fix(streaming): handle duplicate-index list entries in accumulate_delta#3227
fix(streaming): handle duplicate-index list entries in accumulate_delta#3227rmotgi1227 wants to merge 2 commits into
Conversation
When the very first streaming chunk for a key contains a list value (e.g. tool_calls), the early-return branches in accumulate_delta copied the raw list wholesale instead of routing it through the index-based merge path. This caused duplicate-index entries in that first chunk to be stored as separate list elements rather than merged, leaving orphaned entries that subsequent chunks could not reach. Fix: when key is absent or acc_value is None and delta_value is a list, initialize acc[key] to [] and fall through to the existing list-merge logic so that every chunk — including the first — goes through proper index deduplication. Fixes openai#3203, see also openai#3201.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be37dee88a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if is_list(delta_value): | ||
| # Initialize an empty list so that list-merge logic below | ||
| # handles duplicate-index entries even on the very first chunk. | ||
| acc[key] = [] |
There was a problem hiding this comment.
Fix first list chunk duplicate-index merging
When delta_value is a list for a key that is not yet in acc, this initializes acc_value to an empty list, but the existing list branch treats all(isinstance(... ) for x in acc_value) as true for an empty list and just extends the raw delta. As a result, the exact duplicate-index case this commit is meant to fix still leaves two entries at index 0 for inputs like {'tool_calls': [{'index': 0, ...}, {'index': 0, ...}]} instead of merging them, so later chunks continue from a malformed snapshot.
Useful? React with 👍 / 👎.
Python's all() returns True on an empty sequence, so initializing acc[key] = [] and then hitting the `all(isinstance(...) for x in acc_value)` check would evaluate to True and call extend() — dumping the entire raw list in wholesale, same bug as before. Adding `acc_value and` before the all() check means an empty list skips the extend shortcut and falls through to the per-index merge loop instead.
There's a bug in accumulate_delta where if the very first streaming chunk contains a list value (like tool_calls), the function copies it wholesale into acc and returns early — completely skipping the index-based merge logic. That means if the first chunk happens to have two entries at the same index (which the OpenAI spec allows), they end up as two separate list items instead of being merged. Subsequent chunks then only ever touch index 0, leaving the duplicate entry orphaned and causing tool_calls[0].function.arguments to be missing a chunk and come out as invalid JSON.
The fix is straightforward: in the two early-return branches ("key not in acc" and "acc_value is None"), check whether delta_value is a list before doing the wholesale copy. If it is, initialize acc[key] to an empty list and let execution fall through to the existing per-index merge loop, the same path that all subsequent chunks already go through. Non-list values still take the original fast path.
The change is in src/openai/lib/streaming/_deltas.py only. The same early-return pattern exists in _assistants.py but that file has different context so I've left it for a separate follow-up if needed.
Fixes #3203. See also #3201 which covers a narrower version of the same bug.