Python: Fix Responses API handoff state handling and add focused tests #4057
alliscode wants to merge 2 commits into microsoft:main
Pull request overview
This pull request fixes two critical bugs in the HandoffBuilder's handling of Responses API-style clients, ensuring conversation context and session state are correctly managed across agent handoffs. The fixes prevent "No tool output found" API errors and restore complete conversation history to agents after handoffs.
Changes:
- Fixed stale `previous_response_id` issue by clearing `service_session_id` after handoffs
- Restored full conversation context by passing `_full_conversation` to agents instead of partial `_cache`
- Added comprehensive regression tests with Responses API mock to prevent future regressions
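The stale-pointer failure described in the first bullet can be reproduced with a toy model. The sketch below is not the framework's code or the real Responses API: `ToyResponsesService` and its `create` method are hypothetical stand-ins that only assume the behavior stated in this PR, namely that chaining from a response whose `function_call` never received a tool output is rejected with "No tool output found".

```python
from typing import Dict, List, Optional


class ToyResponsesService:
    """Hypothetical toy model of a service that chains requests by response ID."""

    def __init__(self) -> None:
        # response_id -> True if that response ended with an unanswered function_call
        self._pending_call: Dict[str, bool] = {}

    def create(
        self,
        messages: List[str],
        previous_response_id: Optional[str] = None,
        emits_function_call: bool = False,
    ) -> str:
        if previous_response_id is not None and self._pending_call[previous_response_id]:
            # The service expects a tool output for that call, but it was cleaned away.
            raise RuntimeError("No tool output found")
        response_id = f"resp_{len(self._pending_call) + 1}"
        self._pending_call[response_id] = emits_function_call
        return response_id


service = ToyResponsesService()

# Turn 1: the agent answers with a function_call for the handoff tool.
handoff_response_id = service.create(["user: I need a refund"], emits_function_call=True)

# Without the fix: the stale ID is reused when the agent runs again.
try:
    service.create(["user: order 42"], previous_response_id=handoff_response_id)
    stale_error = None
except RuntimeError as exc:
    stale_error = str(exc)

# With the fix: the pointer was cleared, so the next run sends the full
# conversation and no previous_response_id.
next_response_id = service.create(
    ["user: I need a refund", "user: order 42"], previous_response_id=None
)
```

In this model the second bullet follows from the first: once the pointer is dropped, the service holds no server-side history for the next call, so the agent has to be given the whole conversation rather than only the latest messages.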
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `python/packages/orchestrations/tests/test_handoff_responses.py` | New test suite with mock Responses API client to verify handoff invariants: session clearing and context preservation |
| `python/packages/orchestrations/agent_framework_orchestrations/_handoff.py` | Two-line fix: copy full conversation to cache before agent runs, and clear `service_session_id` after handoffs |

    # from being sent on the next run. The handoff response contained a function_call
    # for the handoff tool; referencing it via previous_response_id after the tool
    # output has been cleaned would cause "No tool output found" API errors.
    if self._session and self._session.service_session_id:

Is this like saying each time it will create a new session in the service?

I've got these same fixes in #3911, plus others that I found through testing workflows with handoff + AG-UI. We could add the regression tests from yours to #3911.

After a discussion offline, this issue is being fixed with a separate PR. Closing this.

This pull request addresses critical issues with agent handoff behavior when using Responses API-style clients, ensuring conversation context and session state are correctly managed across handoffs. It introduces regression tests to verify these invariants and updates the orchestration logic to prevent context loss and stale session IDs that could cause API errors.
Bug fixes for handoff and session management:
- Agents now receive the full conversation history (`_full_conversation`) as input after a handoff, rather than just the latest broadcast, to preserve context for APIs like the Responses API.
- The session's `service_session_id` is cleared after a handoff to prevent sending a stale `previous_response_id`, which could otherwise cause "No tool output found" errors with the Responses API.

Testing and regression coverage:

- Added a test file (`test_handoff_responses.py`) with regression tests to verify that (1) handoffs correctly clear the session's conversation pointer and (2) agents receive the complete conversation context after a handoff. This includes a mock client and agent simulating Responses API behavior.

Closes #4053
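As a rough illustration of how the two invariants above might be tested, here is a minimal sketch. It does not reproduce the contents of `test_handoff_responses.py`: `FakeSession`, `RecordingClient`, and `ToyExecutor` are hypothetical stand-ins, and only the attribute names `_session`, `service_session_id`, `_full_conversation`, and `_cache` are taken from this PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeSession:
    """Hypothetical session holding the service-side conversation pointer."""
    service_session_id: Optional[str] = None


@dataclass
class RecordingClient:
    """Hypothetical mock that records what each call would send to the service."""
    calls: List[dict] = field(default_factory=list)

    def run(self, messages: List[str], session: FakeSession) -> None:
        self.calls.append(
            {"messages": list(messages), "previous_response_id": session.service_session_id}
        )
        # A Responses-style client stores the new response ID on the session.
        session.service_session_id = f"resp_{len(self.calls)}"


class ToyExecutor:
    """Toy executor mirroring the two fixes as this PR describes them."""

    def __init__(self, client: RecordingClient) -> None:
        self._client = client
        self._session = FakeSession()
        self._full_conversation: List[str] = []
        self._cache: List[str] = []

    def receive(self, message: str) -> None:
        self._full_conversation.append(message)
        self._cache.append(message)

    def run(self, is_handoff: bool) -> None:
        # Fix 1: copy the full conversation into the cache before the agent runs.
        self._cache = list(self._full_conversation)
        self._client.run(self._cache, self._session)
        self._cache.clear()
        if is_handoff and self._session and self._session.service_session_id:
            # Fix 2: drop the pointer so no stale previous_response_id is sent later.
            self._session.service_session_id = None


def test_handoff_clears_service_session_id() -> None:
    executor = ToyExecutor(RecordingClient())
    executor.receive("user: I need a refund")
    executor.run(is_handoff=True)
    assert executor._session.service_session_id is None


def test_agent_receives_full_conversation_after_handoff() -> None:
    client = RecordingClient()
    executor = ToyExecutor(client)
    executor.receive("user: I need a refund")
    executor.run(is_handoff=True)
    executor.receive("user: order 42")
    executor.run(is_handoff=False)
    assert client.calls[1]["previous_response_id"] is None
    assert client.calls[1]["messages"] == ["user: I need a refund", "user: order 42"]
```

Recording the outgoing `previous_response_id` and message list on the mock keeps both assertions on what would actually be sent, so the tests do not depend on any real service.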
Contribution Checklist