Python: Prevent WorkflowExecutor from re-sending answered requests after checkpoint restore#3689
Conversation
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug where WorkflowExecutor would re-send already-answered RequestInfoEvents after checkpoint restore, causing duplicate requests and incorrect expected_response_count values that led to workflows hanging or throwing errors.
Changes:
- Added filtering logic to remove already-handled requests from workflow results after checkpoint restore
- Added comprehensive test to verify the fix
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
python/packages/core/agent_framework/_workflows/_workflow_executor.py |
Implemented fix to filter out already-handled RequestInfoEvents from result before processing, preventing duplicate requests after checkpoint restore |
python/packages/core/tests/workflow/test_sub_workflow.py |
Added test case test_sub_workflow_checkpoint_restore_no_duplicate_requests with helper classes to verify that duplicate requests are not emitted after checkpoint restore |
moonbox3
left a comment
There was a problem hiding this comment.
LGTM, thanks for fixing.
One thought: in apply_checkpoint, we don't have to add already-pending events to the event queue, right? Instead, we provide a separate explicit method like emit_pending_requests() that the parent workflow can call when it wants to be notified of outstanding requests? Is this something we should look at soon?
We have to add the event back to queue, just in case a response isn't provided, we can re-emit the event. The current issue is that the event is re-emitted before we process the response. The parent workflow doesn't know if an executor is a subworkflow. How can we achieve that? |
Motivation and Context
Supersedes #3293
Closes #3255
Description
The root cause that a subworkflow would re-emit a request info event upon resuming from a checkpoint that contains the pending event is that the event from the checkpoint would get added back to the event queue of the subworkflow, and because of the fact that the runner drains and emits events at the very beginning of a superstep, we create duplicated request info events.
Contribution Checklist