Python: Prevent WorkflowExecutor from re-sending answered requests after checkpoint restore #3689

Merged
TaoChenOSU merged 1 commit into microsoft:main from TaoChenOSU:taochen/python-fix-subworkflow-request
Feb 5, 2026

Conversation

@TaoChenOSU
Contributor

@TaoChenOSU TaoChenOSU commented Feb 5, 2026

Motivation and Context

Supersedes #3293
Closes #3255

Description

The root cause: when a subworkflow resumes from a checkpoint that contains a pending request info event, that event is added back to the subworkflow's event queue. Because the runner drains and emits queued events at the very beginning of a superstep, the restored event is emitted again before the response to it is processed, which produces a duplicate request info event.
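The failure mode can be reproduced in a small toy model. None of these names come from agent_framework: RequestInfoEvent here is a minimal stand-in dataclass, and ToyRunner, restore, and superstep are hypothetical. The sketch only captures the ordering described above, where restored events are emitted before the matching response is processed.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestInfoEvent:
    """Hypothetical stand-in for a request info event emitted by a subworkflow."""
    request_id: str


class ToyRunner:
    """Hypothetical runner that drains and emits queued events at the start of a superstep."""

    def __init__(self) -> None:
        self.event_queue: deque[RequestInfoEvent] = deque()
        self.emitted: list[RequestInfoEvent] = []
        self.handled: set[str] = set()

    def restore(self, pending: list[RequestInfoEvent]) -> None:
        # Resuming from a checkpoint puts the pending request back on the event queue.
        self.event_queue.extend(pending)

    def superstep(self, responses: dict[str, str]) -> None:
        # Queued events are drained and emitted first...
        while self.event_queue:
            self.emitted.append(self.event_queue.popleft())
        # ...and responses are processed only afterwards, too late to suppress the re-emit.
        self.handled.update(responses)


# The parent already received req-1 before the checkpoint was taken.
original = RequestInfoEvent("req-1")
parent_seen = [original]

runner = ToyRunner()
runner.restore([original])
runner.superstep(responses={"req-1": "approved"})
parent_seen.extend(runner.emitted)
print([e.request_id for e in parent_seen])  # ['req-1', 'req-1']
```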

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

@TaoChenOSU TaoChenOSU self-assigned this Feb 5, 2026
Copilot AI review requested due to automatic review settings February 5, 2026 06:49
@github-actions github-actions bot changed the title Prevent WorkflowExecutor from re-sending answered requests after checkpoint restore Python: Prevent WorkflowExecutor from re-sending answered requests after checkpoint restore Feb 5, 2026
@markwallace-microsoft
Member

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/core/agent_framework/_workflows/_workflow_executor.py | 177 | 28 | 84% | 96, 445, 469, 471, 479–480, 485, 487, 492, 494, 547, 573–578, 581, 584, 592, 597, 608, 618, 622, 628, 632, 642, 646 |
| TOTAL | 16366 | 1907 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 3992 | 221 💤 | 0 ❌ | 0 🔥 | 1m 8s ⏱️ |

@TaoChenOSU TaoChenOSU enabled auto-merge February 5, 2026 06:52
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a bug where WorkflowExecutor would re-send already-answered RequestInfoEvents after checkpoint restore, causing duplicate requests and incorrect expected_response_count values that led to workflows hanging or throwing errors.

Changes:

  • Added filtering logic to remove already-handled requests from workflow results after checkpoint restore
  • Added comprehensive test to verify the fix
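The filtering step can be sketched roughly as follows. This is a toy model, not the code in _workflow_executor.py: RequestInfoEvent is a minimal stand-in dataclass, and drop_handled_requests and handled are hypothetical names. It also shows why filtering matters for expected_response_count: if the replayed, already-answered request were counted, the parent would wait for a response that never comes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RequestInfoEvent:
    """Hypothetical stand-in for a request emitted by a subworkflow."""
    request_id: str


def drop_handled_requests(
    events: list[RequestInfoEvent], handled_ids: set[str]
) -> list[RequestInfoEvent]:
    """Keep only requests that have not been answered yet (hypothetical helper)."""
    return [e for e in events if e.request_id not in handled_ids]


# After restore, the result replays req-1 (already answered) alongside a new req-2.
result_events = [RequestInfoEvent("req-1"), RequestInfoEvent("req-2")]
handled = {"req-1"}

outstanding = drop_handled_requests(result_events, handled)
expected_response_count = len(outstanding)
print([e.request_id for e in outstanding], expected_response_count)  # ['req-2'] 1
```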

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| python/packages/core/agent_framework/_workflows/_workflow_executor.py | Implemented fix to filter out already-handled RequestInfoEvents from result before processing, preventing duplicate requests after checkpoint restore |
| python/packages/core/tests/workflow/test_sub_workflow.py | Added test case test_sub_workflow_checkpoint_restore_no_duplicate_requests with helper classes to verify that duplicate requests are not emitted after checkpoint restore |
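The core property such a test checks can be sketched as a small helper. This is not the helper classes in test_sub_workflow.py: duplicate_request_ids and the event dataclass are hypothetical, and the event lists are written by hand rather than produced by a real checkpoint restore.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestInfoEvent:
    """Hypothetical stand-in for a request info event."""
    request_id: str


def duplicate_request_ids(events: list[RequestInfoEvent]) -> list[str]:
    """Return request IDs that were emitted more than once (hypothetical test helper)."""
    counts = Counter(e.request_id for e in events)
    return sorted(rid for rid, n in counts.items() if n > 1)


def test_no_duplicate_requests_after_restore() -> None:
    # Events seen by the parent before the checkpoint and after resuming from it.
    before = [RequestInfoEvent("req-1")]
    after = [RequestInfoEvent("req-2")]  # a correct restore must not replay req-1
    assert duplicate_request_ids(before + after) == []


test_no_duplicate_requests_after_restore()
print(duplicate_request_ids([RequestInfoEvent("a"), RequestInfoEvent("a")]))  # ['a']
```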

@moonbox3 moonbox3 moved this to In Review in Agent Framework Feb 5, 2026
Contributor

@moonbox3 moonbox3 left a comment

LGTM, thanks for fixing.

One thought: in apply_checkpoint, we don't have to add already-pending events to the event queue, right? Could we instead provide a separate, explicit method like emit_pending_requests() that the parent workflow can call when it wants to be notified of outstanding requests? Is this something we should look at soon?
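The proposed alternative could look roughly like this toy sketch. Only the name emit_pending_requests() comes from the comment; ToySubworkflow, handle_response, and the storage layout are hypothetical, and this is not how agent_framework behaves today. Restored requests are remembered but never placed on the event queue, so nothing is emitted unless a caller asks.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RequestInfoEvent:
    """Hypothetical stand-in for a request info event."""
    request_id: str


@dataclass
class ToySubworkflow:
    """Hypothetical subworkflow that keeps restored requests out of the event queue."""
    event_queue: list[RequestInfoEvent] = field(default_factory=list)
    pending: dict[str, RequestInfoEvent] = field(default_factory=dict)

    def apply_checkpoint(self, pending: list[RequestInfoEvent]) -> None:
        # Remember outstanding requests, but do NOT enqueue them for automatic emission.
        self.pending = {e.request_id: e for e in pending}

    def handle_response(self, request_id: str) -> None:
        # An answered request is no longer outstanding.
        self.pending.pop(request_id, None)

    def emit_pending_requests(self) -> list[RequestInfoEvent]:
        # Called explicitly by the parent when it wants the outstanding requests.
        return list(self.pending.values())


sub = ToySubworkflow()
sub.apply_checkpoint([RequestInfoEvent("req-1"), RequestInfoEvent("req-2")])
sub.handle_response("req-1")
print([e.request_id for e in sub.emit_pending_requests()])  # ['req-2']
```

The trade-off is that the caller must know it is dealing with a subworkflow in order to call the method.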

@TaoChenOSU
Contributor Author

> One thought: in apply_checkpoint, we don't have to add already-pending events to the event queue, right? Could we instead provide a separate, explicit method like emit_pending_requests() that the parent workflow can call when it wants to be notified of outstanding requests? Is this something we should look at soon?

We have to add the event back to the queue so that, if a response isn't provided, we can re-emit the event. The current issue is that the event is re-emitted before we process the response.

The parent workflow doesn't know whether an executor is a subworkflow. How can we achieve that?
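The ordering point can be illustrated with a toy superstep that keeps restored requests on the queue but consults the incoming responses before emitting anything. This is a hypothetical sketch (superstep and the stand-in RequestInfoEvent are not agent_framework APIs), and it is not the merged fix, which filters already-handled requests from the subworkflow's result. It shows why re-queueing is still useful: a request with no response is re-emitted, while an answered one is not.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestInfoEvent:
    """Hypothetical stand-in for a request info event."""
    request_id: str


def superstep(
    queue: deque[RequestInfoEvent], responses: dict[str, str]
) -> list[RequestInfoEvent]:
    """Hypothetical superstep: look at responses first, then emit only unanswered events."""
    answered = set(responses)
    emitted: list[RequestInfoEvent] = []
    while queue:
        event = queue.popleft()
        if event.request_id not in answered:
            emitted.append(event)  # no response yet, so re-emitting is correct
    return emitted


restored = deque([RequestInfoEvent("req-1"), RequestInfoEvent("req-2")])
emitted = superstep(restored, {"req-1": "ok"})
print([e.request_id for e in emitted])  # ['req-2']
```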

@TaoChenOSU TaoChenOSU added this pull request to the merge queue Feb 5, 2026
Merged via the queue into microsoft:main with commit d120589 Feb 5, 2026
30 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in Agent Framework Feb 5, 2026

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: WorkflowExecutor re-sends already-answered RequestInfoEvents after checkpoint restore

5 participants