
Python: Workflows background run#4274

Open
TaoChenOSU wants to merge 6 commits into microsoft:main from TaoChenOSU:taochen/python-workflows-background-run

Conversation

@TaoChenOSU
Contributor

@TaoChenOSU TaoChenOSU commented Feb 25, 2026

Motivation and Context

Currently, we have only a very limited way of handling workflow runs and responding to events: users have to wait until a workflow converges before they can process events, such as requests.

Description

  1. Create a run mode where clients can start a workflow run in the background and poll events as they desire.
  2. Allow clients to respond to requests while the workflow is still running in the background.
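
For illustration, the shape of this run mode could be sketched with plain asyncio primitives. This is a hypothetical minimal stand-in: `BackgroundRunHandle`, `run_in_background`, `poll_events`, and `respond` are illustrative names, not necessarily the PR's exact API.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RequestEvent:
    request_id: str
    prompt: str


class BackgroundRunHandle:
    """Toy stand-in: a background task plus an event queue."""

    def __init__(self) -> None:
        self.events: asyncio.Queue = asyncio.Queue()
        self.responses: dict[str, str] = {}
        self.task: asyncio.Task | None = None

    def poll_events(self) -> list:
        # Drain whatever the workflow has emitted so far, without blocking.
        drained = []
        while not self.events.empty():
            drained.append(self.events.get_nowait())
        return drained

    def respond(self, request_id: str, answer: str) -> None:
        # Answer a request while the workflow keeps running in the background.
        self.responses[request_id] = answer


async def _workflow(handle: BackgroundRunHandle) -> None:
    # Emit a request, wait for the client's answer, then emit the output.
    await handle.events.put(RequestEvent("req-1", "approve?"))
    while "req-1" not in handle.responses:
        await asyncio.sleep(0.01)
    await handle.events.put(f"done: {handle.responses['req-1']}")


def run_in_background() -> BackgroundRunHandle:
    handle = BackgroundRunHandle()
    handle.task = asyncio.create_task(_workflow(handle))
    return handle


async def main() -> list:
    handle = run_in_background()
    seen: list = []
    while not handle.task.done() or not handle.events.empty():
        for event in handle.poll_events():
            seen.append(event)
            if isinstance(event, RequestEvent):
                handle.respond(event.request_id, "yes")
        await asyncio.sleep(0.01)
    return seen
```

Running `asyncio.run(main())` collects the request event followed by the final output, with the answer supplied while the workflow is still running.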

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

@TaoChenOSU TaoChenOSU self-assigned this Feb 25, 2026
@TaoChenOSU TaoChenOSU added the python and workflows (Related to Workflows in agent-framework) labels Feb 25, 2026
@github-actions github-actions bot changed the title Workflows background run Python: Workflows background run Feb 25, 2026
@markwallace-microsoft markwallace-microsoft added the documentation (Improvements or additions to documentation) label Feb 25, 2026
@TaoChenOSU TaoChenOSU marked this pull request as ready for review February 25, 2026 21:38
Copilot AI review requested due to automatic review settings February 25, 2026 21:38
@TaoChenOSU TaoChenOSU moved this to In Progress in Agent Framework Feb 25, 2026
Contributor

Copilot AI left a comment


Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

finally:
await self._run_cleanup(checkpoint_storage)

async def _resume() -> asyncio.Task[None]: # noqa: RUF029

Copilot AI Feb 25, 2026


The # noqa: RUF029 comment is unnecessary here. RUF029 warns about unused async functions, but _resume is clearly used on line 634 where it's passed to BackgroundRunHandle. This suppression should be removed.

Suggested change
async def _resume() -> asyncio.Task[None]: # noqa: RUF029
async def _resume() -> asyncio.Task[None]:

from ._events import WorkflowEvent
from ._runner_context import RunnerContext

logger = logging.getLogger(__name__)

Copilot AI Feb 25, 2026


The logger variable is imported but never used in this module. Consider removing the unused import or adding appropriate debug/error logging where it might be useful (e.g., in the respond method when resuming after idle).

Contributor

@moonbox3 moonbox3 left a comment


Some initial thoughts.

logger = logging.getLogger(__name__)


class BackgroundRunHandle:
Contributor


It feels like this class is essentially a wrapper around asyncio primitives: create_task and an asyncio.Queue. Why do we need to wrap these well-known constructs?

Member


This also adds another concept that people have to learn; I do not think it is needed.

Contributor Author


I don't agree. We are doing a lot more than just wrapping a workflow in a background task: we are handling events, requests, and errors for the users. And the concept is well known; here is an equivalent concept from OpenAI: https://developers.openai.com/api/docs/guides/background

Contributor


We already support background mode for agents as a keyword argument.

return response_stream
return response_stream.get_final_response()

def run_in_background(
Contributor


Do we genuinely need a polling-based consumption pattern? Are we getting feedback that this is missing today?

We're now pushing more concerns onto the caller. Every consumer has to:

  1. Write the poll loop
  2. Pick a sleep interval (and risk getting it "wrong": too slow adds latency, too fast wastes cycles)
  3. Route events by type manually
  4. Track which request IDs map to which responses
  5. Remember to drain after idle
  6. Handle the resume-after-idle edge case
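
Spelled out, that caller-side burden might look roughly like this. The sketch is hypothetical: it runs against a toy scripted handle, and the real handle API may differ.

```python
import asyncio
import itertools


class MockHandle:
    """Toy stand-in for the proposed handle; real API may differ."""

    def __init__(self) -> None:
        # Scripted poll batches: first a request, then nothing, then the output.
        self._batches = itertools.chain(
            [[("request", "req-1")], [], [("output", "result")]],
            itertools.repeat([]),
        )
        self.is_idle = False

    def poll(self) -> list:
        batch = next(self._batches)
        if batch and batch[0][0] == "output":
            self.is_idle = True  # workflow converged after emitting output
        return batch

    def respond(self, request_id: str, value: str) -> None:
        pass  # would feed the answer back into the running workflow


async def consume(handle: MockHandle) -> list:
    outputs: list[str] = []
    pending: dict[str, bool] = {}            # 4. track request ids yourself
    while not handle.is_idle:                # 1. hand-written poll loop
        for kind, payload in handle.poll():  # 3. route events by type
            if kind == "request":
                pending[payload] = True
                handle.respond(payload, "answer")
            elif kind == "output":
                outputs.append(payload)
        await asyncio.sleep(0.05)            # 2. guess a sleep interval
    for kind, payload in handle.poll():      # 5. drain after going idle
        if kind == "output":
            outputs.append(payload)
    return outputs
```

Here `asyncio.run(consume(MockHandle()))` returns `["result"]`; concern 6 (resuming after idle) would add yet another branch to this loop.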

Member


I very much agree; this is not needed and leads to un-Pythonic code.

Contributor Author


Why is this un-pythonic?


# Single poll loop: process all events and respond to requests inline.
# The workflow continues running in the background while we process events.
outputs: list[str] = []
while not handle.is_idle:
Member


I see almost no value in this, because it makes people have to understand this whole idle/poll machinery. Having users write asyncio.sleep themselves also does not seem right. A much simpler pattern for this scenario is something like a callback for when a response is required; that is a well-known concept in Python and doesn't require as much complexity as this. And when I compare these samples to existing samples that just use streaming, streaming is already enough: it lets you do other work in the meantime, and if you really need to, you can run it in a separate thread and get the same behavior with Python primitives instead of another new thing.
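
The callback shape being suggested here might look like this minimal sketch; `run_workflow` and the callback signature are illustrative, not an existing API.

```python
import asyncio


async def run_workflow(on_request) -> str:
    """Toy workflow that asks its callback whenever it needs input,
    instead of emitting an event for a poll loop to pick up."""
    answer = await on_request("approve deployment?")  # blocks this step only
    return f"finished with: {answer}"


async def main() -> str:
    async def approve(prompt: str) -> str:
        # In a real app this might consult a human or another service.
        return "yes"

    return await run_workflow(approve)
```

`asyncio.run(main())` returns `"finished with: yes"`; the trade-off debated in this thread is that the workflow is blocked inside the callback until it returns.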

Contributor Author


Why is polling not Pythonic? It is a well-known concept in programming, and OpenAI has a similar primitive: https://developers.openai.com/api/docs/guides/background.

Using callbacks is a simpler approach, but it is very limited and doesn't scale well. We had callbacks back in SK and they didn't work well. For example, it is difficult to integrate callbacks with a UI, because the UI loop ends up driven by the callback and users can only handle one callback at a time. And if you don't want pre-determined actions, you have to emit something else from the callback and wait for user input.

With a background run and events, we keep the original event system we already have in Workflows, which reduces the number of concepts people have to learn, given that they are already familiar with polling, which again is a well-known concept.

Member


Polling, in most other cases, means you actually exit whatever you're doing and then check in every so often to see if it is done, not building this handle.is_idle loop; usually a while True loop suffices. This is also the pattern we have now with background responses for certain APIs (OpenAI Responses and A2A, I think), so this handle is the new concept that people have to learn. If we want a background run, it should follow that pattern: you start it, you get an id of some sort back (a continuation token in our case), and then you call run again to see the status. That is also the pattern OpenAI has at the link you showed, not some other object that needs to be managed. Another concern: can you actually serialize this handle, stop the app, then restart and start polling again? Probably not, and we would probably send people to checkpoints if that is a requirement.
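
That start/continuation-token/poll pattern could be sketched like this. The names are hypothetical, and the in-memory store stands in for the durable storage a real service would need.

```python
import threading
import time
import uuid

_runs: dict[str, dict] = {}  # in a real service this would be durable storage


def start_run(payload: str) -> str:
    """Kick off work and return only an id (the continuation token)."""
    run_id = str(uuid.uuid4())
    _runs[run_id] = {"status": "in_progress", "output": None}

    def work() -> None:
        time.sleep(0.05)  # stand-in for the actual workflow
        _runs[run_id] = {"status": "completed", "output": payload.upper()}

    threading.Thread(target=work, daemon=True).start()
    return run_id


def get_run(run_id: str) -> dict:
    """'Call run again to see the status': poll by id, no live object."""
    return _runs[run_id]


def wait_for(run_id: str) -> str:
    # With durable storage, a caller could exit, restart, and resume
    # polling with nothing but the id.
    while get_run(run_id)["status"] in {"queued", "in_progress"}:
        time.sleep(0.01)
    return get_run(run_id)["output"]
```

For example, `wait_for(start_run("hello"))` returns `"HELLO"` once the background thread finishes.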

Callbacks are a somewhat different approach, but I think the guidance we should give is: if you can respond directly, use a callback; if you cannot respond quickly, do not use background, because you won't know when you'll be getting a response back, and then even a hot path might end up idling for long stretches, which would indicate that the process you are building should be split in two anyway.

Contributor Author


You don't have to loop on handle.is_idle; that's just the sample. You can go do something else in your script and come back later to poll all the events produced during that period. You can also do a while True loop until an output event is produced. The OpenAI response is also a live handle that you poll, e.g. while resp.status in {"queued", "in_progress"}.

Given that we are an SDK, returning an ID isn't a good pattern, because we can give users direct access to the background task. If users are building a service around this, they can map generated IDs to handles and expose an API that lets clients poll using IDs.

Another concern: can you actually serialize this handle, stop the app, then restart and start polling again? Probably not, and we would probably send people to checkpoints if that is a requirement.

We already have checkpoints.

if you can respond directly, use a callback; if you cannot respond quickly, do not use background, because you won't know when you'll be getting a response back, and then even a hot path might end up idling for long stretches, which would indicate that the process you are building should be split in two anyway.

This is the hypothetical scenario we are trying to address. If we give this guidance, there is no need for callbacks either; the current event system will suffice.


@TaoChenOSU TaoChenOSU moved this from In Progress to In Review in Agent Framework Feb 26, 2026

Labels

documentation — Improvements or additions to documentation
python
workflows — Related to Workflows in agent-framework

Projects

Status: In Review


5 participants