Conversation
finally:
    await self._run_cleanup(checkpoint_storage)

async def _resume() -> asyncio.Task[None]:  # noqa: RUF029
The # noqa: RUF029 comment may be unnecessary here. RUF029 (unused-async) flags an async def whose body never awaits or otherwise uses async features; it does not check whether the function itself is called, so the fact that _resume is used on line 634 (where it's passed to BackgroundRunHandle) is beside the point. If _resume's body does await, the suppression should be removed.
- async def _resume() -> asyncio.Task[None]:  # noqa: RUF029
+ async def _resume() -> asyncio.Task[None]:
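For context on the rule itself, here is a minimal, self-contained illustration (function names are hypothetical, not from this PR) of what RUF029 does and does not flag:

```python
import asyncio

# RUF029 ("unused-async") flags an async def whose body never awaits
# or otherwise uses async features -- not an async function that is
# merely never called anywhere.

async def no_await() -> int:  # ruff would report RUF029 on this line
    return 42

async def with_await() -> int:  # not flagged: the body awaits
    await asyncio.sleep(0)
    return 42

print(asyncio.run(no_await()), asyncio.run(with_await()))
```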
from ._events import WorkflowEvent
from ._runner_context import RunnerContext

logger = logging.getLogger(__name__)
The logger variable is defined but never used in this module. Consider removing it, or adding appropriate debug/error logging where it would be useful (e.g., in the respond method when resuming after idle).
logger = logging.getLogger(__name__)


class BackgroundRunHandle:
It feels like this class is essentially a wrapper around asyncio primitives: create_task and an asyncio.Queue. Why do we need to wrap these well-known constructs?
This also adds another concept that people have to learn; I don't think it's needed.
I don't agree. We are doing a lot more than just wrapping a workflow in a background task: we are handling events, requests, and errors for users. And the concept is well-known. Here is an equivalent concept from OpenAI: https://developers.openai.com/api/docs/guides/background
We support background for agents as a keyword arg right now.
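The actual BackgroundRunHandle implementation isn't shown in this thread. To ground the "wrapper around asyncio primitives" question, here is a minimal sketch of the kind of class being debated, built only from create_task and an asyncio.Queue; all names and semantics (is_idle, drain) are assumptions based on how the samples use them, not the real API:

```python
import asyncio
from typing import Any


class BackgroundRunHandle:
    """Hypothetical sketch: a background workflow task plus an event queue.

    The real class presumably does more (requests, errors, idle states);
    this shows only the asyncio plumbing under discussion.
    """

    def __init__(self, task: asyncio.Task[None], events: asyncio.Queue[Any]) -> None:
        self._task = task
        self._events = events

    @property
    def is_idle(self) -> bool:
        # Simplification: idle once the background task has finished.
        # The real handle likely also treats "waiting on a request" as idle.
        return self._task.done()

    def drain(self) -> list[Any]:
        # Non-blocking: return whatever events have been produced so far.
        out: list[Any] = []
        while not self._events.empty():
            out.append(self._events.get_nowait())
        return out


async def demo() -> list[str]:
    events: asyncio.Queue[Any] = asyncio.Queue()

    async def workflow() -> None:
        for i in range(3):
            events.put_nowait(f"event-{i}")
            await asyncio.sleep(0)  # yield so this runs "in the background"

    handle = BackgroundRunHandle(asyncio.create_task(workflow()), events)
    while not handle.is_idle:
        await asyncio.sleep(0.01)  # the caller-written poll loop under debate
    return handle.drain()


print(asyncio.run(demo()))
```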
return response_stream
return response_stream.get_final_response()


def run_in_background(
Do we genuinely need a polling-based consumption pattern? Are we getting feedback that this is missing today?
We're now pushing more concerns onto the caller. Every consumer has to:
- Write the poll loop
- Pick a sleep interval (and risk getting it wrong: too slow adds latency, too fast wastes cycles)
- Route events by type manually
- Track which request IDs map to which responses
- Remember to drain after idle
- Handle the resume-after-idle edge case
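Written out, the burden described in this list looks roughly like the following; the event types, handle methods, and FakeHandle stand-in are hypothetical stand-ins for the real workflow API, used only to make the sketch self-contained:

```python
import asyncio
from dataclasses import dataclass


# Hypothetical event types standing in for the real workflow events.
@dataclass
class RequestEvent:
    request_id: str
    prompt: str


@dataclass
class OutputEvent:
    data: str


async def consume(handle, answer_request) -> list[str]:
    """Everything the caller must hand-roll: loop, interval, routing, draining."""
    outputs: list[str] = []
    responses: dict[str, str] = {}  # request_id -> response, tracked manually
    while not handle.is_idle:
        for event in handle.drain():
            if isinstance(event, RequestEvent):  # route events by type
                responses[event.request_id] = answer_request(event)
            elif isinstance(event, OutputEvent):
                outputs.append(event.data)
        await asyncio.sleep(0.1)  # the interval the caller has to guess
    for event in handle.drain():  # remember to drain after idle
        if isinstance(event, OutputEvent):
            outputs.append(event.data)
    return outputs


class FakeHandle:
    """Tiny stand-in for the handle so the sketch runs end to end."""

    def __init__(self) -> None:
        self._events = [RequestEvent("r1", "Proceed?"), OutputEvent("done")]
        self._polls = 0

    @property
    def is_idle(self) -> bool:
        self._polls += 1
        return self._polls > 1

    def drain(self):
        events, self._events = self._events, []
        return events


print(asyncio.run(consume(FakeHandle(), lambda event: "yes")))
```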
I very much agree, this is not needed and leads to un-pythonic code
Why is this un-pythonic?
# Single poll loop: process all events and respond to requests inline.
# The workflow continues running in the background while we process events.
outputs: list[str] = []
while not handle.is_idle:
I see almost no value in this, because it forces people to understand the whole idle/poll machinery. A user having to write asyncio.sleep also doesn't seem right. A much simpler pattern for this scenario is something like a callback for "response required": that's a well-known concept in Python and doesn't require this much complexity. And when I compare these samples to the existing samples that just use streaming, streaming is enough; it already lets you do other things in the meantime, and if you really need to, you can run it in a separate thread and get the same behavior with Python primitives instead of another new thing.
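The callback alternative proposed here isn't spelled out in the thread; a minimal sketch of that pattern (runner, callback signature, and step names are all hypothetical) could look like:

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_workflow(
    on_response_required: Callable[[str], Awaitable[str]],
) -> list[str]:
    """Hypothetical runner: invokes a callback whenever user input is needed."""
    outputs: list[str] = []
    for step in ("plan", "approve", "finish"):
        if step == "approve":
            # The framework, not the caller, decides when to ask.
            answer = await on_response_required("Proceed with the plan?")
            outputs.append(f"user said: {answer}")
        else:
            outputs.append(step)
    return outputs


async def auto_approve(question: str) -> str:
    # A caller-supplied callback; here it just approves everything.
    return "yes"


print(asyncio.run(run_workflow(auto_approve)))
```

The trade-off raised in the reply below this comment is that the callback drives the control flow, which can be awkward when a UI loop needs to stay in charge.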
Why are polling and the other pieces not Pythonic? Polling is a well-known concept in programming, and OpenAI has a similar primitive: https://developers.openai.com/api/docs/guides/background.
Using callbacks is a simpler approach, but it's very limited and doesn't scale well. We had callbacks back in SK and they didn't work well. For example, it is difficult to integrate callbacks with a UI, because the UI loop ends up driven by the callback and users can only handle one callback at a time. And if you don't want pre-determined actions, you have to emit something else from the callback and wait for user input.
With a background run and events, we keep the original event system we already have in Workflows, which reduces the number of concepts people have to learn, especially since polling is already familiar to them.
Polling, in most other cases, means you actually exit whatever you're doing and then check in every so often to see if it is done, not building this handle.is_idle loop; usually a while True loop suffices. That is also the pattern we have now for background responses in certain APIs (OpenAI Responses and A2A, I think). So this handle is the new concept people have to learn. If we want a background run, it should follow that pattern: you start it, you get some kind of id back (a continuation token in our case), and then you call run again to check the status. That is also the pattern OpenAI uses in the link you shared, not some other object that needs to be managed. Another concern: can you actually serialize this handle, stop the app, restart, and resume polling? Probably not, and we would probably point people at checkpoints if that's a requirement.
Callbacks are a somewhat different approach, but I think the guidance we should give is: if you can respond directly, use a callback; if you cannot respond quickly, do not use background, because you won't know when you'll get a response, and even a hot path might end up idling for long stretches, which would indicate that the process you're building should be split in two anyway.
You don't have to loop on handle.is_idle; that's just the sample. You can go do something else in your script and come back later to poll all the events produced in the meantime. You can also loop with while True until an output event is produced. The OpenAI response is likewise a live handle that you poll for status: while resp.status in {"queued", "in_progress"}:.
Given that we are an SDK, returning an ID isn't a good pattern because we can give users direct access to the background task. If users are building a service around this, they can map generated IDs to handles and expose an API that lets clients poll with those IDs.

> Another concern is can you actually serialize this handler, stop the app, then restart and start polling again, probably not, and we would probably send people to checkpoints if that is a requirement.

We already have checkpoints.

> if you can respond directly, then use a callback, if you cannot respond quickly then do not use background, because you wont know when you'll be getting back, and then even a hot path might end up idling for long stretches, which would indicate that the process you are building should be split into two anyway.

This is the hypothetical scenario we are trying to address. If we give this guidance, there is no need for a callback either; the current event system will suffice.
Motivation and Context
Currently, we have a very limited way of handling workflow runs and responding to events. Users have to wait until a workflow converges to process events, such as requests.
Description
Contribution Checklist