Skip to content

Python: fix: filter non-assistant messages from AgentResponse in WorkflowAgent#4275

Open
LEDazzio01 wants to merge 3 commits intomicrosoft:mainfrom
LEDazzio01:fix/4261-workflow-agent-user-input-reemission
Open

Python: fix: filter non-assistant messages from AgentResponse in WorkflowAgent#4275
LEDazzio01 wants to merge 3 commits intomicrosoft:mainfrom
LEDazzio01:fix/4261-workflow-agent-user-input-reemission

Conversation

@LEDazzio01
Copy link
Contributor

Summary

Fixes #4261

When a workflow (e.g., GroupChat) emits an AgentResponse containing the full conversation history (user + assistant messages), WorkflowAgent was converting all messages to output — including user messages. This caused user inputs to be re-emitted as part of the assistant response, with the problem compounding across successive turns as the conversation history grew.

Root Cause

In _workflows/_agent.py, both the streaming path (_convert_workflow_event_to_agent_response_updates) and the non-streaming path (_convert_workflow_events_to_agent_response) iterated over all messages in AgentResponse.messages without filtering by role. When GroupChat-style orchestrators emit their full conversation history as an AgentResponse, this meant user messages were surfaced as assistant output.

Fix

Filter AgentResponse.messages to only include messages with role="assistant" before converting them to output:

Streaming path (_convert_workflow_event_to_agent_response_updates):

for msg in data.messages:
    if msg.role != "assistant":
        continue
    updates.append(...)

Non-streaming path (_convert_workflow_events_to_agent_response):

if isinstance(data, AgentResponse):
    assistant_messages = [msg for msg in data.messages if msg.role == "assistant"]
    messages.extend(assistant_messages)

Tests Added

Added TestWorkflowAgentUserInputFiltering class with 4 test cases:

  • Streaming: Verifies only assistant messages from a mixed-role AgentResponse appear in streaming output
  • Non-streaming: Same verification for non-streaming path
  • All-assistant (regression): Verifies no change in behavior when AgentResponse only contains assistant messages
  • Empty after filtering: Verifies correct behavior when AgentResponse contains only user messages

Scope

This fix targets only the AgentResponse branches. The Message and list[Message] branches are intentionally left unchanged, as those represent explicit workflow output where the executor has control over what it yields (and existing tests validate this behavior).

When a workflow (e.g., GroupChat) emits an AgentResponse containing the
full conversation history (user + assistant messages), WorkflowAgent was
converting ALL messages to output, causing user inputs to be re-emitted
as assistant responses. This was visible as accumulated user inputs in
each successive response.

Fix: filter AgentResponse.messages to only include role="assistant"
messages in both _convert_workflow_event_to_agent_response_updates
(streaming) and _convert_workflow_events_to_agent_response
(non-streaming).

Fixes microsoft#4261
…4261)

Add TestWorkflowAgentUserInputFiltering class with tests verifying that:
- Streaming filters user messages from AgentResponse data
- Non-streaming filters user messages from AgentResponse data
- All-assistant AgentResponse works unchanged (no regression)
- All-user AgentResponse produces no updates
Copilot AI review requested due to automatic review settings February 25, 2026 21:32
@github-actions github-actions bot changed the title fix: filter non-assistant messages from AgentResponse in WorkflowAgent Python: fix: filter non-assistant messages from AgentResponse in WorkflowAgent Feb 25, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent WorkflowAgent from re-emitting user inputs when a workflow outputs an AgentResponse that contains full conversation history, by filtering AgentResponse.messages to role="assistant" before converting to streamed updates or non-streamed results.

Changes:

  • Filter non-assistant messages out of AgentResponse.messages in the non-streaming conversion path.
  • Filter non-assistant messages out of AgentResponse.messages in the streaming conversion path.
  • Add tests validating the new filtering behavior for workflows that output AgentResponse.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
python/packages/core/agent_framework/_workflows/_agent.py Filters AgentResponse.messages to assistant-only in streaming and non-streaming conversions.
python/packages/core/tests/workflow/test_workflow_agent.py Adds regression tests for assistant-only filtering when workflow output is an AgentResponse.

Comment on lines 464 to +469
if isinstance(data, AgentResponse):
messages.extend(data.messages)
# Filter to only assistant messages to avoid re-emitting user input
# that may be included in the AgentResponse's conversation history
# (e.g., from GroupChat orchestrators that include the full conversation).
assistant_messages = [msg for msg in data.messages if msg.role == "assistant"]
messages.extend(assistant_messages)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change filters non-assistant messages only when the workflow output event data is an AgentResponse, but the reported #4261 GroupChat behavior appears to emit list[Message] output (e.g., GroupChat orchestrator yields _full_conversation as list[Message]). In that common path, WorkflowAgent will still surface user-role messages unchanged, so user input can still be re-emitted and compound across turns. Consider either (a) applying equivalent role filtering when converting list[Message] (and possibly Message) outputs that represent full conversation history, or (b) changing GroupChat to yield only new assistant messages (or an AgentResponse containing only assistant messages) when used via workflow.as_agent().

Copilot uses AI. Check for mistakes.
Comment on lines +1310 to +1336
async def test_streaming_agent_response_filters_user_messages(self):
"""Test that streaming filters out user messages from AgentResponse data.

When an executor yields an AgentResponse containing both user and assistant
messages (as GroupChat orchestrators do), only assistant messages should
appear in the streaming output.
"""

@executor
async def groupchat_like_executor(
messages: list[Message], ctx: WorkflowContext[Never, AgentResponse]
) -> None:
# Simulate a GroupChat-like executor that emits full conversation history
response = AgentResponse(
messages=[
Message(role="user", text="hi"),
Message(role="assistant", text="Hello! How can I help?", author_name="Principal"),
Message(role="user", text="what is 2+2?"),
Message(role="assistant", text="2+2 = 4", author_name="Maths Teacher"),
Message(role="assistant", text="The answer is 4.", author_name="Principal"),
],
)
await ctx.yield_output(response)

workflow = WorkflowBuilder(start_executor=groupchat_like_executor).build()
agent = workflow.as_agent("groupchat-agent")

Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests validate filtering for the AgentResponse output branch, but they don’t cover the GroupChat-style list[Message] output path (which is the likely source of #4261). If the goal is to prevent re-emitting user inputs for GroupChat workflows, add a regression test that runs a minimal GroupChat (or an executor that yields list[Message] conversation history) through workflow.as_agent() and asserts user-role messages are not surfaced.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

@LEDazzio01 LEDazzio01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! Both comments raise the same valid point about the list[Message] and Message branches not being filtered.

Our rationale for scoping the fix to AgentResponse only:

The list[Message] and Message branches represent explicit executor output — the developer directly controls what they yield via ctx.yield_output() or ctx.send_message(). Silently filtering by role in those branches could break intentional behavior (e.g., an executor that deliberately emits a user-role message as part of its output).

In contrast, AgentResponse is a higher-level wrapper that commonly contains full conversation history (as GroupChat orchestrators do), where user messages are incidental rather than intentional output.

The issue reporter's repro (#4261) uses GroupChatBuilder which emits AgentResponse via AgentExecutor, so the fix directly addresses the reported behavior. If GroupChat also has a code path that yields list[Message] with the full conversation, that's arguably a bug in the GroupChat orchestrator itself (it should yield only new assistant messages), not something WorkflowAgent should silently fix.

Happy to extend the filter if maintainers prefer a more defensive approach — just wanted to call out the tradeoff.

Copy link
Contributor

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 3 | Confidence: 87%

✓ Correctness

The diff correctly addresses the stated issue (#4261) of filtering out non-assistant messages from AgentResponse in both the streaming and non-streaming code paths. The filtering logic is consistent across both paths, edge cases (all-user, all-assistant, mixed) are tested, and metadata (usage_details, raw_representation, created_at) is still properly accumulated regardless of message filtering. The tests are well-structured and cover the key scenarios.

✓ Security Reliability

This diff is a security improvement that prevents user input from being re-emitted in agent responses when GroupChat-style orchestrators include full conversation history. The filtering uses a strict allowlist (role == "assistant"), which is the safer approach. The implementation is consistent across both streaming and non-streaming paths, and the tests cover normal, all-assistant, and all-user edge cases. No security or reliability issues found.

✓ Test Coverage

The new tests cover the primary streaming and non-streaming filtering paths well, including the happy path with mixed messages, a no-regression case with all-assistant messages, and the edge case of all-user messages. However, there are gaps: no test covers messages with roles other than 'user' and 'assistant' (e.g., 'system' or 'tool'), which the filter would silently drop; the non-streaming path lacks edge-case coverage (all-user, empty messages); and the streaming tests don't verify that author_name is correctly propagated through the filtering to AgentResponseUpdate objects, despite the production code explicitly mapping it.

Suggestions

  • The filter excludes all non-assistant roles (not just "user"), which means "system" or "tool" role messages would also be dropped. If AgentResponse can legitimately contain tool-result or system messages that downstream consumers need, this could silently discard them. Consider whether a blocklist (role != "user") would be safer than an allowlist (role == "assistant"), or document that only assistant messages are ever expected to be forwarded.
  • In the non-streaming path, when all messages are filtered out the result is an AgentResponse with an empty messages list but populated usage_details and raw_representation. Consider whether downstream consumers handle empty message lists gracefully, or whether a log/warning would be helpful for debuggability.
  • Consider whether messages with roles other than "user" and "assistant" (e.g., "tool", "system") should be preserved. The current allowlist silently drops any non-assistant role, which could cause subtle data loss if future message types are introduced. If this is intentional, a brief comment noting it would help future maintainers.
  • The edge case where all messages are filtered out (test_streaming_agent_response_empty_after_filtering) still appends to raw_representations and merges usage_details in the non-streaming path. Verify that downstream consumers handle an AgentResponse with an empty messages list but non-empty raw_representation gracefully.
  • Add a test case with a 'system' or 'tool' role message in the AgentResponse to explicitly document whether filtering those out is intended behavior. The current filter drops everything except 'assistant', but only 'user' messages are tested.
  • Add a non-streaming counterpart to test_streaming_agent_response_empty_after_filtering to verify the non-streaming path also handles an AgentResponse with only non-assistant messages gracefully (e.g., resulting in an empty messages list and no crash).
  • In test_streaming_agent_response_filters_user_messages, verify that author_name is correctly propagated to the AgentResponseUpdate objects, since the production code maps msg.author_name to the update. For example: assert updates[0].author_name == 'Principal'

Automated review by moonbox3's agents

Comment on lines +1348 to +1350

# Verify the content is correct
texts = [u.text for u in updates]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider also asserting update.author_name for each update to verify the production code's author_name=msg.author_name mapping is preserved through filtering. Currently only role and text are verified on streaming updates.

Suggested change
# Verify the content is correct
texts = [u.text for u in updates]
# Verify all updates are assistant role
for update in updates:
assert update.role == "assistant", f"Expected role='assistant', got role='{update.role}'"
# Verify author_name is propagated correctly
assert updates[0].author_name == "Principal"
assert updates[1].author_name == "Maths Teacher"
assert updates[2].author_name == "Principal"

updates.append(update)

# No assistant messages means no updates
assert len(updates) == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test for non-assistant, non-user roles (e.g., 'system' or 'tool') to explicitly document that the filter drops those as well. This would prevent a future regression if someone expects system messages to pass through.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy all feedback - I'll address tonight.

- Add author_name assertions to streaming filter test
- Add test for system/tool role filtering
- Add non-streaming edge case for empty-after-filtering
@LEDazzio01
Copy link
Contributor Author

@moonbox3 Thanks for the detailed review! I've pushed a commit addressing all your feedback:

Changes in dc51ea8:

  1. author_name assertions — Added test_streaming_author_name_preserved_through_filtering that explicitly verifies author_name is propagated through filtering (Principal, Maths Teacher, Principal)

  2. System/tool role test coverage — Added test_streaming_filters_system_and_tool_roles and test_non_streaming_filters_system_and_tool_roles to explicitly document that the allowlist (role == "assistant") intentionally drops system and tool roles. This is by design: system prompts and tool results are internal workflow artifacts, not agent output.

  3. Non-streaming empty edge case — Added test_non_streaming_empty_after_filtering to verify the non-streaming path handles an AgentResponse with only non-assistant messages gracefully (empty messages list, no crash).

These tests are in a new file test_workflow_agent_filtering_addl.py to keep the diff clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: _convert_workflow_event_to_agent_response_updates emits user input again

4 participants