Python: fix: filter non-assistant messages from AgentResponse in WorkflowAgent by LEDazzio01 · Pull Request #4275 · microsoft/agent-framework

LEDazzio01 · 2026-02-25T21:32:52Z

Summary

When a workflow (e.g., GroupChat) emits an AgentResponse containing the full conversation history (user + assistant messages), WorkflowAgent was converting all messages to output — including user messages. This caused user inputs to be re-emitted as part of the assistant response, with the problem compounding across successive turns as the conversation history grew.

Root Cause

In _workflows/_agent.py, both the streaming path (_convert_workflow_event_to_agent_response_updates) and the non-streaming path (_convert_workflow_events_to_agent_response) iterated over all messages in AgentResponse.messages without filtering by role. When GroupChat-style orchestrators emit their full conversation history as an AgentResponse, this meant user messages were surfaced as assistant output.

Fix

Filter AgentResponse.messages to only include messages with role="assistant" before converting them to output:

Streaming path (_convert_workflow_event_to_agent_response_updates):

for msg in data.messages:
    if msg.role != "assistant":
        continue
    updates.append(...)

Non-streaming path (_convert_workflow_events_to_agent_response):

if isinstance(data, AgentResponse):
    assistant_messages = [msg for msg in data.messages if msg.role == "assistant"]
    messages.extend(assistant_messages)

Tests Added

Added TestWorkflowAgentUserInputFiltering class with 4 test cases:

Streaming: Verifies only assistant messages from a mixed-role AgentResponse appear in streaming output
Non-streaming: Same verification for non-streaming path
All-assistant (regression): Verifies no change in behavior when AgentResponse only contains assistant messages
Empty after filtering: Verifies correct behavior when AgentResponse contains only user messages

Scope

This fix targets only the AgentResponse branches. The Message and list[Message] branches are intentionally left unchanged, as those represent explicit workflow output where the executor has control over what it yields (and existing tests validate this behavior).

When a workflow (e.g., GroupChat) emits an AgentResponse containing the full conversation history (user + assistant messages), WorkflowAgent was converting ALL messages to output, causing user inputs to be re-emitted as assistant responses. This was visible as accumulated user inputs in each successive response. Fix: filter AgentResponse.messages to only include role="assistant" messages in both _convert_workflow_event_to_agent_response_updates (streaming) and _convert_workflow_events_to_agent_response (non-streaming). Fixes microsoft#4261

…4261) Add TestWorkflowAgentUserInputFiltering class with tests verifying that: - Streaming filters user messages from AgentResponse data - Non-streaming filters user messages from AgentResponse data - All-assistant AgentResponse works unchanged (no regression) - All-user AgentResponse produces no updates

Copilot

Pull request overview

This PR aims to prevent WorkflowAgent from re-emitting user inputs when a workflow outputs an AgentResponse that contains full conversation history, by filtering AgentResponse.messages to role="assistant" before converting to streamed updates or non-streamed results.

Changes:

Filter non-assistant messages out of AgentResponse.messages in the non-streaming conversion path.
Filter non-assistant messages out of AgentResponse.messages in the streaming conversion path.
Add tests validating the new filtering behavior for workflows that output AgentResponse.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
python/packages/core/agent_framework/_workflows/_agent.py	Filters `AgentResponse.messages` to assistant-only in streaming and non-streaming conversions.
python/packages/core/tests/workflow/test_workflow_agent.py	Adds regression tests for assistant-only filtering when workflow output is an `AgentResponse`.

Copilot · 2026-02-25T21:38:08Z

python/packages/core/agent_framework/_workflows/_agent.py

                if isinstance(data, AgentResponse):
-                    messages.extend(data.messages)
+                    # Filter to only assistant messages to avoid re-emitting user input
+                    # that may be included in the AgentResponse's conversation history
+                    # (e.g., from GroupChat orchestrators that include the full conversation).
+                    assistant_messages = [msg for msg in data.messages if msg.role == "assistant"]
+                    messages.extend(assistant_messages)


This change filters non-assistant messages only when the workflow output event data is an AgentResponse, but the reported #4261 GroupChat behavior appears to emit list[Message] output (e.g., GroupChat orchestrator yields _full_conversation as list[Message]). In that common path, WorkflowAgent will still surface user-role messages unchanged, so user input can still be re-emitted and compound across turns. Consider either (a) applying equivalent role filtering when converting list[Message] (and possibly Message) outputs that represent full conversation history, or (b) changing GroupChat to yield only new assistant messages (or an AgentResponse containing only assistant messages) when used via workflow.as_agent().

Copilot · 2026-02-25T21:38:09Z

python/packages/core/tests/workflow/test_workflow_agent.py

+    async def test_streaming_agent_response_filters_user_messages(self):
+        """Test that streaming filters out user messages from AgentResponse data.
+
+        When an executor yields an AgentResponse containing both user and assistant
+        messages (as GroupChat orchestrators do), only assistant messages should
+        appear in the streaming output.
+        """
+
+        @executor
+        async def groupchat_like_executor(
+            messages: list[Message], ctx: WorkflowContext[Never, AgentResponse]
+        ) -> None:
+            # Simulate a GroupChat-like executor that emits full conversation history
+            response = AgentResponse(
+                messages=[
+                    Message(role="user", text="hi"),
+                    Message(role="assistant", text="Hello! How can I help?", author_name="Principal"),
+                    Message(role="user", text="what is 2+2?"),
+                    Message(role="assistant", text="2+2 = 4", author_name="Maths Teacher"),
+                    Message(role="assistant", text="The answer is 4.", author_name="Principal"),
+                ],
+            )
+            await ctx.yield_output(response)
+
+        workflow = WorkflowBuilder(start_executor=groupchat_like_executor).build()
+        agent = workflow.as_agent("groupchat-agent")
+


These tests validate filtering for the AgentResponse output branch, but they don’t cover the GroupChat-style list[Message] output path (which is the likely source of #4261). If the goal is to prevent re-emitting user inputs for GroupChat workflows, add a regression test that runs a minimal GroupChat (or an executor that yields list[Message] conversation history) through workflow.as_agent() and asserts user-role messages are not surfaced.

LEDazzio01

Thanks for the review! Both comments raise the same valid point about the list[Message] and Message branches not being filtered.

Our rationale for scoping the fix to AgentResponse only:

The list[Message] and Message branches represent explicit executor output — the developer directly controls what they yield via ctx.yield_output() or ctx.send_message(). Silently filtering by role in those branches could break intentional behavior (e.g., an executor that deliberately emits a user-role message as part of its output).

In contrast, AgentResponse is a higher-level wrapper that commonly contains full conversation history (as GroupChat orchestrators do), where user messages are incidental rather than intentional output.

The issue reporter's repro (#4261) uses GroupChatBuilder which emits AgentResponse via AgentExecutor, so the fix directly addresses the reported behavior. If GroupChat also has a code path that yields list[Message] with the full conversation, that's arguably a bug in the GroupChat orchestrator itself (it should yield only new assistant messages), not something WorkflowAgent should silently fix.

Happy to extend the filter if maintainers prefer a more defensive approach — just wanted to call out the tradeoff.

moonbox3

Automated Code Review

Reviewers: 3 | Confidence: 87%

✓ Correctness

The diff correctly addresses the stated issue (#4261) of filtering out non-assistant messages from AgentResponse in both the streaming and non-streaming code paths. The filtering logic is consistent across both paths, edge cases (all-user, all-assistant, mixed) are tested, and metadata (usage_details, raw_representation, created_at) is still properly accumulated regardless of message filtering. The tests are well-structured and cover the key scenarios.

✓ Security Reliability

This diff is a security improvement that prevents user input from being re-emitted in agent responses when GroupChat-style orchestrators include full conversation history. The filtering uses a strict allowlist (role == "assistant"), which is the safer approach. The implementation is consistent across both streaming and non-streaming paths, and the tests cover normal, all-assistant, and all-user edge cases. No security or reliability issues found.

✓ Test Coverage

The new tests cover the primary streaming and non-streaming filtering paths well, including the happy path with mixed messages, a no-regression case with all-assistant messages, and the edge case of all-user messages. However, there are gaps: no test covers messages with roles other than 'user' and 'assistant' (e.g., 'system' or 'tool'), which the filter would silently drop; the non-streaming path lacks edge-case coverage (all-user, empty messages); and the streaming tests don't verify that author_name is correctly propagated through the filtering to AgentResponseUpdate objects, despite the production code explicitly mapping it.

Suggestions

The filter excludes all non-assistant roles (not just "user"), which means "system" or "tool" role messages would also be dropped. If AgentResponse can legitimately contain tool-result or system messages that downstream consumers need, this could silently discard them. Consider whether a blocklist (role != "user") would be safer than an allowlist (role == "assistant"), or document that only assistant messages are ever expected to be forwarded.
In the non-streaming path, when all messages are filtered out the result is an AgentResponse with an empty messages list but populated usage_details and raw_representation. Consider whether downstream consumers handle empty message lists gracefully, or whether a log/warning would be helpful for debuggability.
Consider whether messages with roles other than "user" and "assistant" (e.g., "tool", "system") should be preserved. The current allowlist silently drops any non-assistant role, which could cause subtle data loss if future message types are introduced. If this is intentional, a brief comment noting it would help future maintainers.
The edge case where all messages are filtered out (test_streaming_agent_response_empty_after_filtering) still appends to raw_representations and merges usage_details in the non-streaming path. Verify that downstream consumers handle an AgentResponse with an empty messages list but non-empty raw_representation gracefully.
Add a test case with a 'system' or 'tool' role message in the AgentResponse to explicitly document whether filtering those out is intended behavior. The current filter drops everything except 'assistant', but only 'user' messages are tested.
Add a non-streaming counterpart to test_streaming_agent_response_empty_after_filtering to verify the non-streaming path also handles an AgentResponse with only non-assistant messages gracefully (e.g., resulting in an empty messages list and no crash).
In test_streaming_agent_response_filters_user_messages, verify that author_name is correctly propagated to the AgentResponseUpdate objects, since the production code maps msg.author_name to the update. For example: assert updates[0].author_name == 'Principal'

Automated review by moonbox3's agents

moonbox3 · 2026-02-26T12:21:13Z

python/packages/core/tests/workflow/test_workflow_agent.py

+
+        # Verify the content is correct
+        texts = [u.text for u in updates]


Consider also asserting update.author_name for each update to verify the production code's author_name=msg.author_name mapping is preserved through filtering. Currently only role and text are verified on streaming updates.

Suggested change

# Verify the content is correct

texts = [u.text for u in updates]

# Verify all updates are assistant role

for update in updates:

assert update.role == "assistant", f"Expected role='assistant', got role='{update.role}'"

# Verify author_name is propagated correctly

assert updates[0].author_name == "Principal"

assert updates[1].author_name == "Maths Teacher"

assert updates[2].author_name == "Principal"

moonbox3 · 2026-02-26T12:21:14Z

python/packages/core/tests/workflow/test_workflow_agent.py

+            updates.append(update)
+
+        # No assistant messages means no updates
+        assert len(updates) == 0


Consider adding a test for non-assistant, non-user roles (e.g., 'system' or 'tool') to explicitly document that the filter drops those as well. This would prevent a future regression if someone expects system messages to pass through.

Copy all feedback - I'll address tonight.

- Add author_name assertions to streaming filter test - Add test for system/tool role filtering - Add non-streaming edge case for empty-after-filtering

LEDazzio01 · 2026-02-26T22:15:28Z

@moonbox3 Thanks for the detailed review! I've pushed a commit addressing all your feedback:

Changes in dc51ea8:

✅ author_name assertions — Added test_streaming_author_name_preserved_through_filtering that explicitly verifies author_name is propagated through filtering (Principal, Maths Teacher, Principal)
✅ System/tool role test coverage — Added test_streaming_filters_system_and_tool_roles and test_non_streaming_filters_system_and_tool_roles to explicitly document that the allowlist (role == "assistant") intentionally drops system and tool roles. This is by design: system prompts and tool results are internal workflow artifacts, not agent output.
✅ Non-streaming empty edge case — Added test_non_streaming_empty_after_filtering to verify the non-streaming path handles an AgentResponse with only non-assistant messages gracefully (empty messages list, no crash).

These tests are in a new file test_workflow_agent_filtering_addl.py to keep the diff clean.

LEDazzio01 added 2 commits February 25, 2026 16:28

Copilot AI review requested due to automatic review settings February 25, 2026 21:32

markwallace-microsoft added the python label Feb 25, 2026

github-actions bot changed the title ~~fix: filter non-assistant messages from AgentResponse in WorkflowAgent~~ Python: fix: filter non-assistant messages from AgentResponse in WorkflowAgent Feb 25, 2026

Copilot started reviewing on behalf of LEDazzio01 February 25, 2026 21:33 View session

Copilot AI reviewed Feb 25, 2026

View reviewed changes

LEDazzio01 commented Feb 25, 2026

View reviewed changes

This was referenced Feb 26, 2026

Python: Implement message filtering in agent workflow to stop emitting already seen responses #4268

Open

Python: [Bug]: _convert_workflow_event_to_agent_response_updates emits user input again #4261

Open

moonbox3 reviewed Feb 26, 2026

View reviewed changes

test: address moonbox3 review feedback on PR microsoft#4275

dc51ea8

- Add author_name assertions to streaming filter test - Add test for system/tool role filtering - Add non-streaming edge case for empty-after-filtering

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: fix: filter non-assistant messages from AgentResponse in WorkflowAgent#4275

Python: fix: filter non-assistant messages from AgentResponse in WorkflowAgent#4275
LEDazzio01 wants to merge 3 commits intomicrosoft:mainfrom
LEDazzio01:fix/4261-workflow-agent-user-input-reemission

LEDazzio01 commented Feb 25, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

LEDazzio01 left a comment

Uh oh!

moonbox3 left a comment

Uh oh!

moonbox3 Feb 26, 2026

Uh oh!

moonbox3 Feb 26, 2026

Uh oh!

LEDazzio01 Feb 26, 2026

Uh oh!

LEDazzio01 commented Feb 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants


		# Verify the content is correct
		texts = [u.text for u in updates]

-        # Verify the content is correct
-        texts = [u.text for u in updates]
+        # Verify all updates are assistant role
+        for update in updates:
+            assert update.role == "assistant", f"Expected role='assistant', got role='{update.role}'"
+        # Verify author_name is propagated correctly
+        assert updates[0].author_name == "Principal"
+        assert updates[1].author_name == "Maths Teacher"
+        assert updates[2].author_name == "Principal"

Conversation

LEDazzio01 commented Feb 25, 2026

Summary

Root Cause

Fix

Tests Added

Scope

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

LEDazzio01 left a comment

Choose a reason for hiding this comment

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

✓ Correctness

✓ Security Reliability

✓ Test Coverage

Suggestions

Uh oh!

moonbox3 Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

moonbox3 Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

LEDazzio01 Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

LEDazzio01 commented Feb 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants