Skip to content

.Net: Improve OpenAIResponseAgent exception handling (rate limit, auth, content filter, model not found)#13011

Open
Yusuftmle wants to merge 10 commits into
microsoft:mainfrom
Yusuftmle:improve-openai-exception-handling
Open

.Net: Improve OpenAIResponseAgent exception handling (rate limit, auth, content filter, model not found)#13011
Yusuftmle wants to merge 10 commits into
microsoft:mainfrom
Yusuftmle:improve-openai-exception-handling

Conversation

@Yusuftmle
Copy link
Copy Markdown

@Yusuftmle Yusuftmle commented Aug 26, 2025

Summary

This PR improves exception handling for Microsoft.SemanticKernel.Agents.OpenAI.OpenAIResponseAgent.

Previously, calls to InvokeAsync(...) and InvokeStreamingAsync(...) could collapse provider failures into a generic NullReferenceException, losing context such as rate limiting, content filter violations, authentication failures, or missing models. This made it difficult to implement robust retry logic and provide meaningful error messages.

A key root cause was that IAsyncEnumerable uses lazy evaluation — the actual network call doesn't happen at the method call site, but inside the await foreach loop. This meant exceptions thrown during enumeration were escaping the original try-catch blocks entirely.

Changes

  • Added HandleProviderExceptionsAsync — a thin async wrapper that intercepts exceptions during both the initialization phase and MoveNextAsync iteration, wrapping them in KernelException with descriptive messages.
  • OperationCanceledException is preserved and never wrapped.
  • Added finally block to guarantee DisposeAsync is called on the enumerator even if the stream fails mid-way.
  • Replaced synthetic unit tests with integration-style tests using a custom ThrowingHttpMessageHandler that simulates real HTTP failures (429, 500) end-to-end against a real OpenAIResponseAgent instance.

Why

Fixes [#12976] by ensuring that provider errors are surfaced with meaningful messages instead of NullReferenceException.

Notes

  • This is my first PR to Semantic Kernel
  • Exception handling approach was updated based on feedback from @markwallace-microsoft to use a broad catch rather than per-status-code mapping.
  • Feedback is welcome — especially regarding exception type design and whether additional metadata should be attached.
  • Unit tests included and passing locally (dotnet test).

Environment Tested

  • .NET 8
  • Microsoft.SemanticKernel 1.61.0-preview
  • Microsoft.SemanticKernel.Agents.OpenAI 1.61.0-preview
  • Azure OpenAI (Responses API)

@Yusuftmle Yusuftmle requested a review from a team as a code owner August 26, 2025 12:34
@moonbox3 moonbox3 added the .NET Issue or Pull requests regarding .NET code label Aug 26, 2025
@github-actions github-actions Bot changed the title .NET: Improve OpenAIResponseAgent exception handling (rate limit, auth, content filter, model not found) .Net: Improve OpenAIResponseAgent exception handling (rate limit, auth, content filter, model not found) Aug 26, 2025
@Yusuftmle
Copy link
Copy Markdown
Author

@microsoft-github-policy-service agree

Comment thread dotnet/src/Agents/OpenAI/OpenAIResponseAgent.cs Outdated
Yusuftmle and others added 2 commits September 10, 2025 18:39
…proofing

- Changed exception handling to catch all Exception types instead of only OpenAI-specific ones
- This approach is more robust and won't miss new exception types from future SDK updates
- Inner exceptions are preserved for detailed error analysis
- Updated unit tests accordingly
@Yusuftmle Yusuftmle force-pushed the improve-openai-exception-handling branch 2 times, most recently from 55698c5 to 67a7b9c Compare September 10, 2025 15:49
@westey-m westey-m self-assigned this Apr 29, 2026
@westey-m westey-m moved this to Community PR in Agent Framework Apr 29, 2026
Copilot AI review requested due to automatic review settings May 11, 2026 11:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to improve how Microsoft.SemanticKernel.Agents.OpenAI.OpenAIResponseAgent surfaces provider failures by adding additional exception handling and introducing new unit tests for error mapping.

Changes:

  • Added broad try/catch blocks in OpenAIResponseAgent.InvokeAsync(...) and InvokeStreamingAsync(...) that wrap initialization failures in KernelException.
  • Refactored local variable setup in both invoke paths (non-streaming and streaming) to support the new exception handling shape.
  • Added a new unit test class intended to validate exception mapping behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
dotnet/src/Agents/OpenAI/OpenAIResponseAgent.cs Adds exception wrapping around invoke setup and adjusts streaming message notification flow.
dotnet/src/Agents/UnitTests/OpenAI/OpenAIResponseAgentExceptionTests.cs Introduces new tests intended to validate error-to-message mapping for OpenAIResponseAgent.
Comments suppressed due to low confidence (1)

dotnet/src/Agents/OpenAI/OpenAIResponseAgent.cs:90

  • Exceptions thrown while enumerating invokeResults (e.g., from ResponseThreadActions.InvokeAsync during the API call) are not covered by the try/catch, so provider failures can still escape without the intended mapping (and may still surface as the original problematic exception types). If the goal is to surface provider errors consistently, wrap the await foreach enumeration (or use a local iterator that applies try/catch around MoveNextAsync).
        // Yield results with additional error handling
        await foreach (var result in invokeResults.ConfigureAwait(false))
        {
            if (options?.OnIntermediateMessage is not null)
            {
                await options.OnIntermediateMessage(result).ConfigureAwait(false);
            }

            await this.NotifyThreadOfNewMessage(agentThread, result, cancellationToken).ConfigureAwait(false);
            yield return new(result, agentThread);
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +62 to +78
try
{
agentThread = await this.EnsureThreadExistsWithMessagesAsync(messages, thread, cancellationToken).ConfigureAwait(false);
extensionsContextOptions = await this.FinalizeInvokeOptionsAsync(messages, options, agentThread, cancellationToken).ConfigureAwait(false);

ChatHistory chatHistory = [.. messages];
invokeResults = ResponseThreadActions.InvokeAsync(
this,
chatHistory,
agentThread,
extensionsContextOptions,
cancellationToken);
}
catch (Exception ex)
{
throw new KernelException($"OpenAI provider error for agent '{this.Name}': {ex.Message}", ex);
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The broad catch was suggested by @markwallace-microsoft
in a previous review. Happy to narrow it to
ClientResultException if preferred.
What's the recommended approach?

Comment on lines +75 to +78
catch (Exception ex)
{
throw new KernelException($"OpenAI provider error for agent '{this.Name}': {ex.Message}", ex);
}
Comment on lines +141 to +144
await foreach (var result in invokeResults.ConfigureAwait(false))
{
await NotifyMessagesAsync().ConfigureAwait(false);
yield return new(result, agentThread);
Comment on lines +18 to +38
[Fact]
public void InvokeAsync_ShouldMapRateLimitCorrectly()
{
var ex = new HttpRequestException("HTTP 429 Rate limit exceeded");
KernelException? result = null;

try
{
if (ex.Message.Contains("429"))
throw new KernelException($"Rate limit exceeded for agent '{AgentName}'. Check Retry-After header and implement backoff.", ex);
}
catch (KernelException ke)
{
result = ke;
}

Assert.NotNull(result);
Assert.Contains("Rate limit exceeded", result.Message);
Assert.Contains("Retry-After header", result.Message);
Assert.Equal(ex, result.InnerException);
}
Comment on lines +161 to +166
var thrownException = Assert.ThrowsAsync<InvalidOperationException>(() =>
{
throw ex;
});

Assert.Equal("Unknown streaming exception", thrownException.Result.Message);
Comment on lines +1 to +13
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Xunit;
using Microsoft.SemanticKernel;

namespace SemanticKernel.Agents.UnitTests.OpenAI
{
/// <summary>
/// Tests for the updated exception handling logic in OpenAIResponseAgent.
/// Verifies that KernelException messages are correct and unknown exceptions propagate.
/// </summary>
Comment on lines +141 to +145
await foreach (var result in invokeResults.ConfigureAwait(false))
{
await NotifyMessagesAsync().ConfigureAwait(false);
yield return new(result, agentThread);
}
@Yusuftmle
Copy link
Copy Markdown
Author

Hi @westey-m, just checking in after some time.
Is there anything needed from my side
to move this forward? Happy to help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

.NET Issue or Pull requests regarding .NET code

Projects

Status: Community PR

Development

Successfully merging this pull request may close these issues.

5 participants