fix(python): add default 60s timeout to all AI provider clients #13696
badhra-ajaz wants to merge 1 commit into microsoft:main
Conversation
Adds `timeout=60.0` to `AsyncOpenAI`, `AsyncAzureOpenAI`, and `AsyncAnthropic` client instantiations across the OpenAI, Azure, NVIDIA, and Anthropic connectors to prevent indefinite hangs on API calls.

Changes (5 files, 5 client instances):
- OpenAI config base: `AsyncOpenAI` client
- Azure config base: `AsyncAzureOpenAI` client
- Anthropic chat completion: `AsyncAnthropic` client
- NVIDIA chat completion: `AsyncOpenAI` client
- NVIDIA text embedding: `AsyncOpenAI` client

🤖 Powered by [PeakInfer](https://peakinfer.com) LLM inference optimization
Automated Code Review
Reviewers: 4 | Confidence: 90%
✓ Correctness
This PR adds a hardcoded 60-second timeout to all internally created AI client instances (OpenAI, Azure OpenAI, Anthropic, NVIDIA). The changes are mechanically correct: each timeout is added only when the library constructs its own client (i.e., when no user-provided client is passed), and the variable scoping in azure_config_base.py is fine since `args` is reassigned on the line after client creation. However, the SDK default for both the OpenAI and Anthropic clients is 600 seconds, so this is a 10x reduction that could break users with long-running completions. The timeout is also not user-configurable without providing a pre-built client, which is a usability concern but not a correctness bug.
✓ Security & Reliability
This PR adds an explicit `timeout=60.0` to all internally constructed AI clients (Anthropic, NVIDIA, OpenAI, Azure OpenAI). This is a reliability improvement that prevents indefinite hangs when upstream services are unresponsive. The 60-second value is reasonable for LLM calls. The timeout is only applied when the SDK creates the client internally (not when a user passes their own client), which is the correct pattern. No security or reliability issues found.
✗ Test Coverage
This PR adds a hardcoded `timeout=60.0` to 5 AI client constructors (Anthropic, NVIDIA chat/embedding, OpenAI, Azure OpenAI), reducing the SDK default from 600s to 60s. However, none of the existing initialization tests are updated to verify that the timeout is correctly set on the constructed client. The magic number 60.0 is duplicated across all 5 files with no shared constant, and there is no way for users to configure the timeout without providing their own pre-built client.
✗ Design Approach
The PR hardcodes a 60-second timeout at the HTTP client construction level across all five service constructors. This is a symptom-level fix (presumably addressing hanging requests) that introduces a non-configurable magic constant. Users with legitimate long-running workloads (large context windows, streaming completions, batch embeddings) cannot increase this limit without providing a pre-built client instance. Additionally, the NVIDIA and OpenAI embedding services already expose a per-request `timeout` field in their `PromptExecutionSettings`, so the correct pattern already exists in the codebase: the per-request timeout in execution settings should be the primary mechanism. A client-level cap of 60s would silently shadow any per-request timeout exceeding 60s on the execution paths that do carry a timeout in settings, making the behavior surprising. The right approach is to expose `timeout` as an optional constructor parameter (defaulting to `None` to preserve the SDK defaults) rather than hardcoding it.
Flagged Issues
- The 60s timeout is hardcoded and not user-configurable. Users with long-running workloads (large context windows, streaming completions, batch embeddings) have no way to increase it without supplying their own pre-built client. The constructor should accept an optional `timeout` parameter (default `None`) passed through to the underlying SDK client, preserving the SDK's own default when unset.
- The NVIDIA services already expose `timeout` in `NvidiaChatPromptExecutionSettings` and `NvidiaEmbeddingPromptExecutionSettings`. A hardcoded 60s client-level cap creates an invisible ceiling that will silently truncate any per-request timeout greater than 60s, making the two timeout mechanisms inconsistent and confusing.
- No tests verify the new `timeout=60.0` behavior. Each service has init tests that assert on `model_id`, client type, and headers, but none assert that `client.timeout` is set to 60.0. Since this is a behavioral change (reducing the timeout from 600s to 60s), it should have corresponding test coverage to prevent regressions.
Suggestions
- Extract the magic number 60.0 into a shared module-level constant (e.g., `DEFAULT_CLIENT_TIMEOUT = 60.0`) to avoid duplication across 5 files and improve maintainability.
- Document this behavioral change: the 60s timeout is a significant reduction from the SDK default of 600s, and users may not realize that their previously working long completions will now fail with timeout errors.
- Consider aligning with the existing per-request timeout pattern already present in the OpenAI/NVIDIA execution settings and extending it to Anthropic, rather than adding a new implicit client-level layer.
- Add assertions to the existing init tests to verify that the timeout is correctly propagated to the client (e.g., `assert client.timeout == httpx.Timeout(60.0)`).
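One way to cover the missing assertions without real network clients is to capture the constructor kwargs with a stand-in class. This is a hedged sketch: `FakeAsyncClient` and `create_client` are illustrative names, not the repository's actual test helpers:

```python
# Stand-in that records the kwargs a connector would pass to the SDK client.
class FakeAsyncClient:
    def __init__(self, **kwargs):
        self.timeout = kwargs.get("timeout")
        self.kwargs = kwargs

# Mirrors the PR's construction pattern, including the hardcoded default.
def create_client(client_cls, api_key, timeout=60.0):
    return client_cls(api_key=api_key, timeout=timeout)

def test_default_timeout_is_set():
    client = create_client(FakeAsyncClient, "sk-test")
    # Guards against regressions of the 60s default the PR introduces.
    assert client.timeout == 60.0
```

In the real test suite, the same assertion would run against the actual service's internally built client rather than a fake.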
Automated review by badhra-ajaz's agents
```python
api_key=api_key,
organization=org_id,
default_headers=merged_headers,
timeout=60.0,
```
Hardcoded 60s with no way for the caller to override. Should be an optional constructor parameter passed through to AsyncOpenAI. Also missing test coverage — test_init in test_openai_chat_completion.py does not assert client.timeout.
Suggested change:
```diff
 client = AsyncOpenAI(
     api_key=api_key,
     organization=org_id,
     default_headers=merged_headers,
-    timeout=60.0,
 )
```
```python
if "websocket_base_url" in kwargs:
    args["websocket_base_url"] = kwargs.pop("websocket_base_url")

args["timeout"] = 60.0
```
Hardcoded 60s is not surfaced as a constructor parameter. Azure OpenAI deployments used for long-context or batch workloads routinely exceed 60s. This should be an __init__ parameter or removed to rely on the SDK default. Also missing test coverage — test_init in test_azure_chat_completion.py does not assert the timeout.
Suggested change:
```diff
-args["timeout"] = 60.0
 client = AsyncAzureOpenAI(**args)
```
```python
if not async_client:
    async_client = AsyncAnthropic(
        api_key=anthropic_settings.api_key.get_secret_value(),
        timeout=60.0,
```
Hardcoded 60s timeout is not configurable and will silently break long-running Anthropic requests. Expose as an __init__ parameter or omit to rely on the SDK default. Also no test exercises this code path — existing tests use a pre-mocked client.
Suggested change:
```diff
 async_client = AsyncAnthropic(
     api_key=anthropic_settings.api_key.get_secret_value(),
-    timeout=60.0,
 )
```
```python
client = AsyncOpenAI(
    api_key=nvidia_settings.api_key.get_secret_value() if nvidia_settings.api_key else None,
    base_url=nvidia_settings.base_url,
    timeout=60.0,
```
NvidiaChatPromptExecutionSettings already has a per-request timeout field. This hardcoded 60s client-level cap creates an invisible ceiling that silently overrides caller-specified per-request timeouts. Also missing test coverage in test_init_with_defaults.
Suggested change:
```diff
 client = AsyncOpenAI(
     api_key=nvidia_settings.api_key.get_secret_value() if nvidia_settings.api_key else None,
     base_url=nvidia_settings.base_url,
-    timeout=60.0,
 )
```
```python
client = AsyncOpenAI(
    api_key=nvidia_settings.api_key.get_secret_value() if nvidia_settings.api_key else None,
    base_url=nvidia_settings.base_url,
    timeout=60.0,
```
Same issue: NvidiaEmbeddingPromptExecutionSettings exposes per-request timeout, so hardcoding 60s at the client level creates a hidden cap. Also missing timeout assertion in test_init.
Suggested change:
```diff
 client = AsyncOpenAI(
     api_key=nvidia_settings.api_key.get_secret_value() if nvidia_settings.api_key else None,
     base_url=nvidia_settings.base_url,
-    timeout=60.0,
 )
```
Summary
Adds default 60-second timeout to all 5 AI provider client instantiations across OpenAI, Azure OpenAI, Anthropic, and NVIDIA connectors to prevent indefinite hangs on API calls.
Problem
Currently, Semantic Kernel creates AI provider clients without timeout parameters, so API calls can hang indefinitely on network issues or server unresponsiveness.
Solution
Added `timeout=60.0` to all AI provider client instantiations:
- open_ai_config_base.py: `AsyncOpenAI`
- azure_config_base.py: `AsyncAzureOpenAI`
- anthropic_chat_completion.py: `AsyncAnthropic`
- nvidia_chat_completion.py: `AsyncOpenAI`
- nvidia_text_embedding.py: `AsyncOpenAI`

PeakInfer Analysis
Category: Reliability + Latency
Issue: Missing default timeout on LLM API clients
Impact: Prevents indefinite hangs across all Semantic Kernel AI connectors
Testing