# Feature Request

## Description
Add native async/await support for the OCI Generative AI Inference client to enable non-blocking concurrent requests in async applications.
## Problem Statement

The current SDK uses synchronous HTTP requests via the `requests` library. This causes issues in async applications:
- Event loop blocking: Sync calls block the event loop in FastAPI, async agents, and other async frameworks
- Limited concurrency: Cannot efficiently make concurrent API calls
- Performance bottleneck: Sequential requests are significantly slower than concurrent alternatives
## Proposed Solution

Add an `AsyncGenerativeAiInferenceClient` class that:

- Uses `aiohttp` for true async HTTP requests
- Reuses the existing OCI `Signer` for authentication
- Provides async versions of all GenAI operations (chat, streaming, embeddings, etc.)
- Supports the async context manager pattern
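A minimal sketch of the async context manager shape such a client could take. This is not the actual SDK implementation: the class body, the `config` dict, and the stub `chat` method are all hypothetical stand-ins (a real implementation would open an `aiohttp.ClientSession` in `__aenter__` and issue Signer-authenticated requests in `chat`):

```python
import asyncio


class AsyncGenerativeAiInferenceClient:
    """Hypothetical sketch only; the real client would wrap aiohttp + the OCI Signer."""

    def __init__(self, config):
        self.config = config
        self._session = None  # would hold an aiohttp.ClientSession

    async def __aenter__(self):
        # Real implementation: self._session = aiohttp.ClientSession()
        self._session = object()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Real implementation: await self._session.close()
        self._session = None

    async def chat(self, details):
        # Real implementation: signed POST to the GenAI endpoint via aiohttp.
        # Here we just yield control and echo the input.
        await asyncio.sleep(0)
        return {"echo": details}


async def demo():
    async with AsyncGenerativeAiInferenceClient({}) as client:
        # gather() preserves argument order in its result list
        return await asyncio.gather(client.chat("a"), client.chat("b"))


results = asyncio.run(demo())
```

The context manager guarantees the underlying HTTP session is closed even if a request raises, which is why the proposal pairs `__aenter__`/`__aexit__` with session ownership.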
## Example Usage

```python
import asyncio

from oci.generative_ai_inference import AsyncGenerativeAiInferenceClient


async def main():
    async with AsyncGenerativeAiInferenceClient(config) as client:
        # Concurrent requests - 3x faster than sequential
        results = await asyncio.gather(
            client.chat(details1),
            client.chat(details2),
            client.chat(details3),
        )

asyncio.run(main())
```
## Performance Impact

Testing shows a 2-3.5x throughput improvement for concurrent workloads:
| Scenario | Sequential | Concurrent | Speedup |
|---|---|---|---|
| 3 requests (Llama 3.3) | 1.30s | 0.64s | 2.01x |
| 3 requests (Llama 3.2) | 1.40s | 0.44s | 3.18x |
| 3 requests (Cohere) | 0.50s | 0.14s | 3.54x |
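The mechanism behind these speedups is that concurrent requests overlap their network wait time: total latency approaches the slowest single request instead of the sum of all requests. A self-contained illustration using `asyncio.sleep` as a stand-in for network I/O (no real API calls involved):

```python
import asyncio
import time


async def fake_request(latency: float = 0.05) -> float:
    # Stand-in for a network round trip to the GenAI endpoint
    await asyncio.sleep(latency)
    return latency


async def sequential(n: int):
    # Each await finishes before the next begins: total ~ n * latency
    return [await fake_request() for _ in range(n)]


async def concurrent(n: int):
    # All requests in flight at once: total ~ 1 * latency
    return await asyncio.gather(*(fake_request() for _ in range(n)))


start = time.perf_counter()
asyncio.run(sequential(3))
seq = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(concurrent(3))
conc = time.perf_counter() - start
```

With three 50 ms "requests", the sequential run takes roughly 150 ms while the concurrent run stays near 50 ms, mirroring the ~3x table results above.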
## Use Cases
- FastAPI/async web frameworks: Non-blocking GenAI calls in async endpoints
- LangChain agents: Concurrent tool calls and chain execution
- Batch processing: Parallel processing of multiple prompts
- Real-time applications: Low-latency streaming responses
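For the batch-processing use case specifically, unbounded `asyncio.gather` over a large prompt list can overwhelm service rate limits, so a bounded-concurrency pattern is common. A stdlib-only sketch (the `process_prompt` body is a placeholder for a real `client.chat(...)` call, which is an assumption here):

```python
import asyncio


async def process_prompt(prompt: str) -> str:
    # Placeholder for an awaited client.chat(...) call
    await asyncio.sleep(0.01)
    return prompt.upper()


async def batch(prompts: list[str], limit: int = 2) -> list[str]:
    # Semaphore caps how many requests are in flight at once
    sem = asyncio.Semaphore(limit)

    async def bounded(p: str) -> str:
        async with sem:
            return await process_prompt(p)

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(p) for p in prompts))


results = asyncio.run(batch(["hello", "world", "genai"]))
```

The semaphore limit would typically be tuned to the tenancy's GenAI request-rate quota.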
## Implementation
A reference implementation is provided in PR #835 with:
- Full async client implementation
- 15 unit tests
- 7 integration tests
- Tested on Python 3.9, 3.12, 3.13, 3.14
- Tested with 6 different models
## Related