feat: add /openai/v1 passthrough to bedrock-mantle #62

Merged: xiehust merged 22 commits into main from feat/openai-passthrough on May 25, 2026

@xiehust xiehust commented May 25, 2026

Summary

  • Adds new /openai/v1/* endpoints that accept OpenAI-native API requests (Chat Completions, Responses API + full CRUD, /models) and forward them to AWS bedrock-mantle, gated by the ENABLE_OPENAI_PASSTHROUGH flag (default False).
  • Reuses the proxy's existing API key auth (now also accepts Authorization: Bearer), rate limits, budgets, and usage tracking. Usage is normalized into the existing DynamoDB schema with two new sparse columns (api_surface, reasoning_tokens).
  • Independent of ENABLE_OPENAI_COMPAT — both flags can be enabled together. The new endpoints are pure raw-httpx passthrough (no Pydantic schemas for OpenAI types) for forward compatibility with new Mantle features.
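
To illustrate the dual-header auth mentioned above, here is a minimal sketch of pulling an API key from either `x-api-key` or `Authorization: Bearer`. The function name `extract_api_key` and the choice to prefer `x-api-key` when both are present are assumptions for illustration, not taken from the PR's code.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the API key from x-api-key or Authorization: Bearer, else None.

    Hypothetical helper: the precedence (x-api-key first) is an assumption.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    key = lowered.get("x-api-key", "").strip()
    if key:
        return key

    # "Bearer <token>" -> ("Bearer", " ", "<token>")
    scheme, _, token = lowered.get("authorization", "").strip().partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return None
```

A caller would presumably answer 401 when this returns `None`, which is the case the "401 on missing auth" integration test covers.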

Endpoints:

| Method | Path |
| --- | --- |
| POST | /openai/v1/chat/completions (streaming + non-streaming) |
| POST | /openai/v1/responses (streaming + non-streaming) |
| GET / DELETE | /openai/v1/responses/{id} |
| POST | /openai/v1/responses/{id}/cancel |
| GET | /openai/v1/responses/{id}/input_items |
| GET | /openai/v1/models |

Documentation: see docs/plans/2026-05-25-openai-passthrough-design.md for the design rationale and docs/architecture/features.md for the user-facing feature doc.

Test plan

  • 127/127 tests pass (uv run pytest tests/unit tests/integration/test_openai_passthrough)
  • Routes registered when ENABLE_OPENAI_PASSTHROUGH=True; not present when False (regression test included)
  • Authorization: Bearer and x-api-key both accepted
  • Streaming chat completions: usage extracted from data: {... "usage": ...} chunk when client sends stream_options: {"include_usage": true}
  • Streaming responses: usage extracted from response.completed SSE event
  • Upstream timeout during streaming yields structured data: {"error": ...}\n\n[DONE] instead of crashing the stream
  • CRUD endpoints (GET/DELETE/cancel/input_items) are pure passthrough with no usage logging
  • Upstream 4xx returned verbatim with no usage logged
  • Bedrock guardrail headers (X-Amzn-Bedrock-*) forwarded to upstream
  • Model mapping table consulted with passthrough fallback
  • Lint (ruff) and type (mypy) clean on the new module
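
The two streaming usage rules in the test plan can be sketched as a single scanner over SSE lines. This is an illustration under assumptions: the function name `extract_stream_usage` and the `"chat_completions"` surface label are hypothetical (only `"responses"` appears in the commit messages), and the real module may buffer and parse frames differently.

```python
import json
from typing import Iterable, Optional


def extract_stream_usage(lines: Iterable[str], api_surface: str) -> Optional[dict]:
    """Scan SSE lines and return the last usage object seen, or None."""
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore partial or non-JSON frames
        if not isinstance(event, dict):
            continue

        if api_surface == "responses":
            # Responses API: usage lives inside the response.completed event.
            if event.get("type") == "response.completed":
                usage = (event.get("response") or {}).get("usage") or usage
        else:
            # Chat Completions: a usage chunk only appears when the client
            # sent stream_options: {"include_usage": true}; other chunks
            # carry "usage": null.
            if event.get("usage"):
                usage = event["usage"]
    return usage
```

If the client omits `include_usage`, the chat completions branch simply returns `None`, so no usage row would be derived from the stream.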

Manual verification recommended before deploy:

  • End-to-end with real bedrock-mantle credentials (OPENAI_API_KEY, OPENAI_BASE_URL) using OpenAI Python SDK
  • Multi-turn conversation chaining via previous_response_id
  • Verify usage rows in DynamoDB include api_surface and reasoning_tokens columns
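
When checking the DynamoDB rows, it may help to have a picture of the normalization step. The sketch below maps the two OpenAI usage shapes onto one row. The row keys `input_tokens` and `output_tokens` and the function name are assumptions; only `api_surface` and `reasoning_tokens` are named in this PR. The upstream field names follow the public OpenAI usage objects.

```python
from typing import Any, Dict


def normalize_usage(usage: Dict[str, Any], api_surface: str) -> Dict[str, Any]:
    """Map an OpenAI usage object onto a single (hypothetical) usage-row shape."""
    if api_surface == "responses":
        # Responses API: input_tokens / output_tokens / output_tokens_details
        input_tokens = usage.get("input_tokens", 0)
        output_tokens = usage.get("output_tokens", 0)
        details = usage.get("output_tokens_details") or {}
    else:
        # Chat Completions: prompt_tokens / completion_tokens / completion_tokens_details
        input_tokens = usage.get("prompt_tokens", 0)
        output_tokens = usage.get("completion_tokens", 0)
        details = usage.get("completion_tokens_details") or {}

    row: Dict[str, Any] = {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "api_surface": api_surface,
    }
    # Sparse column: only written when the upstream reported reasoning tokens.
    reasoning = details.get("reasoning_tokens")
    if reasoning:
        row["reasoning_tokens"] = reasoning
    return row
```

Under this sketch a row without reasoning tokens has no `reasoning_tokens` attribute at all, which is what "sparse" would look like in a table scan.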

Follow-ups (out of scope)

  • OTEL tracing on the new endpoints (deferred per design doc)
  • Admin portal api_surface filter (deferred)
  • Refactor router to inject DDB managers via FastAPI `Depends` instead of module-globals + `importlib.reload` in tests

xiehust and others added 22 commits May 25, 2026 05:16
16 tasks covering feature flag, auth middleware extension, usage extraction,
httpx passthrough client, /chat/completions and /responses endpoints with
streaming, full Responses CRUD, /models, guardrail header forwarding, and
documentation updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mounts /openai/v1/* (chat/completions, responses + CRUD, models) as raw
httpx passthrough to bedrock-mantle. Reuses proxy API key auth, rate
limits, budgets, and usage tracking. Independent of ENABLE_OPENAI_COMPAT.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Implements the FastAPI router for OpenAI passthrough, mounts it
conditionally under /openai/v1 when ENABLE_OPENAI_PASSTHROUGH=True,
and adds four integration tests (non-streaming forward, model mapping,
4xx passthrough, and 401 on missing auth).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ut; add flag-off and timeout tests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… base path

httpx follows RFC 3986 path-merging on AsyncClient.base_url: a request path
starting with `/` REPLACES the base_url's path entirely. With
OPENAI_BASE_URL=https://bedrock-mantle.us-west-2.api.aws/v1, calls like
`client.post("/chat/completions")` were being sent to
`bedrock-mantle.us-west-2.api.aws/chat/completions` (no `/v1`), causing
404s in production.

Fix:
- Drop base_url from the AsyncClient
- Add upstream_url(path) that explicitly joins OPENAI_BASE_URL + path
- Use upstream_url() everywhere we previously passed bare paths
- Add unit tests covering leading-slash, trailing-slash, and ID-in-path
  cases that would have caught this

Integration tests passed previously because respx joins base_url + path
intuitively; only real httpx exhibits the RFC 3986 replacement behaviour.
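
The difference between RFC 3986 reference resolution and explicit joining can be reproduced with the standard library alone. The sketch below uses `urllib.parse.urljoin` (not httpx itself) to show the replacement behaviour described above, next to one possible shape for `upstream_url()`. The actual implementation in the PR is not shown on this page, so the function body is an assumption.

```python
from urllib.parse import urljoin

OPENAI_BASE_URL = "https://bedrock-mantle.us-west-2.api.aws/v1"


def upstream_url(path: str, base_url: str = OPENAI_BASE_URL) -> str:
    """Join base_url and path with exactly one slash, keeping the base path.

    Assumed implementation: plain string joining, no URL resolution.
    """
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# RFC 3986 resolution: an absolute-path reference replaces the base path,
# so the /v1 prefix is lost.
rfc_resolved = urljoin(OPENAI_BASE_URL, "/chat/completions")

# Explicit joining keeps the /v1 prefix regardless of leading or trailing slashes.
explicit = upstream_url("/chat/completions")
```

Here `rfc_resolved` ends in `api.aws/chat/completions` while `explicit` ends in `api.aws/v1/chat/completions`, which is the gap the leading-slash, trailing-slash, and ID-in-path unit tests are meant to pin down.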

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…inition

The previous deploy mounted the new /openai/v1/* code but the CDK never
passed ENABLE_OPENAI_PASSTHROUGH through to the container, so the
conditional router mount at app/main.py evaluated False and the routes
weren't registered. Add support symmetrical to enableOpenaiCompat:

- AppConfig: new enableOpenaiPassthrough field
- prod default: true (ship the feature on by default)
- dev default: false (avoid accidental routing changes in dev)
- env-var override: ENABLE_OPENAI_PASSTHROUGH at deploy time
- ECS task env: emit ENABLE_OPENAI_PASSTHROUGH=<value>
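
The precedence described above (deploy-time env var over per-stage default) can be sketched in a few lines. The real change lives in the CDK app, so this Python version only illustrates the resolution order. The function names and the accepted boolean spellings are assumptions.

```python
from typing import Dict, Mapping, Optional

# Per-stage defaults as listed in the commit message.
STAGE_DEFAULTS = {"prod": True, "dev": False}


def parse_bool(raw: Optional[str]) -> Optional[bool]:
    """Parse a boolean-ish env value; None means 'not set or not recognised'."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    return None


def resolve_passthrough_flag(stage: str, env: Mapping[str, str]) -> bool:
    """An explicit ENABLE_OPENAI_PASSTHROUGH override wins; else the stage default."""
    override = parse_bool(env.get("ENABLE_OPENAI_PASSTHROUGH"))
    if override is not None:
        return override
    return STAGE_DEFAULTS.get(stage, False)


def task_env(stage: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Environment block handed to the container (value rendered as True/False)."""
    return {"ENABLE_OPENAI_PASSTHROUGH": str(resolve_passthrough_flag(stage, env))}
```

Falling back to `False` for an unknown stage is a conservative choice made for this sketch, in the spirit of the dev default.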

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bedrock-mantle emits Responses API SSE as data-only frames (the event type
is embedded as a JSON field but no `event: <type>` line is present). This
matches the SSE spec but diverges from real OpenAI servers, which prepend
each frame with `event: <type>`. Strict clients like OpenAI Codex CLI key
off the `event:` field and report "stream closed before response.completed"
when they don't see it.

Synthesize `event: <type>` lines from each data frame's JSON `type` field
when api_surface == "responses". Chat Completions streams remain unchanged
(real OpenAI doesn't use event: lines for that endpoint).

Tests:
- test_streaming_responses_synthesizes_event_lines_for_data_only_upstream
  asserts every data: frame is preceded by the matching event: line
- test_streaming_chat_completions_does_not_inject_event_lines
  pins the no-injection contract for chat completions
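
A minimal sketch of the synthesis step, assuming frames have already been split on the blank-line separator and arrive as strings without the trailing blank line. The function name and the `"chat_completions"` label are hypothetical. Only the `api_surface == "responses"` condition comes from the text above.

```python
import json
from typing import Iterable, Iterator


def synthesize_event_lines(frames: Iterable[str], api_surface: str) -> Iterator[str]:
    """Yield SSE frames, prepending `event: <type>` to data-only Responses frames."""
    for frame in frames:
        if api_surface != "responses":
            yield frame  # Chat Completions streams pass through untouched
            continue

        lines = frame.split("\n")
        has_event = any(line.startswith("event:") for line in lines)
        data_lines = [
            line[len("data:"):].lstrip() for line in lines if line.startswith("data:")
        ]

        event_type = None
        if data_lines and not has_event:
            try:
                parsed = json.loads("\n".join(data_lines))
            except json.JSONDecodeError:
                parsed = None  # e.g. a non-JSON sentinel; leave the frame alone
            if isinstance(parsed, dict) and isinstance(parsed.get("type"), str):
                event_type = parsed["type"]

        yield f"event: {event_type}\n{frame}" if event_type else frame
```

Frames that already carry an `event:` line are left as they are, so the transform should be safe to apply even if the upstream later starts emitting event lines itself.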

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…s codes

Previously, when client requested stream=true and upstream returned a
non-2xx status (e.g. validation 400) or a connection error, the proxy
would still return 200 text/event-stream and dump the JSON error body
(or a synthetic SSE frame) into the stream. Strict SSE clients like
OpenAI Codex CLI then hang waiting for response.completed and report
"stream closed before response.completed" — masking the real error.

Refactor: split open_upstream_stream() (peeks at status) from
stream_passthrough_response() (streams an open 2xx body). The router
now:

- Returns the real upstream status as JSONResponse when the upstream
  responds with 4xx/5xx for a streaming request.
- Returns 502/504 JSON when the upstream is unreachable
  (TimeoutException / RequestError) before any bytes flow.
- Continues to emit an SSE error+[DONE] frame only for failures that
  occur AFTER the 2xx stream has begun (where we cannot retroactively
  change the HTTP status).

Tests:
- test_streaming_responses_upstream_4xx_returns_json_not_sse
- test_streaming_upstream_timeout_returns_json_504 (replaces the prior
  test that asserted the buggy SSE-error behavior)
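
The decision the router makes before any bytes flow can be sketched as a small pure function. The exception classes here are stand-ins for `httpx.TimeoutException` and `httpx.RequestError` so the example runs without httpx. The function name, the return shape, and the mapping of non-timeout request errors to 502 are assumptions (the 504-on-timeout mapping is implied by the test name above).

```python
from typing import Any, Callable, Tuple


class UpstreamRequestError(Exception):
    """Stand-in for httpx.RequestError."""


class UpstreamTimeout(UpstreamRequestError):
    """Stand-in for httpx.TimeoutException (a RequestError subclass in httpx)."""


def plan_streaming_reply(
    open_stream: Callable[[], Tuple[int, Any]],
) -> Tuple[str, int, Any]:
    """Decide how to answer a stream=true request: ("json" | "sse", status, body)."""
    try:
        status, body = open_stream()  # peeks at the upstream status line
    except UpstreamTimeout:
        # Must be caught before the broader request error, since it subclasses it.
        return ("json", 504, {"error": {"message": "upstream timed out"}})
    except UpstreamRequestError:
        return ("json", 502, {"error": {"message": "upstream unreachable"}})

    if status >= 400:
        # Surface the real upstream status and error body as plain JSON.
        return ("json", status, body)
    # 2xx: safe to commit to text/event-stream and start relaying bytes.
    return ("sse", status, body)
```

Only failures after the `"sse"` branch has been taken would still need the in-stream error frame followed by `[DONE]`, since the HTTP status has already been sent by then.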

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xiehust xiehust merged commit a4f094b into main May 25, 2026
2 of 3 checks passed