
Add audio, moderations, and tokenize/detokenize endpoint support #816

Open
ericcurtin wants to merge 1 commit into main from address-these-concerns

Conversation

@ericcurtin
Contributor

@ericcurtin ericcurtin commented Mar 31, 2026

Register previously missing OpenAI-compatible routes so they are no
longer rejected with 404:
/v1/audio/transcriptions, /v1/audio/translations (multipart/form-data)
/v1/audio/speech (JSON)
/v1/moderations
/tokenize and /detokenize (vLLM extension)

Add BackendModeAudio and handleAudioInference, which extracts the model
field from multipart form data. vLLM passes audio requests through
natively; llama.cpp returns a descriptive error directing users to the
chat completions input_audio content-part instead. Address review
feedback: extract shared scheduleInference helper to restore parity
(auto-install progress, preload-only, recorder, origin tracking) between
handleOpenAIInference and handleAudioInference; fix multipart temp-file
leak (defer MultipartForm.RemoveAll); tighten /v1/audio/ path matching
to explicit HasSuffix checks; make Content-Type check case-insensitive.

Replace the Go routing layer with a Rust reverse proxy (axum + tokio)
compiled as a CGo-linked staticlib (router/). The Rust router owns:

  • All route registration and path aliasing (/v1/ -> /engines/, etc.)
  • CORS middleware matching the Go CorsMiddleware semantics
  • Path normalisation (NormalizePathLayer)
  • Static routes: GET / and GET /version

The deleted Go files (pkg/routing/router.go, pkg/routing/routing.go,
pkg/middleware/alias.go) are fully replaced. pkg/routing/service.go is
trimmed; main.go registers Ollama, Anthropic, and Responses handlers
directly on the backend mux.

The in-process CGo callback path (pkg/router/handler.go) replaces the
network proxy hop: Rust calls Go's http.Handler directly via a streaming
protocol — dmr_write_chunk()/dmr_close_stream() push response chunks
into a tokio::sync::mpsc channel as Go writes them, so streaming
endpoints like POST /models/create deliver progress in real time without
buffering the full response.

Rust shared utilities are extracted into a dmr-common workspace crate
(init_tracing, unix_now_secs). model-cli Rust code is deduplicated:
shared send_and_check free function, request_timeout helper,
apply_azure_version helper, build_app/run_gateway_async extracted to
handlers.rs.

Build system:

  • Cargo workspace root (Cargo.toml) unifies all Rust crates
  • make build-router-lib compiles the Rust staticlib before go build
  • Dockerfile installs Rust and builds libdmr_router.a in the builder
    stage
  • CI test job installs Rust toolchain and builds the library so
    go test -race works with CGo enabled
  • Platform-split CGo LDFLAGS: Darwin keeps -framework flags, Linux
    uses plain -lpthread/-ldl/-lm
  • pkg/router/router_stub.go provides a no-op implementation for
    CGO_ENABLED=0 builds (lint, cross-compilation)
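The build-tag split described in the last two bullets follows a standard Go pattern; a minimal sketch (the `Start` signature and LDFLAGS paths are assumptions, not the PR's actual API) might look like:

```go
// router_cgo_darwin.go
//go:build cgo && darwin

package router

/*
#cgo LDFLAGS: -L${SRCDIR}/router/target/release -ldmr_router -framework Security
*/
import "C"

// Start hands control to the Rust axum router (illustrative signature).
func Start(addr string) error { /* calls into libdmr_router.a */ return nil }
```

```go
// router_stub.go
//go:build !cgo

package router

// Start is a no-op when CGO_ENABLED=0 (lint, cross-compilation), so
// callers still compile without the Rust staticlib being present.
func Start(addr string) error { return nil }
```

The stub keeps `go vet`, linters, and cross-compiles working on machines without a Rust toolchain, while real builds link the staticlib produced by `make build-router-lib`.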

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue and left some high-level feedback:

  • The new handleAudioInference logic duplicates much of the backend selection / model lookup / installer wait / runner load flow from handleOpenAIInference; consider extracting a shared helper to reduce drift risk if that logic changes in the future.
  • backendModeForRequest uses strings.Contains(path, "/v1/audio/") while other modes use suffix checks; tightening this to a suffix or more precise match (e.g., specific audio endpoints) would reduce the chance of misclassifying unrelated paths that happen to contain that segment.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new handleAudioInference logic duplicates much of the backend selection / model lookup / installer wait / runner load flow from handleOpenAIInference; consider extracting a shared helper to reduce drift risk if that logic changes in the future.
- backendModeForRequest uses strings.Contains(path, "/v1/audio/") while other modes use suffix checks; tightening this to a suffix or more precise match (e.g., specific audio endpoints) would reduce the chance of misclassifying unrelated paths that happen to contain that segment.

## Individual Comments

### Comment 1
<location path="pkg/inference/scheduling/http_handler.go" line_range="355-356" />
<code_context>
+	var modelName string
+	var upstreamBody []byte
+
+	contentType := r.Header.Get("Content-Type")
+	if strings.HasPrefix(contentType, "multipart/form-data") {
+		// Read the entire body for buffering before proxying.
+		body, ok := readRequestBody(w, r, maximumAudioInferenceRequestSize)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Make Content-Type handling robust to case differences

HTTP media types are case-insensitive, but this check depends on the exact casing of "multipart/form-data". Normalize before checking, for example:

```go
contentType := strings.ToLower(r.Header.Get("Content-Type"))
if strings.HasPrefix(contentType, "multipart/form-data") {
    ...
}
```
so that variants like "Multipart/Form-Data; boundary=..." are still handled correctly.

```suggestion
	contentType := strings.ToLower(r.Header.Get("Content-Type"))
	if strings.HasPrefix(contentType, "multipart/form-data") {
```
</issue_to_address>


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for audio processing modes (transcription, translation, and speech synthesis) by adding a new BackendModeAudio and a specialized handleAudioInference HTTP handler to manage multipart/form-data. Review feedback identifies a critical security risk where temporary files created during multipart parsing are not cleaned up, potentially leading to resource exhaustion. Additionally, the new audio handler lacks parity with existing inference handlers regarding installation progress streaming, request recording, and preload support.

ericcurtin force-pushed the address-these-concerns branch 18 times, most recently from 63dd762 to b210e5a on March 31, 2026 at 19:36
ericcurtin force-pushed the address-these-concerns branch from b210e5a to fb2a446 on March 31, 2026 at 20:10
Signed-off-by: Eric Curtin <eric.curtin@docker.com>
ericcurtin force-pushed the address-these-concerns branch from fb2a446 to a76c9f0 on March 31, 2026 at 20:36
