Summary
Please add audio input as a content type in the Messages API (similar to existing image support). Currently the API accepts text and images (JPEG, PNG, GIF, WebP) but not raw audio.
Why it matters
- Tone and intent: With only speech-to-text (STT), applications send transcribed text to the model. All prosody, tone, and speaker identity are lost. The same phrase can mean different things depending on how it was said; models cannot disambiguate without access to the actual sound.
- Accessibility and UX: Users who prefer voice input (or rely on it) would benefit from the model receiving the full signal, not just a transcript. This is especially important for assistive use cases.
- Music and non-speech audio: Use cases like music understanding, sound design, or analysis of any non-speech audio are impossible without native audio input.
- Format: Uncompressed formats like WAV (PCM) would be a natural first choice—no codec to decode, minimal complexity on the client side.
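To illustrate how low-friction WAV (PCM) is on the client side, here is a minimal sketch that builds a valid one-second WAV entirely with the Python standard library; the tone frequency and sample rate are arbitrary example values:

```python
import io
import math
import struct
import wave

def make_wav_bytes(freq_hz=440.0, sample_rate=16000, seconds=1.0):
    """Build a one-second 16-bit mono PCM WAV in memory -- no codec needed."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)         # mono
        w.setsampwidth(2)         # 16-bit samples
        w.setframerate(sample_rate)
        n = int(sample_rate * seconds)
        frames = b"".join(
            struct.pack("<h", int(32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
            for i in range(n)
        )
        w.writeframes(frames)
    return buf.getvalue()

wav = make_wav_bytes()
print(wav[:4])  # WAV files start with the RIFF magic
```

No external dependencies, no decoder on either end: this is why uncompressed PCM is a natural first format to support.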
Suggested API shape
Analogous to `ImageBlockParam`, add an audio content block, e.g.:
- `type`: `"audio"`
- `source`: base64 or URL
- `media_type`: e.g. `"audio/wav"`, `"audio/mpeg"` (if multiple formats are supported)

So the model receives audio as a first-class modality (like images today), not as opaque bytes in text.
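A sketch of what a request with the proposed block could look like. To be clear, the `"audio"` block type is hypothetical (it does not exist in the Messages API today) and the model name is a placeholder; the shape simply mirrors the existing `"image"` block with a base64 source:

```python
import base64
import json

def build_audio_message(wav_bytes, prompt):
    """Assemble a request payload using the PROPOSED (hypothetical) audio block."""
    return {
        "model": "claude-sonnet-4",  # placeholder model name
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "audio",  # proposed, analogous to "image"
                        "source": {
                            "type": "base64",
                            "media_type": "audio/wav",
                            "data": base64.b64encode(wav_bytes).decode("ascii"),
                        },
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }

payload = build_audio_message(b"RIFF....WAVEfmt ", "What emotion do you hear?")
print(json.dumps(payload, indent=2)[:200])
```

Keeping the `source` object identical to the image case (`type`, `media_type`, `data`) would let existing SDK plumbing and validation be reused almost unchanged.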
Current workaround
None. Passing base64/hex of WAV in a text block only gives the model a string of data; it has no audio modality to perceive tone, timbre, or meaning from the waveform. STT is a workaround for speech-only flows but discards the very information we are asking for.
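The lossy speech-only workaround looks roughly like this. `transcribe()` is a stand-in for any real STT engine, not a real API; the point is what survives the round trip:

```python
def transcribe(wav_bytes):
    """Stand-in for a real STT engine: returns words only -- pitch,
    pacing, emphasis, and speaker identity are all discarded."""
    return "fine. do whatever you want."

def build_text_only_message(wav_bytes, prompt):
    """Today's workaround: transcribe client-side, send plain text."""
    transcript = transcribe(wav_bytes)
    return {
        "role": "user",
        "content": [
            {"type": "text",
             "text": f'Transcript: "{transcript}"\n\n{prompt}'},
        ],
    }

msg = build_text_only_message(b"...", "Is the speaker upset?")
# The model sees only the words; whether the line was said warmly or
# through gritted teeth is gone before the request is ever sent.
```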
Thank you for considering this. It would unlock a lot of applications that today are limited by text-only or image+text input.