Skip to content

Feature request: Audio input support in Messages API #1198

@KarataevDmitry

Description

@KarataevDmitry

Summary

Please add audio input as a content type in the Messages API (similar to existing image support). Currently the API accepts text and images (JPEG, PNG, GIF, WebP) but not raw audio.

Why it matters

  • Tone and intent: With only speech-to-text (STT), applications send transcribed text to the model. All prosody, tone, and speaker identity are lost. The same phrase can mean different things depending on how it was said; models cannot disambiguate without access to the actual sound.
  • Accessibility and UX: Users who prefer voice input (or rely on it) would benefit from the model receiving the full signal, not just a transcript. This is especially important for assistive use cases.
  • Music and non-speech audio: Use cases like music understanding, sound design, or analysis of any non-speech audio are impossible without native audio input.
  • Format: Uncompressed formats like WAV (PCM) would be a natural first choice—no codec to decode, minimal complexity on the client side.

Suggested API shape

Analogous to ImageBlockParam, add an audio content block, e.g.:

  • type: "audio"
  • source: base64 or URL
  • media_type: e.g. "audio/wav", "audio/mpeg" (if multiple formats are supported)

So the model receives audio as a first-class modality (like images today), not as opaque bytes in text.

Current workaround

None. Passing base64/hex of WAV in a text block only gives the model a string of data; it has no audio modality to perceive tone, timbre, or meaning from the waveform. STT is a workaround for speech-only flows but discards the very information we are asking for.

Thank you for considering this. It would unlock a lot of applications that today are limited by text-only or image+text input.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions