
Commit 3fc6d53

team-coding-agent-1 committed
Add Voxtral-Mini-4B-Realtime model support via vLLM backend
- Add gallery definition for Voxtral-Mini-4B-Realtime-2602 model
- Configure vLLM backend with recommended settings for real-time ASR
- Update gallery index to point to new model configuration
- Model supports multilingual transcription with <500ms latency
- Uses vLLM's Realtime API for streaming audio processing

References:
- https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602
- #8401
1 parent bf4f8da commit 3fc6d53

2 files changed

Lines changed: 33 additions & 18 deletions


gallery/index.yaml

Lines changed: 6 additions & 18 deletions
@@ -478,34 +478,22 @@
  model: nvidia/parakeet-tdt-0.6b-v3
  - name: voxtral-mini-4b-realtime
  license: apache-2.0
- url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+ url: "github:mudler/LocalAI/gallery/voxtral-mini-4b-realtime.yaml@master"
  description: |
- Voxtral Mini 4B Realtime is a speech-to-text model from Mistral AI. It is a 4B parameter model optimized for fast, accurate audio transcription with low latency, making it ideal for real-time applications. The model uses the Voxtral architecture for efficient audio processing.
+ Voxtral Mini 4B Realtime is a multilingual, realtime speech-transcription model from Mistral AI.
+ It achieves accuracy comparable to offline systems with a delay of <500ms and supports 13 languages.
+ This model is designed for real-time automatic speech recognition (ASR) with streaming capabilities
+ and benefits from vLLM's Realtime API for low-latency transcription workflows.
  urls:
  - https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602
- - https://github.com/antirez/voxtral.c
  tags:
  - stt
  - speech-to-text
  - audio-transcription
+ - vllm
  - cpu
  - metal
  - mistral
- overrides:
- backend: voxtral
- known_usecases:
- - transcript
- parameters:
- model: voxtral-model
- files:
- - filename: voxtral-model/consolidated.safetensors
- uri: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602/resolve/main/consolidated.safetensors
- sha256: 263f178fe752c90a2ae58f037a95ed092db8b14768b0978b8c48f66979c8345d
- - filename: voxtral-model/params.json
- uri: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602/resolve/main/params.json
- - filename: voxtral-model/tekken.json
- uri: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602/resolve/main/tekken.json
- sha256: 8434af1d39eba99f0ef46cf1450bf1a63fa941a26933a1ef5dbbf4adf0d00e44
  - name: moonshine-tiny
  license: apache-2.0
  size: "108MB"
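The `url` change above switches the entry from `virtual.yaml` to a dedicated config, addressed with the `github:owner/repo/path@branch` shorthand. The sketch below shows one way such a shorthand could be expanded into a fetchable file URL. The mapping to `raw.githubusercontent.com`, the `main` fallback branch and the helper name `expand_github_shorthand` are assumptions for illustration, not LocalAI's own code.

```python
def expand_github_shorthand(url: str, default_branch: str = "main") -> str:
    """Expand 'github:owner/repo/path@branch' into a raw file URL.

    Hypothetical helper: assumes the shorthand maps onto
    raw.githubusercontent.com; this is not LocalAI's implementation.
    """
    prefix = "github:"
    if not url.startswith(prefix):
        return url  # already a plain URL, leave it untouched
    spec = url[len(prefix):]
    # Split off the optional '@branch' suffix.
    path, _, branch = spec.partition("@")
    branch = branch or default_branch
    parts = path.split("/")
    if len(parts) < 3:
        raise ValueError(f"expected owner/repo/path, got {path!r}")
    owner, repo, file_path = parts[0], parts[1], "/".join(parts[2:])
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file_path}"


if __name__ == "__main__":
    # The old and new values of the gallery entry's `url` field.
    print(expand_github_shorthand("github:mudler/LocalAI/gallery/virtual.yaml@master"))
    print(expand_github_shorthand(
        "github:mudler/LocalAI/gallery/voxtral-mini-4b-realtime.yaml@master"))
```

Under that assumption, the new entry resolves to the `gallery/voxtral-mini-4b-realtime.yaml` file added by this commit on the `master` branch.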
gallery/voxtral-mini-4b-realtime.yaml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+ ---
+ name: "voxtral-mini-4b-realtime"
+
+ description: |
+ Voxtral Mini 4B Realtime is a multilingual, realtime speech-transcription model from Mistral AI.
+ It achieves accuracy comparable to offline systems with a delay of <500ms and supports 13 languages.
+ This model is designed for real-time automatic speech recognition (ASR) with streaming capabilities
+ and benefits from vLLM's Realtime API for low-latency transcription workflows.
+
+ config_file: |
+ name: voxtral-mini-4b-realtime
+ description: Voxtral Mini 4B Realtime - Real-time ASR model via vLLM
+ backend: vllm
+ parameters:
+ model: mistralai/Voxtral-Mini-4B-Realtime-2602
+ known_usecases:
+ - transcript
+ template:
+ use_tokenizer_template: true
+ prediction:
+ max_tokens: 45000
+ backend_options:
+ vllm:
+ # Recommended settings for Voxtral Realtime
+ # --max-model-len: 131072 (default, supports ~3h of transcription)
+ # Temperature should be set to 0.0 for ASR
+ compilation_config: '{"cudagraph_mode": "PIECEWISE"}'
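The `backend_options` comment ties the default `--max-model-len` of 131072 tokens to roughly 3 hours of transcription. Dividing one figure by the other gives the implied audio time per token, and the same rate gives a rough idea of how much audio `max_tokens: 45000` might cover. That second figure assumes generated tokens advance through the audio at the same per-token rate, which the commit does not state; treat it as an estimate only.

```python
# Figures quoted in the config comments of this commit.
MAX_MODEL_LEN = 131_072          # tokens, "--max-model-len: 131072 (default)"
QUOTED_AUDIO_SECONDS = 3 * 3600  # "supports ~3h of transcription"
MAX_TOKENS = 45_000              # prediction.max_tokens


def seconds_per_token(max_len: int, audio_seconds: float) -> float:
    """Audio time implied per context token by the quoted figures."""
    return audio_seconds / max_len


def audio_covered(tokens: int, sec_per_tok: float) -> float:
    """Audio seconds covered by `tokens`, assuming a constant per-token rate."""
    return tokens * sec_per_tok


spt = seconds_per_token(MAX_MODEL_LEN, QUOTED_AUDIO_SECONDS)
print(f"~{spt * 1000:.1f} ms of audio per token")  # prints "~82.4 ms of audio per token"

# Assumption: output tokens track the audio at the same rate as the context.
minutes = audio_covered(MAX_TOKENS, spt) / 60
print(f"max_tokens={MAX_TOKENS} covers ~{minutes:.0f} min")  # prints "... ~62 min"
```

So the "~3h" figure corresponds to about 82 ms of audio per token. Under the stated assumption, a 45000-token cap would stop output after roughly an hour of audio, well short of the context limit.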

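With the entry installed, the model is declared with the `transcript` use case. Below is a minimal client sketch using only the Python standard library. It assumes the LocalAI instance listens on `http://localhost:8080`, exposes the OpenAI-compatible `/v1/audio/transcriptions` endpoint, and routes that endpoint to this vLLM-backed model. All three are assumptions, as are the helper names `build_transcription_request` and `transcribe`. The request sets `temperature` to 0.0, following the config comment that ASR should run at temperature 0.0.

```python
import json
import urllib.request
import uuid


def build_transcription_request(
    base_url: str,
    model: str,
    audio_bytes: bytes,
    filename: str = "audio.wav",
    temperature: float = 0.0,
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to an OpenAI-style transcription endpoint."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    # Plain form fields: the gallery model name and the sampling temperature.
    for name, value in (("model", model), ("temperature", str(temperature))):
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode()
    # The audio itself goes in the "file" part.
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    body += audio_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode()  # closing boundary
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=bytes(body),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(req: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the transcript (assumes a JSON {"text": ...} reply)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("text", "")
```

A typical call would be `transcribe(build_transcription_request("http://localhost:8080", "voxtral-mini-4b-realtime", wav_bytes))`. The `text` field follows the OpenAI transcription response shape, which is also an assumption here. This sketch covers one-shot file upload only. Streaming through vLLM's Realtime API, as mentioned in the commit message, would need a different client.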