fishaudio · LordElf · Jun 2, 2026 · Jun 2, 2026 · Jun 3, 2026 · Jun 3, 2026
diff --git a/.mintlify/skills/fish-audio-api/SKILL.md b/.mintlify/skills/fish-audio-api/SKILL.md
@@ -64,7 +64,7 @@ Response: streaming audio bytes (`Transfer-Encoding: chunked`) in the format set
 | `format` | `wav` \| `pcm` \| `mp3` \| `opus` | `mp3` | Output format. |
 | `sample_rate` | int \| null | null (44100, or 48000 for opus) | Output sample rate. |
 | `mp3_bitrate` | 64 \| 128 \| 192 | 128 | Only when `format=mp3`. |
-| `opus_bitrate` | -1000 \| 24 \| 32 \| 48 \| 64 | -1000 (auto) | Only when `format=opus`. |
+| `opus_bitrate` | -1000 \| 24000 \| 32000 \| 48000 \| 64000 | -1000 (auto) | Opus bitrate in **bps**. Only when `format=opus`. |
 | `latency` | `low` \| `normal` \| `balanced` | `normal` | Quality vs latency. |
 | `max_new_tokens` | int | 1024 | Per-chunk audio token cap. |
 | `repetition_penalty` | number | 1.2 | >1.0 reduces repeats. |
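
The `opus_bitrate` row above moves from kbps-style values (24, 32, 48, 64) to bits per second. For callers that still hold the old numbers, the conversion might look like the sketch below. `to_opus_bps` is a hypothetical helper written for illustration; it is not part of the Fish Audio API or either SDK.

```python
# Hypothetical migration helper; not part of the Fish Audio API or SDKs.
AUTO = -1000  # sentinel the table documents as "auto"
ALLOWED_BPS = {24000, 32000, 48000, 64000}


def to_opus_bps(value: int) -> int:
    """Map an old kbps-style opus_bitrate (24, 32, 48, 64) to bps."""
    if value == AUTO or value in ALLOWED_BPS:
        return value  # already valid under the new table
    bps = value * 1000  # kbps -> bps
    if bps not in ALLOWED_BPS:
        raise ValueError(f"unsupported opus_bitrate: {value}")
    return bps
```

Values that are already valid pass through unchanged, so the helper is safe to apply to both old and new inputs.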

diff --git a/.mintlify/skills/fish-audio-sdk/SKILL.md b/.mintlify/skills/fish-audio-sdk/SKILL.md
@@ -0,0 +1,119 @@
+---
+name: fish-audio-sdk
+description: Write code with the official Fish Audio SDKs — Python (`fishaudio`, PyPI `fish-audio-sdk`) and JavaScript/TypeScript (`fish-audio`). Use when the user wants text-to-speech, speech-to-text, voice cloning / voice-model management, or realtime WebSocket TTS through the installed SDK rather than raw HTTP. Covers install and auth, sync + async Python, the TypeScript client, exact method signatures and defaults, model selection (s1 / s2-pro), the real exception types, and the Python↔JavaScript naming differences. For raw REST/WebSocket calls without an SDK (curl, unsupported languages, edge runtimes), use the `fish-audio-api` skill instead.
+---
+
+# Fish Audio SDK Skill
+
+Use this skill to generate correct, runnable code with the **official Fish Audio SDKs**:
+
+- **Python** — package `fish-audio-sdk` on PyPI, imported as `fishaudio`. (The same wheel still ships a separate legacy `fish_audio_sdk` package — do **not** mix them; everything here is the modern `fishaudio` package.)
+- **JavaScript / TypeScript** — package `fish-audio` on npm; the main export is `FishAudioClient`.
+
+If the user wants raw `curl` / HTTP / WebSocket without installing an SDK, use the **`fish-audio-api`** skill instead.
+
+> This file is the index. Deeper, task-specific rules and full examples live in [`references/`](references/). Read the reference for the task you're doing before writing code.
+
+## Global facts
+
+- **Auth:** both SDKs read the API key from the `FISH_API_KEY` environment variable automatically. Get keys at `https://fish.audio/app/api-keys`. Never hardcode a key — read it from the environment.
+- **Base URL:** `https://api.fish.audio` (override with `base_url=` in Python / `baseUrl:` in JS).
+- **Models:** `s2-pro` (default — highest quality) and `s1`. `speech-1.5` / `speech-1.6` are **deprecated**. In Python pass `model="s2-pro"` (keyword); in JS pass the **positional** `backend` argument.
+- **Audio formats:** `mp3` (default), `wav`, `pcm`, `opus`.
+- **Playback in examples:** `play()` shells out to a system audio tool: Python uses **ffmpeg/ffplay** (or `mpv`), JS uses **ffplay**. It is meant for local/desktop use; on a server, `save()` to a file or stream the bytes instead. See [references/installation.md](references/installation.md).
+
+## Quick start — Python
+
+```python
+from fishaudio import FishAudio
+from fishaudio.utils import play, save
+
+client = FishAudio()  # reads FISH_API_KEY
+
+# Generate speech (returns the full audio as bytes)
+audio = client.tts.convert(text="Hello from Fish Audio!")
+
+save(audio, "output.mp3")   # write to a file
+# play(audio)               # or play locally (needs ffmpeg)
+```
+
+Async — identical resource tree on `AsyncFishAudio`, used as a context manager:
+
+```python
+import asyncio
+from fishaudio import AsyncFishAudio
+from fishaudio.utils import save
+
+async def main():
+    async with AsyncFishAudio() as client:
+        audio = await client.tts.convert(text="Hello from Fish Audio!")
+        save(audio, "output.mp3")
+
+asyncio.run(main())
+```
+
+## Quick start — JavaScript / TypeScript
+
+```ts
+import { FishAudioClient, play } from "fish-audio";
+
+const client = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });
+
+// convert() returns audio you can play or pipe to a file
+const audio = await client.textToSpeech.convert({
+  text: "Hello from Fish Audio!",
+}); // defaults to model "s2-pro"
+await play(audio); // local playback (needs ffplay)
+```
+
+To pick a model in JS, pass `backend` as the second **positional** argument, after the request object (not as a named option):
+
+```ts
+const audio = await client.textToSpeech.convert({ text: "Hi" }, "s1");
+```
+
+## Capabilities → references
+
+| Task                                                             | Reference                                                    |
+| ---------------------------------------------------------------- | ------------------------------------------------------------ |
+| Install, auth, playback deps, verify a key                       | [references/installation.md](references/installation.md)     |
+| Text-to-Speech (convert, stream, formats, prosody, model select) | [references/text-to-speech.md](references/text-to-speech.md) |
+| Voice cloning (instant references + persistent voice models)     | [references/voice-cloning.md](references/voice-cloning.md)   |
+| Speech-to-Text (transcribe, segments, timestamps)                | [references/speech-to-text.md](references/speech-to-text.md) |
+| Realtime WebSocket TTS (stream text → audio)                     | [references/websocket.md](references/websocket.md)           |
+| Errors, retries, and timeouts (the **real** exception types)     | [references/errors.md](references/errors.md)                 |
+
+## Python ↔ JavaScript name map
+
+The two SDKs do **not** use the same names. Use this map when porting code between them.
+
+| Concept               | Python (`fishaudio`)                              | JavaScript (`fish-audio`)                                  |
+| --------------------- | ------------------------------------------------- | ---------------------------------------------------------- |
+| Client                | `FishAudio()` / `AsyncFishAudio()`                | `new FishAudioClient({ apiKey })`                          |
+| Text-to-Speech        | `client.tts.convert(text=...)` → `bytes`          | `client.textToSpeech.convert({ text })`                    |
+| TTS HTTP stream       | `client.tts.stream(...)` → `AudioStream`          | (use `convert`; realtime streaming is `convertRealtime`)   |
+| Realtime WebSocket    | `client.tts.stream_websocket(text_stream)`        | `client.textToSpeech.convertRealtime(request, textStream)` |
+| Speech-to-Text        | `client.asr.transcribe(audio=...)`                | `client.speechToText.convert({ audio })`                   |
+| List voice models     | `client.voices.list()`                            | `client.voices.search()`                                   |
+| Get voice model       | `client.voices.get(id)`                           | `client.voices.get(id)`                                    |
+| Create voice (clone)  | `client.voices.create(title=..., voices=[bytes])` | `client.voices.ivc.create({ title, voices: [File] })`      |
+| Update / delete voice | `client.voices.update(id, ...)` / `delete(id)`    | `client.voices.update(id, ...)` / `delete(id)`             |
+| Credit balance        | `client.account.get_credits()`                    | `client.user.get_api_credit()`                             |
+| Subscription package  | `client.account.get_package()`                    | `client.user.get_package()`                                |
+| Choose model          | `model="s2-pro"` keyword arg                      | positional `backend` arg, e.g. `convert(req, "s2-pro")`    |
+
+## Decision shortcuts
+
+- **Audio from text** → `tts.convert` (Python) / `textToSpeech.convert` (JS).
+- **Reuse a saved voice** → pass `reference_id` (the voice model `id`).
+- **Clone a voice instantly from a clip** → pass `references=[ReferenceAudio(audio=..., text=...)]` (Python) / `references: [{ audio, text }]` (JS). See [voice-cloning](references/voice-cloning.md).
+- **Persistent custom voice to reuse** → create a voice model, then use its `id` as `reference_id`.
+- **Stream tokens from an LLM and play speech as it arrives** → `tts.stream_websocket` (Python) / `textToSpeech.convertRealtime` (JS). See [websocket](references/websocket.md).
+- **Transcribe audio** → `asr.transcribe` (Python) / `speechToText.convert` (JS).
+
+## Gotchas (verified against the SDK source)
+
+- Python `latency` accepts only **`"normal"` or `"balanced"`** (default `"balanced"`) — there is no `"low"`.
+- The Python client has **no `max_retries`** and does **not** auto-retry; the JS client **does** auto-retry (configurable via per-call `requestOptions.maxRetries`). See [errors](references/errors.md).
+- Python defines a `ValidationError` class but **never raises it** — don't catch it expecting validation failures; a 422 surfaces as `APIError`. The JS SDK throws `UnprocessableEntityError` on 422.
+- ASR segment `start` / `end` are in **seconds**, but `duration` is in **milliseconds**. See [speech-to-text](references/speech-to-text.md).
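
The last gotcha in SKILL.md (segment `start` / `end` in seconds, `duration` in milliseconds) is easy to trip over when computing offsets. A minimal sketch of normalizing everything to seconds follows. It assumes a result shaped like `{"duration": ..., "segments": [{"text", "start", "end"}]}`; plain dicts stand in for the SDK's result objects, and both that shape and the `normalize_units` name are assumptions made for illustration.

```python
# Sketch only: plain dicts stand in for the SDK's ASR result objects,
# and the exact result shape is an assumption.
def normalize_units(result: dict) -> dict:
    """Return a copy with every time value expressed in seconds."""
    return {
        "duration_s": result["duration"] / 1000.0,  # milliseconds -> seconds
        "segments": [
            {"text": s["text"], "start_s": s["start"], "end_s": s["end"]}
            for s in result["segments"]  # start/end are already in seconds
        ],
    }


sample = {
    "duration": 2500,  # milliseconds
    "segments": [{"text": "hello", "start": 0.0, "end": 2.5}],  # seconds
}
normalized = normalize_units(sample)
print(normalized["duration_s"])  # 2.5
```

Renaming the keys with an `_s` suffix keeps the unit visible at every call site.
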
diff --git a/.mintlify/skills/fish-audio-sdk/references/errors.md b/.mintlify/skills/fish-audio-sdk/references/errors.md
@@ -0,0 +1,126 @@
+# Errors, Retries & Timeouts
+
+The two SDKs have **different** exception models. The tables below reflect what the SDK source actually raises — not every exported class is thrown.
+
+## Python exceptions
+
+Hierarchy (all subclasses of `FishAudioError`):
+
+| Exception             | When                                           | Attributes                        |
+| --------------------- | ---------------------------------------------- | --------------------------------- |
+| `APIError`            | base for HTTP errors                           | `.status`, `.message`, `.body`    |
+| `AuthenticationError` | 401 — bad/missing key                          | same as `APIError`                |
+| `PermissionError`     | 403                                            | same as `APIError`                |
+| `NotFoundError`       | 404 — voice id not found                       | same as `APIError`                |
+| `RateLimitError`      | 429                                            | same as `APIError`                |
+| `ServerError`         | 5xx                                            | same as `APIError`                |
+| `WebSocketError`      | realtime stream failed                         | —                                 |
+| `DependencyError`     | missing system tool (e.g. ffmpeg for `play()`) | `.dependency`, `.install_command` |
+
+```python
+from fishaudio import FishAudio
+from fishaudio.exceptions import (
+    AuthenticationError,
+    RateLimitError,
+    NotFoundError,
+    APIError,
+    FishAudioError,
+)
+
+client = FishAudio()
+try:
+    audio = client.tts.convert(text="Hello!", reference_id="maybe-missing")
+except AuthenticationError:
+    ...  # bad API key
+except RateLimitError:
+    ...  # slow down / out of quota
+except NotFoundError:
+    ...  # reference_id doesn't exist
+except APIError as e:
+    print(e.status, e.message)  # any other HTTP error
+except FishAudioError as e:
+    print("SDK error:", e)      # non-HTTP (e.g. WebSocketError, DependencyError)
+```
+
+> **Do not catch `ValidationError`.** The class exists and is exported, but the SDK **never raises it**. Invalid input comes back as an `APIError` (HTTP 422). Catch `APIError` (and read `.status == 422`) instead.
+
+### Retries & timeouts (Python)
+
+- **No automatic retries.** The Python client makes a single request and raises on failure. Implement your own retry loop if you need one (e.g. back off on `RateLimitError`).
+- **Timeout** is set on the client: `FishAudio(timeout=240.0)` (seconds, default 240).
+- `RequestOptions(max_retries=...)` exists but is currently a **no-op** — don't rely on it. `RequestOptions(timeout=..., additional_headers=...)` does work per request:
+
+```python
+from fishaudio.core.request_options import RequestOptions
+
+audio = client.tts.convert(
+    text="Hello!",
+    request_options=RequestOptions(timeout=30.0, additional_headers={"X-Trace": "abc"}),
+)
+```
+
+## JavaScript exceptions
+
+```ts
+import {
+  FishAudioClient,
+  FishAudioError,
+  FishAudioTimeoutError,
+} from "fish-audio";
+import { UnprocessableEntityError } from "fish-audio"; // re-exported from the package root
+
+const client = new FishAudioClient();
+try {
+  const audio = await client.textToSpeech.convert({
+    text: "Hello!",
+    reference_id: "maybe-missing",
+  });
+} catch (err) {
+  if (err instanceof UnprocessableEntityError) {
+    console.error("422 validation:", err.body?.detail); // [{ loc, msg, type }]
+  } else if (err instanceof FishAudioTimeoutError) {
+    console.error("request timed out");
+  } else if (err instanceof FishAudioError) {
+    console.error(err.statusCode, err.body); // branch on err.statusCode (401/403/404/...)
+  } else {
+    throw err;
+  }
+}
+```
+
+What the JS client actually throws:
+
+| Error                                                 | When                                                                                         |
+| ----------------------------------------------------- | -------------------------------------------------------------------------------------------- |
+| `UnprocessableEntityError` (extends `FishAudioError`) | 422 — the **only** typed HTTP subclass thrown; `.body` is `{ detail: [{ loc, msg, type }] }` |
+| `FishAudioError`                                      | every other non-2xx response; read `.statusCode`, `.body`, `.rawResponse`                    |
+| `FishAudioTimeoutError`                               | request exceeded the timeout                                                                 |
+
+> The package also exports `BadRequestError`, `UnauthorizedError`, `ForbiddenError`, `NotFoundError`, and `TooEarlyError`, but the current client throws a generic `FishAudioError` for those statuses. **Branch on `err.statusCode`** rather than relying on `instanceof NotFoundError`.
+
+### Retries & timeouts (JavaScript)
+
+- **Automatic retries are built in.** The client retries `408`, `429`, and `>= 500` with exponential backoff (≈1 s base, 60 s cap) plus jitter, honoring `Retry-After`. You don't need to hand-roll a 429 loop.
+- Tune per call via `requestOptions` (the trailing argument on every method):
+
+```ts
+const audio = await client.textToSpeech.convert({ text: "Hello!" }, "s2-pro", {
+  maxRetries: 5,
+  timeoutInSeconds: 30,
+  abortSignal: controller.signal, // `controller` is an AbortController created by the caller
+});
+```
+
+- Default request timeout is **240 s** (`240000 ms`); override with `requestOptions.timeoutInSeconds`.
+- `requestOptions` also accepts per-request `apiKey`, `headers`, and `queryParams`.
+
+## Inspecting raw responses (JS)
+
+Every method returns an awaitable that also exposes the raw response:
+
+```ts
+const { data, rawResponse } = await client.textToSpeech
+  .convert({ text: "Hi" })
+  .withRawResponse();
+console.log(rawResponse.headers);
+```
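
The Python section of errors.md says the client never retries and suggests writing your own loop that backs off on `RateLimitError`. One possible shape for that loop is sketched below. `call_with_backoff` and its parameters are hypothetical, and `RateLimitError` is redefined locally as a stand-in so the block runs without the SDK installed; in real code you would import it from `fishaudio.exceptions` as shown in the diff.

```python
import random
import time


class RateLimitError(Exception):
    """Local stand-in for fishaudio.exceptions.RateLimitError (assumption for this sketch)."""


def call_with_backoff(fn, retry_on=(RateLimitError,), max_attempts=4,
                      base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait with capped exponential backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
```

With the SDK installed, usage would be along the lines of `call_with_backoff(lambda: client.tts.convert(text="Hello!"))`. The `sleep` parameter is injectable so the loop can be tested without waiting.
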
diff --git a/.mintlify/skills/fish-audio-sdk/references/installation.md b/.mintlify/skills/fish-audio-sdk/references/installation.md
@@ -0,0 +1,91 @@
+# Installation & Authentication
+
+## Python (`fishaudio`)
+
+```bash
+pip install fish-audio-sdk          # imported as `fishaudio`
+pip install "fish-audio-sdk[utils]" # adds local audio playback helpers (play)
+```
+
+- Requires Python 3.9+.
+- Import name is **`fishaudio`** even though the PyPI/dist name is `fish-audio-sdk`.
+
+## JavaScript / TypeScript (`fish-audio`)
+
+```bash
+npm install fish-audio
+# or: pnpm add fish-audio  / yarn add fish-audio
+```
+
+- Requires Node.js 18+ (uses the global `fetch` / Web Streams).
+
+## Authentication
+
+Get an API key at `https://fish.audio/app/api-keys`. Both SDKs read `FISH_API_KEY` from the environment automatically.
+
+```bash
+export FISH_API_KEY=your_api_key_here
+```
+
+```python
+from fishaudio import FishAudio
+
+client = FishAudio()                       # reads FISH_API_KEY
+client = FishAudio(api_key="your_api_key") # or pass explicitly
+```
+
+```ts
+import { FishAudioClient } from "fish-audio";
+
+const client = new FishAudioClient(); // reads FISH_API_KEY
+const client2 = new FishAudioClient({ apiKey: process.env.MY_KEY }); // or pass explicitly
+```
+
+Never hardcode a key in source. If neither the argument nor `FISH_API_KEY` is set, the Python client raises `ValueError` at construction time.
+
+### Other client options
+
+| Option             | Python                              | JavaScript                                 |
+| ------------------ | ----------------------------------- | ------------------------------------------ |
+| API key            | `api_key=`                          | `apiKey:`                                  |
+| Base URL           | `base_url="https://api.fish.audio"` | `baseUrl:` / `environment:`                |
+| Request timeout    | `timeout=240.0` (seconds)           | per-call `requestOptions.timeoutInSeconds` |
+| Custom HTTP client | `httpx_client=`                     | (not exposed)                              |
+
+> Python caveat: if you pass your own `httpx_client`, the SDK uses it **as-is** — your `base_url`, `timeout`, and the `Authorization` header are **not** applied to it. Pre-configure those on the client you inject.
+
+There is no client-level `max_retries` or `default_headers` option in Python. Per-request headers go through `request_options`. See [errors.md](errors.md) for retry/timeout behavior.
+
+## Local audio playback
+
+The `play()` helper is for local/desktop use and shells out to a system tool:
+
+- **Python:** needs `ffmpeg` (or pass `use_ffmpeg=False` to try `mpv`). Install the `[utils]` extra. Missing tools raise `DependencyError` with the install command.
+- **JavaScript:** spawns `ffplay` (from ffmpeg) and is **Node-only**.
+
+Install ffmpeg:
+
+```bash
+# macOS
+brew install ffmpeg
+# Debian/Ubuntu
+sudo apt-get install ffmpeg
+```
+
+In a server or browser context, don't use `play()` — use `save()` (Python) or write/stream the bytes yourself.
+
+## Verify a key works
+
+```python
+from fishaudio import FishAudio
+
+client = FishAudio()
+print(client.account.get_credits())  # raises AuthenticationError (401) if the key is bad
+```
+
+```ts
+import { FishAudioClient } from "fish-audio";
+
+const client = new FishAudioClient();
+console.log(await client.user.get_api_credit());
+```
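
The Authentication section of installation.md describes the lookup order for the key: an explicit argument first, then `FISH_API_KEY`, and a `ValueError` if neither is set. A sketch of that order, useful as a startup check before constructing a client, is shown below. `resolve_api_key` is a hypothetical helper and does not claim to mirror the SDK's internal implementation.

```python
import os


def resolve_api_key(api_key=None, env=None):
    """Prefer an explicit key, then FISH_API_KEY; raise ValueError if neither is set."""
    env = os.environ if env is None else env  # injectable for tests
    key = api_key or env.get("FISH_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key= or set FISH_API_KEY")
    return key
```

Failing at startup with a clear message is usually easier to debug than a 401 on the first request.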