2 changes: 1 addition & 1 deletion .mintlify/skills/fish-audio-api/SKILL.md
@@ -64,7 +64,7 @@ Response: streaming audio bytes (`Transfer-Encoding: chunked`) in the format set
| `format` | `wav` \| `pcm` \| `mp3` \| `opus` | `mp3` | Output format. |
| `sample_rate` | int \| null | null (44100, or 48000 for opus) | Output sample rate. |
| `mp3_bitrate` | 64 \| 128 \| 192 | 128 | Only when `format=mp3`. |
- | `opus_bitrate` | -1000 \| 24 \| 32 \| 48 \| 64 | -1000 (auto) | Only when `format=opus`. |
+ | `opus_bitrate` | -1000 \| 24000 \| 32000 \| 48000 \| 64000 | -1000 (auto) | Opus bitrate in **bps**. Only when `format=opus`. |
| `latency` | `low` \| `normal` \| `balanced` | `normal` | Quality vs latency. |
| `max_new_tokens` | int | 1024 | Per-chunk audio token cap. |
| `repetition_penalty` | number | 1.2 | >1.0 reduces repeats. |
119 changes: 119 additions & 0 deletions .mintlify/skills/fish-audio-sdk/SKILL.md
@@ -0,0 +1,119 @@
---
name: fish-audio-sdk
description: Write code with the official Fish Audio SDKs — Python (`fishaudio`, PyPI `fish-audio-sdk`) and JavaScript/TypeScript (`fish-audio`). Use when the user wants text-to-speech, speech-to-text, voice cloning / voice-model management, or realtime WebSocket TTS through the installed SDK rather than raw HTTP. Covers install and auth, sync + async Python, the TypeScript client, exact method signatures and defaults, model selection (s1 / s2-pro), the real exception types, and the Python↔JavaScript naming differences. For raw REST/WebSocket calls without an SDK (curl, unsupported languages, edge runtimes), use the `fish-audio-api` skill instead.
---

# Fish Audio SDK Skill

Use this skill to generate correct, runnable code with the **official Fish Audio SDKs**:

- **Python**: package `fish-audio-sdk` on PyPI, imported as `fishaudio`. (The same wheel still ships a separate legacy `fish_audio_sdk` package. Do **not** mix them; everything here uses the modern `fishaudio` package.)
- **JavaScript / TypeScript**: package `fish-audio` on npm; the client class is `FishAudioClient`.

If the user wants raw `curl` / HTTP / WebSocket without installing an SDK, use the **`fish-audio-api`** skill instead.

> This file is the index. Deeper, task-specific rules and full examples live in [`references/`](references/). Read the reference for the task you're doing before writing code.

## Global facts

- **Auth:** both SDKs read the API key from the `FISH_API_KEY` environment variable automatically. Get keys at `https://fish.audio/app/api-keys`. Never hardcode a key — read it from the environment.
- **Base URL:** `https://api.fish.audio` (override with `base_url=` in Python / `baseUrl:` in JS).
- **Models:** `s2-pro` (default — highest quality) and `s1`. `speech-1.5` / `speech-1.6` are **deprecated**. In Python pass `model="s2-pro"` (keyword); in JS pass the **positional** `backend` argument.
- **Audio formats:** `mp3` (default), `wav`, `pcm`, `opus`.
- **Playback in examples:** `play()` shells out to a system audio tool — Python uses **ffmpeg/ffplay** (or `mpv`), JS uses **ffplay**. It is for local/desktop use; in a server, `save()` to a file or stream the bytes instead. See [references/installation.md](references/installation.md).

## Quick start — Python

```python
from fishaudio import FishAudio
from fishaudio.utils import play, save

client = FishAudio() # reads FISH_API_KEY

# Generate speech (returns the full audio as bytes)
audio = client.tts.convert(text="Hello from Fish Audio!")

save(audio, "output.mp3") # write to a file
# play(audio) # or play locally (needs ffmpeg)
```

Async — identical resource tree on `AsyncFishAudio`, used as a context manager:

```python
import asyncio
from fishaudio import AsyncFishAudio
from fishaudio.utils import save

async def main():
    async with AsyncFishAudio() as client:
        audio = await client.tts.convert(text="Hello from Fish Audio!")
        save(audio, "output.mp3")

asyncio.run(main())
```

## Quick start — JavaScript / TypeScript

```ts
import { FishAudioClient, play } from "fish-audio";

const client = new FishAudioClient({ apiKey: process.env.FISH_API_KEY });

// convert() returns audio you can play or pipe to a file
const audio = await client.textToSpeech.convert({
  text: "Hello from Fish Audio!",
}); // defaults to model "s2-pro"
await play(audio); // local playback (needs ffplay)
```

To pick a model in JS, pass `backend` as the **positional** argument (not a named option):

```ts
const audio = await client.textToSpeech.convert({ text: "Hi" }, "s1");
```

## Capabilities → references

| Task | Reference |
| ---------------------------------------------------------------- | ------------------------------------------------------------ |
| Install, auth, playback deps, verify a key | [references/installation.md](references/installation.md) |
| Text-to-Speech (convert, stream, formats, prosody, model select) | [references/text-to-speech.md](references/text-to-speech.md) |
| Voice cloning (instant references + persistent voice models) | [references/voice-cloning.md](references/voice-cloning.md) |
| Speech-to-Text (transcribe, segments, timestamps) | [references/speech-to-text.md](references/speech-to-text.md) |
| Realtime WebSocket TTS (stream text → audio) | [references/websocket.md](references/websocket.md) |
| Errors, retries, and timeouts (the **real** exception types) | [references/errors.md](references/errors.md) |

## Python ↔ JavaScript name map

The two SDKs do **not** use the same names. Use this map when porting code between them.

| Concept | Python (`fishaudio`) | JavaScript (`fish-audio`) |
| --------------------- | ------------------------------------------------- | ---------------------------------------------------------- |
| Client | `FishAudio()` / `AsyncFishAudio()` | `new FishAudioClient({ apiKey })` |
| Text-to-Speech | `client.tts.convert(text=...)` → `bytes` | `client.textToSpeech.convert({ text })` |
| TTS HTTP stream | `client.tts.stream(...)` → `AudioStream` | (use `convert`; realtime streaming is `convertRealtime`) |
| Realtime WebSocket | `client.tts.stream_websocket(text_stream)` | `client.textToSpeech.convertRealtime(request, textStream)` |
| Speech-to-Text | `client.asr.transcribe(audio=...)` | `client.speechToText.convert({ audio })` |
| List voice models | `client.voices.list()` | `client.voices.search()` |
| Get voice model | `client.voices.get(id)` | `client.voices.get(id)` |
| Create voice (clone) | `client.voices.create(title=..., voices=[bytes])` | `client.voices.ivc.create({ title, voices: [File] })` |
| Update / delete voice | `client.voices.update(id, ...)` / `delete(id)` | `client.voices.update(id, ...)` / `delete(id)` |
| Credit balance | `client.account.get_credits()` | `client.user.get_api_credit()` |
| Subscription package | `client.account.get_package()` | `client.user.get_package()` |
| Choose model | `model="s2-pro"` keyword arg | positional `backend` arg, e.g. `convert(req, "s2-pro")` |

## Decision shortcuts

- **Audio from text** → `tts.convert` (Python) / `textToSpeech.convert` (JS).
- **Reuse a saved voice** → pass `reference_id` (the voice model `id`).
- **Clone a voice instantly from a clip** → pass `references=[ReferenceAudio(audio=..., text=...)]` (Python) / `references: [{ audio, text }]` (JS). See [voice-cloning](references/voice-cloning.md).
- **Persistent custom voice to reuse** → create a voice model, then use its `id` as `reference_id`.
- **Stream tokens from an LLM and play speech as it arrives** → `tts.stream_websocket` (Python) / `textToSpeech.convertRealtime` (JS). See [websocket](references/websocket.md).
- **Transcribe audio** → `asr.transcribe` (Python) / `speechToText.convert` (JS).

## Gotchas (verified against the SDK source)

- Python `latency` accepts only **`"normal"` or `"balanced"`** (default `"balanced"`) — there is no `"low"`.
- The Python client has **no `max_retries`** and does **not** auto-retry; the JS client **does** auto-retry (configurable via per-call `requestOptions.maxRetries`). See [errors](references/errors.md).
- Python defines a `ValidationError` class but **never raises it** — don't catch it expecting validation failures; a 422 surfaces as `APIError`. The JS SDK throws `UnprocessableEntityError` on 422.
- ASR segment `start` / `end` are in **seconds**, but `duration` is in **milliseconds**. See [speech-to-text](references/speech-to-text.md).
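
The unit mismatch in the last gotcha is easy to trip over when computing offsets. Below is a minimal sketch of normalizing everything to seconds. It assumes a transcription result shaped like the hypothetical dict shown; the field layout and the helper name `normalize_transcript` are illustrative, not taken from the SDK.

```python
def normalize_transcript(result: dict) -> dict:
    """Return a copy of the result with every time value in seconds.

    Assumes `duration` is in milliseconds and each segment's
    `start` / `end` are already in seconds, as noted above.
    """
    return {
        "text": result["text"],
        "duration_s": result["duration"] / 1000.0,  # ms -> s
        "segments": [
            {"text": s["text"], "start_s": s["start"], "end_s": s["end"]}
            for s in result["segments"]
        ],
    }


# Hypothetical sample data, for illustration only.
sample = {
    "text": "Hello world",
    "duration": 2500,  # milliseconds
    "segments": [
        {"text": "Hello", "start": 0.0, "end": 1.1},
        {"text": "world", "start": 1.2, "end": 2.4},
    ],
}
normalized = normalize_transcript(sample)
```

Converting once at the boundary keeps later arithmetic (for example, checking that a segment ends before the clip does) in a single unit.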
126 changes: 126 additions & 0 deletions .mintlify/skills/fish-audio-sdk/references/errors.md
@@ -0,0 +1,126 @@
# Errors, Retries & Timeouts

The two SDKs have **different** exception models. The tables below reflect what the SDK source actually raises — not every exported class is thrown.

## Python exceptions

Hierarchy (all subclasses of `FishAudioError`):

| Exception | When | Attributes |
| --------------------- | ---------------------------------------------- | --------------------------------- |
| `APIError` | base for HTTP errors | `.status`, `.message`, `.body` |
| `AuthenticationError` | 401 — bad/missing key | (APIError) |
| `PermissionError` | 403 | (APIError) |
| `NotFoundError` | 404 — voice id not found | (APIError) |
| `RateLimitError` | 429 | (APIError) |
| `ServerError` | 5xx | (APIError) |
| `WebSocketError` | realtime stream failed | — |
| `DependencyError` | missing system tool (e.g. ffmpeg for `play()`) | `.dependency`, `.install_command` |

```python
from fishaudio import FishAudio
from fishaudio.exceptions import (
    AuthenticationError,
    RateLimitError,
    NotFoundError,
    APIError,
    FishAudioError,
)

client = FishAudio()
try:
    audio = client.tts.convert(text="Hello!", reference_id="maybe-missing")
except AuthenticationError:
    ...  # bad API key
except RateLimitError:
    ...  # slow down / out of quota
except NotFoundError:
    ...  # reference_id doesn't exist
except APIError as e:
    print(e.status, e.message)  # any other HTTP error
except FishAudioError as e:
    print("SDK error:", e)  # non-HTTP (e.g. WebSocketError, DependencyError)
```

> **Do not catch `ValidationError`.** The class exists and is exported, but the SDK **never raises it**. Invalid input comes back as an `APIError` (HTTP 422). Catch `APIError` (and read `.status == 422`) instead.
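
A sketch of that check follows, using a local stand-in class so the snippet runs without the SDK or a network call. `APIError` here is a hypothetical stub that mirrors only the `.status` and `.message` attributes listed in the table above, and `fake_convert` is a made-up substitute for `client.tts.convert`; in real code, import `APIError` from `fishaudio.exceptions`.

```python
class APIError(Exception):
    """Stand-in for fishaudio.exceptions.APIError (illustration only)."""

    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = status
        self.message = message


def describe_api_error(err: APIError) -> str:
    """Map an APIError to a short label, treating 422 as invalid input."""
    if err.status == 422:
        return f"invalid request: {err.message}"
    return f"HTTP {err.status}: {err.message}"


def fake_convert(text: str) -> bytes:
    """Hypothetical stand-in for client.tts.convert that rejects empty text."""
    if not text:
        raise APIError(422, "text must not be empty")
    return b"audio-bytes"


try:
    fake_convert("")
except APIError as e:
    label = describe_api_error(e)
```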

### Retries & timeouts (Python)

- **No automatic retries.** The Python client makes a single request and raises on failure. Implement your own retry loop if you need one (e.g. back off on `RateLimitError`).
- **Timeout** is set on the client: `FishAudio(timeout=240.0)` (seconds, default 240).
- `RequestOptions(max_retries=...)` exists but is currently a **no-op** — don't rely on it. `RequestOptions(timeout=..., additional_headers=...)` does work per request:

```python
from fishaudio.core.request_options import RequestOptions

audio = client.tts.convert(
    text="Hello!",
    request_options=RequestOptions(timeout=30.0, additional_headers={"X-Trace": "abc"}),
)
```
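
Since the Python client does not retry, the hand-rolled loop mentioned above might look like the sketch below. `RateLimitError` is a local stand-in for `fishaudio.exceptions.RateLimitError`, and `with_retries`, its parameters, and the backoff values are illustrative assumptions rather than part of the SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for fishaudio.exceptions.RateLimitError (illustration only)."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2**attempt)  # 1 s, 2 s, 4 s, ...
    raise ValueError("attempts must be at least 1")


# Demo with a fake call that fails twice, then succeeds.
calls = {"n": 0}
delays = []


def flaky() -> bytes:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429")
    return b"audio-bytes"


# Record the delays instead of sleeping so the demo finishes instantly.
result = with_retries(flaky, sleep=delays.append)
```

With the real client, `call` would be something like `lambda: client.tts.convert(text="Hello!")`. Injecting `sleep` keeps the loop easy to test.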

## JavaScript exceptions

```ts
import {
  FishAudioClient,
  FishAudioError,
  FishAudioTimeoutError,
} from "fish-audio";
import { UnprocessableEntityError } from "fish-audio"; // re-exported from the package root

const client = new FishAudioClient();
try {
  const audio = await client.textToSpeech.convert({
    text: "Hello!",
    reference_id: "maybe-missing",
  });
} catch (err) {
  if (err instanceof UnprocessableEntityError) {
    console.error("422 validation:", err.body?.detail); // [{ loc, msg, type }]
  } else if (err instanceof FishAudioTimeoutError) {
    console.error("request timed out");
  } else if (err instanceof FishAudioError) {
    console.error(err.statusCode, err.body); // branch on err.statusCode (401/403/404/...)
  } else {
    throw err;
  }
}
```

What the JS client actually throws:

| Error | When |
| ----------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| `UnprocessableEntityError` (extends `FishAudioError`) | 422 — the **only** typed HTTP subclass thrown; `.body` is `{ detail: [{ loc, msg, type }] }` |
| `FishAudioError` | every other non-2xx response; read `.statusCode`, `.body`, `.rawResponse` |
| `FishAudioTimeoutError` | request exceeded the timeout |

> The package also exports `BadRequestError`, `UnauthorizedError`, `ForbiddenError`, `NotFoundError`, and `TooEarlyError`, but the current client throws a generic `FishAudioError` for those statuses. **Branch on `err.statusCode`** rather than relying on `instanceof NotFoundError`.

### Retries & timeouts (JavaScript)

- **Automatic retries are built in.** The client retries `408`, `429`, and `>= 500` with exponential backoff (≈1 s base, 60 s cap) plus jitter, honoring `Retry-After`. You don't need to hand-roll a 429 loop.
- Tune per call via `requestOptions` (the trailing argument on every method):

```ts
const controller = new AbortController(); // call controller.abort() to cancel the request
const audio = await client.textToSpeech.convert({ text: "Hello!" }, "s2-pro", {
  maxRetries: 5,
  timeoutInSeconds: 30,
  abortSignal: controller.signal,
});
```

- Default request timeout is **240 s** (`240000 ms`); override with `requestOptions.timeoutInSeconds`.
- `requestOptions` also accepts per-request `apiKey`, `headers`, and `queryParams`.
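
To make the retry schedule concrete, here is a rough model of the delay calculation described above, written in Python for consistency with the other sketches. The doubling rule, the jitter range, capping `Retry-After` at the same ceiling, and the function name `retry_delay` are all assumptions; the JS SDK's exact formula may differ.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
    cap: float = 60.0,
    jitter: float = 0.2,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return min(retry_after, cap)  # the server's hint wins, still capped
    rng = rng or random.Random()
    delay = min(base * 2**attempt, cap)
    # Spread the delay by +/- `jitter` so clients don't retry in lockstep.
    return min(delay * (1 + rng.uniform(-jitter, jitter)), cap)


# Jitter disabled to show the underlying doubling-then-capped schedule.
schedule = [retry_delay(n, jitter=0.0) for n in range(8)]
```

With jitter disabled, the delays double from the 1 s base until they hit the 60 s cap.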

## Inspecting raw responses (JS)

Every method returns an awaitable that also exposes the raw response:

```ts
const { data, rawResponse } = await client.textToSpeech
  .convert({ text: "Hi" })
  .withRawResponse();
console.log(rawResponse.headers);
```
91 changes: 91 additions & 0 deletions .mintlify/skills/fish-audio-sdk/references/installation.md
@@ -0,0 +1,91 @@
# Installation & Authentication

## Python (`fishaudio`)

```bash
pip install fish-audio-sdk # imported as `fishaudio`
pip install "fish-audio-sdk[utils]" # adds local audio playback helpers (play)
```

- Requires Python 3.9+.
- Import name is **`fishaudio`** even though the PyPI/dist name is `fish-audio-sdk`.

## JavaScript / TypeScript (`fish-audio`)

```bash
npm install fish-audio
# or: pnpm add fish-audio / yarn add fish-audio
```

- Requires Node.js 18+ (uses the global `fetch` / Web Streams).

## Authentication

Get an API key at `https://fish.audio/app/api-keys`. Both SDKs read `FISH_API_KEY` from the environment automatically.

```bash
export FISH_API_KEY=your_api_key_here
```

```python
from fishaudio import FishAudio

client = FishAudio() # reads FISH_API_KEY
client = FishAudio(api_key="your_api_key") # or pass explicitly
```

```ts
import { FishAudioClient } from "fish-audio";

const client = new FishAudioClient(); // reads FISH_API_KEY
const client2 = new FishAudioClient({ apiKey: process.env.MY_KEY }); // or pass explicitly
```

Never hardcode a key in source. If neither the argument nor `FISH_API_KEY` is set, the Python client raises `ValueError` at construction time.
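
If a clearer startup failure is wanted than the `ValueError` above, a preflight check along these lines is one option. `require_api_key` is a hypothetical helper, not part of the SDK; it only reads the `FISH_API_KEY` variable already described.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return FISH_API_KEY from the environment or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get("FISH_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "FISH_API_KEY is not set; export it before creating the client"
        )
    return key


# Demo against an explicit mapping so the example does not depend on the shell.
key = require_api_key({"FISH_API_KEY": "  demo-key  "})
```

Calling `require_api_key()` with no argument checks the real process environment.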

### Other client options

| Option | Python | JavaScript |
| ------------------ | ----------------------------------- | ------------------------------------------ |
| API key | `api_key=` | `apiKey:` |
| Base URL | `base_url="https://api.fish.audio"` | `baseUrl:` / `environment:` |
| Request timeout | `timeout=240.0` (seconds) | per-call `requestOptions.timeoutInSeconds` |
| Custom HTTP client | `httpx_client=` | (not exposed) |

> Python caveat: if you pass your own `httpx_client`, the SDK uses it **as-is** — your `base_url`, `timeout`, and the `Authorization` header are **not** applied to it. Pre-configure those on the client you inject.

There is no client-level `max_retries` or `default_headers` option in Python. Per-request headers go through `request_options`. See [errors.md](errors.md) for retry/timeout behavior.

## Local audio playback

The `play()` helper is for local/desktop use and shells out to a system tool:

- **Python:** needs `ffmpeg` (or pass `use_ffmpeg=False` to try `mpv`). Install the `[utils]` extra. Missing tools raise `DependencyError` with the install command.
- **JavaScript:** spawns `ffplay` (from ffmpeg) and is **Node-only**.

Install ffmpeg:

```bash
# macOS
brew install ffmpeg
# Debian/Ubuntu
sudo apt-get install ffmpeg
```

In a server or browser context, don't use `play()` — use `save()` (Python) or write/stream the bytes yourself.
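
Because the Python `convert` call returns the full audio as `bytes`, writing it out on a server needs nothing beyond the standard library. A minimal sketch, where `write_audio` and the placeholder bytes are illustrative and stand in for real SDK output:

```python
import tempfile
from pathlib import Path


def write_audio(audio: bytes, path: Path) -> int:
    """Write audio bytes to `path`, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path.write_bytes(audio)  # number of bytes written


# Placeholder bytes; in real code this would be the result of tts.convert(...).
audio = b"\x00\x01fake-mp3-data"
out_path = Path(tempfile.mkdtemp()) / "tts" / "output.mp3"
written = write_audio(audio, out_path)
```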

## Verify a key works

```python
from fishaudio import FishAudio

client = FishAudio()
print(client.account.get_credits()) # raises AuthenticationError (401) if the key is bad
```

```ts
import { FishAudioClient } from "fish-audio";

const client = new FishAudioClient();
console.log(await client.user.get_api_credit());
```