Merged
20 changes: 18 additions & 2 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -29,11 +29,23 @@ This project manages **Vapi voice agent configurations** as code. All resources
| Multilingual agents (English/Spanish) | `docs/learnings/multilingual.md` |
| WebSocket audio streaming | `docs/learnings/websocket.md` |
| Building outbound calling agents | `docs/learnings/outbound-agents.md` |
| Bulk-dialing from a CSV (Outbound Call Campaigns) | `docs/learnings/outbound-campaigns.md` |
| Voicemail detection / VM vs human classification | `docs/learnings/voicemail-detection.md` |
| Enforcing call time limits / graceful call ending | `docs/learnings/call-duration.md` |
| Voice provider field cheat-sheet (Cartesia vs 11labs vs OpenAI etc.) | `docs/learnings/voice-providers.md` |
| YAML authoring conventions, .vapi-ignore lifecycle | `docs/learnings/yaml-conventions.md` |

**Where new knowledge goes:**

| Kind of knowledge | Home | Convention |
|---|---|---|
| Per-resource gotchas, recipes, troubleshooting | `docs/learnings/<topic>.md` | One file per resource type or topic. Add a row to this table AND to `docs/learnings/README.md` when you add a new file. `CLAUDE.md` mirrors this list — keep both in sync. |
| Engine-friction log (push/pull/state/cleanup pain points + fixes) | `improvements.md` | Format: Problem → Current behavior → Risk → Current mitigation → Possible fix → Status. Mark `[RESOLVED YYYY-MM-DD] (#<PR>)` when fixed; never delete. |
| Code-level rationale (why a function works the way it does) | Code comments | Only when the WHY is non-obvious — not what the code does. Don't reference PR/issue numbers; they rot. |
| Setup, install, repo orientation | `README.md` | One-time onboarding only. Don't put runtime gotchas here. |

If you're unsure where something goes, default to `docs/learnings/`. The README and engine-friction log are deliberately narrow.

---

## Quick Reference
@@ -65,7 +77,7 @@ docs/
├── changelog.md # Template for tracking per-customer config changes
└── learnings/ # Gotchas, recipes, and troubleshooting
├── README.md # Task-routed index — start here
├── tools.md # Tool configuration gotchas (incl. dedup behavior)
├── assistants.md # Assistant configuration gotchas
├── squads.md # Squad and multi-agent gotchas
├── structured-outputs.md # Structured output gotchas + KPI patterns
@@ -78,7 +90,11 @@ docs/
├── multilingual.md # Multilingual agent architecture guide
├── websocket.md # WebSocket transport rules
├── outbound-agents.md # Outbound agent design & IVR navigation
├── outbound-campaigns.md # Bulk-dial CSV campaigns + dynamic variables
├── voicemail-detection.md # Voicemail vs human classification
├── call-duration.md # Call time limits and graceful end-of-call
├── voice-providers.md # Per-provider voice block field cheat-sheet
└── yaml-conventions.md # YAML authoring conventions, .vapi-ignore lifecycle

resources/
├── <org>/ # Org-scoped resources (npm run push -- <org> reads here)
12 changes: 11 additions & 1 deletion CLAUDE.md
@@ -13,7 +13,7 @@ When both files exist, follow both. If guidance overlaps, treat `AGENTS.md` as t
2. Then read this file (`CLAUDE.md`) for additional policy constraints.
3. When configuring or debugging any resource, load only the relevant learnings file — not the whole folder:
- Assistants → `docs/learnings/assistants.md`
- Tools → `docs/learnings/tools.md` (also covers tool/SO dedup behavior on push)
- Squads → `docs/learnings/squads.md`
- Transfers not working → `docs/learnings/transfers.md`
- Structured outputs → `docs/learnings/structured-outputs.md`
@@ -24,9 +24,19 @@ When both files exist, follow both. If guidance overlaps, treat `AGENTS.md` as t
- Azure OpenAI BYOK → `docs/learnings/azure-openai-fallback.md`
- Multilingual agents → `docs/learnings/multilingual.md`
- WebSocket transport → `docs/learnings/websocket.md`
- Outbound calling agents → `docs/learnings/outbound-agents.md`
- Outbound Call Campaigns (CSV bulk-dial) → `docs/learnings/outbound-campaigns.md`
- Voicemail detection → `docs/learnings/voicemail-detection.md`
- Call time limits / graceful ending → `docs/learnings/call-duration.md`
- Voice provider field cheat-sheet → `docs/learnings/voice-providers.md`
- YAML authoring conventions, .vapi-ignore lifecycle → `docs/learnings/yaml-conventions.md`

This list mirrors the "Learnings & recipes" table in `AGENTS.md`. Keep both in sync — if you add a new learnings file, update both files plus `docs/learnings/README.md`.

## Where new knowledge goes

Per-resource tips/recipes/troubleshooting → `docs/learnings/<topic>.md`. Engine-friction log (push/pull/state/cleanup pain points + their fixes) → `improvements.md`. Code-level rationale → comments only when the *why* is non-obvious; never reference PR/issue numbers in code comments (they rot). One-time onboarding/install → `README.md`. When unsure, default to `docs/learnings/`. The full convention table lives in `AGENTS.md` under "Where new knowledge goes" — read it once, then this reminder is enough.

## Improvements log

This repo maintains an upstream-only running log at `improvements.md` (repo
Expand Down
2 changes: 1 addition & 1 deletion docs/learnings/assistants.md
@@ -656,7 +656,7 @@ They are merged, not mutually exclusive. But be aware of potential duplicates.

## Liquid Variable Bag and Trust Tiers

Cross-reference: [docs.vapi.ai/assistants/dynamic-variables](https://docs.vapi.ai/assistants/dynamic-variables). The trust-tier framing came out of progressive caller-ID auth work on a customer rollout.

Vapi exposes a Liquid templating layer in prompts, tool config, and overrides — `{{ customer.number }}`, `{{ now }}`, etc. The variables in scope at runtime fall into three trust tiers based on where they originate. This matters because anything you place in a security-sensitive field (tool static `parameters`, message templates that go to a backend) is only as trustworthy as the source of the variable.

22 changes: 11 additions & 11 deletions docs/learnings/simulations.md
@@ -20,35 +20,35 @@ Extra system messages beyond `messages[0]` are **not** included in the tester's

When the same rubric needs to run against multiple personality variants in a sim suite, give EACH `(rubric, personality)` pair its own scenario file with a uniquely descriptive name — even if the rubric content is identical across them.

**Why:** the dashboard's run-history view displays scenarios by `name`, NOT by which personality drove the test. If 4 sims share a scenario named `Acme Logistics Live Human Pickup Handling`, all 4 result entries show identically in the suite-run sidebar — you can't tell which test was the "quick" pickup vs the "self-id" pickup vs the "question" pickup vs the "ambiguous-short" pickup without drilling into each item to see the personality. This makes failure investigation painful: every flickering test looks like the same test.

**Recommendation:** name each scenario as `<base>-<personality-variant>-handling`, with a descriptive `name:` field that calls out the personality being tested.

```yaml
# resources/<env>/simulations/scenarios/acme-live-human-pickup-quick-handling.yml
name: Acme Logistics Live Human Pickup — Quick (bare hello)
evaluations: [...]
```

```yaml
# resources/<env>/simulations/scenarios/acme-live-human-pickup-self-id-handling.yml
name: Acme Logistics Live Human Pickup — Self-ID (driver introduces themselves)
evaluations: [...] # identical rubric content as above; only name differs
```

```yaml
# resources/<env>/simulations/scenarios/acme-live-human-pickup-question-handling.yml
name: Acme Logistics Live Human Pickup — Question (skeptical "who's calling?")
evaluations: [...] # same
```

Each test (sim) file then references its variant-specific scenario:

```yaml
# resources/<env>/simulations/tests/acme-live-human-pickup-quick.yml
name: Acme Logistics Live Human Pickup - Quick
personalityId: live-human-pickup-quick-bot
scenarioId: acme-live-human-pickup-quick-handling
```

**Cost:** scenario file duplication — each variant is a copy of the same rubric content with a different `name:` field. Cheap. The duplication is mechanical (you can clone the source scenario file 4-6 times with a one-line `name:` change each).
@@ -57,7 +57,7 @@ scenarioId: iform-live-human-pickup-quick-handling

**Anti-pattern:** putting one shared scenario behind N personality variants in the same suite. The dashboard sidebar shows N rows with identical scenario names, only distinguishable by clicking into each item to see the personality. Sim iteration time inflates because every failure investigation starts with "wait, which one was this?"

Cross-reference: this convention surfaced as friction during a customer voicemail-triage sim iteration. Original suites shipped with one shared scenario per group (4 live-pickup tests sharing one scenario, 6 voicemail-edge-cases sharing one scenario); split into per-personality scenarios mid-iteration. Worth shipping new suites in the per-personality form from day one.

---

2 changes: 1 addition & 1 deletion docs/learnings/squads.md
@@ -107,7 +107,7 @@ For sim suites grading the destination's first-turn behavior, see [simulations.m

## Passing data between assistants

Cross-reference: [docs.vapi.ai/squads/passing-data-between-assistants](https://docs.vapi.ai/squads/passing-data-between-assistants). The trust-tier framing came out of progressive caller-ID auth work on a customer rollout.

When a squad hands off mid-call, three approaches exist for getting data from one assistant to the next. They differ on trust level, latency, and determinism.

4 changes: 4 additions & 0 deletions docs/learnings/structured-outputs.md
@@ -32,6 +32,10 @@ evaluations.5.structuredOutput.Name must be between 1 and 40 characters

Long, descriptive evaluator names like `assistant_left_voicemail_and_ended_call_promptly` (48 chars) or `assistant_detected_hostile_recording_and_ended_call` (51 chars) will silently exceed the limit until you POST. Keep names compact (`assistant_ended_call_after_message`, `assistant_handled_hostile_recording`) and put the descriptive nuance in the `description` field, which has no length cap. The constraint applies to the field on every structured output type — both standalone resources and inline evaluations within scenarios.
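A pre-push length check catches these before the POST fails. This is a sketch, not part of the push engine; the helper name is ours, and only the 1–40 bound comes from the error message above:

```typescript
// Hypothetical pre-push lint for structured-output / evaluator names.
// The 1-40 character bound is Vapi's, per the API error quoted above.
const SO_NAME_MIN = 1;
const SO_NAME_MAX = 40;

function checkSoName(name: string): string | null {
  if (name.length < SO_NAME_MIN || name.length > SO_NAME_MAX) {
    return `"${name}" is ${name.length} chars; must be between ${SO_NAME_MIN} and ${SO_NAME_MAX}`;
  }
  return null; // name is within bounds
}

console.log(checkSoName("assistant_ended_call_after_message"));           // null (34 chars, ok)
console.log(checkSoName("assistant_left_voicemail_and_ended_call_promptly")); // flagged (48 chars)
```

Run it over every `name:` in your scenario evaluations before pushing, since the limit applies to inline evaluations as well as standalone resources.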

### Renaming a structured-output file is safe — the engine dedups by `name`

Same dedup behavior as for tools: if you rename a structured-output file but keep its `name` field stable, the push pipeline detects the existing dashboard resource (by slugified `name` against state and the live dashboard list) and adopts its UUID instead of creating a duplicate. You'll see `🔁 Reusing existing structured output: <localKey> → <uuid>` in the push log. See [tools.md → "Renaming a tool file is safe"](tools.md#renaming-a-tool-file-is-safe--the-engine-dedups-by-functionname) for the full mechanism, ambiguity warning semantics, and `npm run cleanup` workflow — they're identical for SOs.

---

## assistant_ids Must Be UUIDs
26 changes: 23 additions & 3 deletions docs/learnings/tools.md
@@ -37,7 +37,27 @@ Vapi enforces a hard **1000-character maximum** on `function.description` across

### `function.name` matches `^[A-Za-z0-9_-]+$`

Tool names are validated against this regex by Vapi. Spaces, dots, slashes, parentheses, or unicode characters cause a 400 at push time. Use snake_case or camelCase (e.g. `end_call_vapi_testing`, `handoffToAcmeSales`). The name is what the LLM emits in its function call, so keep it stable across config changes — renaming a tool invalidates any prompt rule that mentions the old name.
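A client-side check mirroring that server-side constraint fails fast instead of waiting for the 400. Sketch only; `validateToolName` is an illustrative helper, not something the engine exposes:

```typescript
// Mirrors Vapi's documented constraint on function.name.
// Illustrative helper; only the regex itself comes from the docs above.
const TOOL_NAME_RE = /^[A-Za-z0-9_-]+$/;

function validateToolName(name: string): boolean {
  return TOOL_NAME_RE.test(name);
}

console.log(validateToolName("end_call_vapi_testing")); // true
console.log(validateToolName("handoffToAcmeSales"));    // true
console.log(validateToolName("end call (v2)"));         // false: space and parentheses
console.log(validateToolName("café.lookup"));           // false: unicode and dot
```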

### Renaming a tool file is safe — the engine dedups by `function.name`

The push pipeline includes a name-based dedup safety net that prevents minting duplicate dashboard tools when:

- You renamed the local file (e.g. `end-call.yml` → `intake-end-call.yml`) but kept `function.name` the same.
- Bootstrap pull stored the dashboard tool under a slug-suffixed state key (e.g. `end-call-67aea057`) and your assistant references the original local key.
- The tool exists on the dashboard but isn't yet in your local state file (e.g. fresh clone, partial pull).

In all three cases the engine looks up the tool by slugified `function.name` against both state entries and the live dashboard tool list, then **adopts** the existing UUID instead of creating a new one. You'll see this log line:

```
🔁 Reusing existing tool: <localKey> → <uuid> (matched via state|dashboard|both)
```

Adoption then routes through the standard PATCH path, so any local edits to the tool's payload are pushed normally with drift detection. Your old state-key entries are dropped automatically so the next full push doesn't orphan-delete the just-adopted dashboard tool.

**When you see `⚠️ Multiple dashboard tools share the name "<name>" — adopting <uuid> (lex-smallest)`**, real duplicate dashboard resources exist (typically from before the dedup was added). Run `npm run cleanup -- <org>` to inspect and prune; the engine adopts the lex-smallest UUID deterministically so subsequent pushes stay stable.

**What this does NOT do:** if you rename `function.name` (not just the file), that's a new logical tool — the engine creates a new dashboard resource. Function-name renames need an explicit `npm run cleanup` of the old one.
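The lookup order described above can be sketched roughly as follows. This is a reconstruction from the documented behavior, with invented type and helper names (`DashboardTool`, `slugify`, `findAdoptableToolId`); the real engine internals differ:

```typescript
// Sketch of the name-based dedup lookup described above.
// All identifiers here are illustrative, not the engine's actual API.
type DashboardTool = { id: string; function: { name: string } };

// Normalize a function name the same way for state keys and dashboard names.
const slugify = (s: string): string =>
  s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

function findAdoptableToolId(
  functionName: string,
  stateTools: DashboardTool[],     // entries already in the local state file
  dashboardTools: DashboardTool[], // live dashboard tool list
): string | null {
  const slug = slugify(functionName);
  const inState = stateTools.find((t) => slugify(t.function.name) === slug);
  const onDashboard = dashboardTools
    .filter((t) => slugify(t.function.name) === slug)
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0)); // plain lex order, locale-independent
  if (onDashboard.length > 1) {
    // Real duplicates exist: adopt deterministically, then prompt cleanup.
    console.warn(
      `⚠️ Multiple dashboard tools share the name "${functionName}" — adopting ${onDashboard[0].id} (lex-smallest)`,
    );
  }
  // null means no match anywhere: the tool is genuinely new and gets created.
  return inState?.id ?? onDashboard[0]?.id ?? null;
}
```

Adoption then proceeds over the normal PATCH path, so the sketch stops at UUID resolution.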

---

@@ -335,7 +355,7 @@ Only `function` tools support `strict` mode.

## Tool Security and Data Visibility

Cross-reference: [docs.vapi.ai/tools/static-variables-and-aliases](https://docs.vapi.ai/tools/static-variables-and-aliases) and [docs.vapi.ai/tools/custom-tools](https://docs.vapi.ai/tools/custom-tools). The full data-flow / threat-model writeup that motivates this section came out of progressive caller-ID auth work on a customer rollout.

### Every tool result is in conversation history

@@ -374,7 +394,7 @@ The dashboard renders these as "Parameters" (JSON schema editor) and "Static Bod
| Legacy `assistant.model.functions[]` (deprecated) | ❌ — converter zeroes it out |
| `code`, `handoff`, `transferCall`, `endCall`, others | ❌ |

#### Progressive caller-ID auth pattern (worked example)

```yaml
type: apiRequest
4 changes: 2 additions & 2 deletions docs/learnings/voice-providers.md
Expand Up @@ -102,7 +102,7 @@ If you find yourself reaching for a provider not in the table above, append a ro

Pronunciation dictionaries do not share a field shape across voice providers. Same conceptual feature, three different surfaces.

> **Public-docs note:** As of 2026-05-08 the public Vapi docs state pronunciation dictionaries are "exclusive to ElevenLabs voices." This is out of date — Cartesia has been confirmed in production deployments and Vapi-voice schema-level support is in active rollout. Treat this wiki as the more current source.

### Cartesia

@@ -120,7 +120,7 @@ Pronunciation dictionaries do not share a field shape across voice providers. Sa
### Vapi voices

- **Schema-level**: accepts pronunciation dictionary configs at the API.
- **Dashboard UI surface**: in active rollout. Schema acceptance does **not** guarantee runtime TTS engine honors the dictionary.
- **Recommendation**: verify runtime behavior with a call test before depending on it for production Vapi-voice deployments.

### Field shape gotcha