diff --git a/AGENTS.md b/AGENTS.md index f07ff31..fcfc07f 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -29,11 +29,23 @@ This project manages **Vapi voice agent configurations** as code. All resources | Multilingual agents (English/Spanish) | `docs/learnings/multilingual.md` | | WebSocket audio streaming | `docs/learnings/websocket.md` | | Building outbound calling agents | `docs/learnings/outbound-agents.md` | +| Bulk-dialing from a CSV (Outbound Call Campaigns) | `docs/learnings/outbound-campaigns.md` | | Voicemail detection / VM vs human classification | `docs/learnings/voicemail-detection.md` | | Enforcing call time limits / graceful call ending | `docs/learnings/call-duration.md` | | Voice provider field cheat-sheet (Cartesia vs 11labs vs OpenAI etc.) | `docs/learnings/voice-providers.md` | | YAML authoring conventions, .vapi-ignore lifecycle | `docs/learnings/yaml-conventions.md` | +**Where new knowledge goes:** + +| Kind of knowledge | Home | Convention | +|---|---|---| +| Per-resource gotchas, recipes, troubleshooting | `docs/learnings/.md` | One file per resource type or topic. Add a row to this table AND to `docs/learnings/README.md` when you add a new file. `CLAUDE.md` mirrors this list — keep both in sync. | +| Engine-friction log (push/pull/state/cleanup pain points + fixes) | `improvements.md` | Format: Problem → Current behavior → Risk → Current mitigation → Possible fix → Status. Mark `[RESOLVED YYYY-MM-DD] (#)` when fixed; never delete. | +| Code-level rationale (why a function works the way it does) | Code comments | Only when the WHY is non-obvious — not what the code does. Don't reference PR/issue numbers; they rot. | +| Setup, install, repo orientation | `README.md` | One-time onboarding only. Don't put runtime gotchas here. | + +If you're unsure where something goes, default to `docs/learnings/`. The README and engine-friction log are deliberately narrow. + --- ## Quick Reference @@ -65,7 +77,7 @@ docs/ ├── changelog.md # Template for tracking per-customer config changes └── learnings/ # Gotchas, recipes, and troubleshooting ├── README.md # Task-routed index — start here - ├── tools.md # Tool configuration gotchas + ├── tools.md # Tool configuration gotchas (incl. dedup behavior) ├── assistants.md # Assistant configuration gotchas ├── squads.md # Squad and multi-agent gotchas ├── structured-outputs.md # Structured output gotchas + KPI patterns @@ -78,7 +90,11 @@ docs/ ├── multilingual.md # Multilingual agent architecture guide ├── websocket.md # WebSocket transport rules ├── outbound-agents.md # Outbound agent design & IVR navigation - └── voicemail-detection.md # Voicemail vs human classification + ├── outbound-campaigns.md # Bulk-dial CSV campaigns + dynamic variables + ├── voicemail-detection.md # Voicemail vs human classification + ├── call-duration.md # Call time limits and graceful end-of-call + ├── voice-providers.md # Per-provider voice block field cheat-sheet + └── yaml-conventions.md # YAML authoring conventions, .vapi-ignore lifecycle resources/ ├── / # Org-scoped resources (npm run push -- reads here) diff --git a/CLAUDE.md b/CLAUDE.md index ad9c294..57c2998 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -13,7 +13,7 @@ When both files exist, follow both. If guidance overlaps, treat `AGENTS.md` as t 2. Then read this file (`CLAUDE.md`) for additional policy constraints. 3. When configuring or debugging any resource, load only the relevant learnings file — not the whole folder: - Assistants → `docs/learnings/assistants.md` - - Tools → `docs/learnings/tools.md` + - Tools → `docs/learnings/tools.md` (also covers tool/SO dedup behavior on push) - Squads → `docs/learnings/squads.md` - Transfers not working → `docs/learnings/transfers.md` - Structured outputs → `docs/learnings/structured-outputs.md` @@ -24,9 +24,19 @@ When both files exist, follow both. If guidance overlaps, treat `AGENTS.md` as t - Azure OpenAI BYOK → `docs/learnings/azure-openai-fallback.md` - Multilingual agents → `docs/learnings/multilingual.md` - WebSocket transport → `docs/learnings/websocket.md` + - Outbound calling agents → `docs/learnings/outbound-agents.md` + - Outbound Call Campaigns (CSV bulk-dial) → `docs/learnings/outbound-campaigns.md` + - Voicemail detection → `docs/learnings/voicemail-detection.md` - Call time limits / graceful ending → `docs/learnings/call-duration.md` + - Voice provider field cheat-sheet → `docs/learnings/voice-providers.md` - YAML authoring conventions, .vapi-ignore lifecycle → `docs/learnings/yaml-conventions.md` +This list mirrors the "Learnings & recipes" table in `AGENTS.md`. Keep both in sync — if you add a new learnings file, update both files plus `docs/learnings/README.md`. + +## Where new knowledge goes + +Per-resource tips/recipes/troubleshooting → `docs/learnings/.md`. Engine-friction log (push/pull/state/cleanup pain points + their fixes) → `improvements.md`. Code-level rationale → comments only when the *why* is non-obvious; never reference PR/issue numbers in code comments (they rot). One-time onboarding/install → `README.md`. When unsure, default to `docs/learnings/`. The full convention table lives in `AGENTS.md` under "Where new knowledge goes" — read it once, then this reminder is enough. + ## Improvements log This repo maintains an upstream-only running log at `improvements.md` (repo diff --git a/docs/learnings/assistants.md b/docs/learnings/assistants.md index d529a5f..e32a98f 100644 --- a/docs/learnings/assistants.md +++ b/docs/learnings/assistants.md @@ -656,7 +656,7 @@ They are merged, not mutually exclusive. But be aware of potential duplicates. ## Liquid Variable Bag and Trust Tiers -Cross-reference: [docs.vapi.ai/assistants/dynamic-variables](https://docs.vapi.ai/assistants/dynamic-variables). The trust-tier framing came out of Mudflap progressive-auth work (PRISM-528). +Cross-reference: [docs.vapi.ai/assistants/dynamic-variables](https://docs.vapi.ai/assistants/dynamic-variables). The trust-tier framing came out of progressive caller-ID auth work on a customer rollout. Vapi exposes a Liquid templating layer in prompts, tool config, and overrides — `{{ customer.number }}`, `{{ now }}`, etc. The variables in scope at runtime fall into three trust tiers based on where they originate. This matters because anything you place in a security-sensitive field (tool static `parameters`, message templates that go to a backend) is only as trustworthy as the source of the variable. diff --git a/docs/learnings/simulations.md b/docs/learnings/simulations.md index d581559..a998a5f 100644 --- a/docs/learnings/simulations.md +++ b/docs/learnings/simulations.md @@ -20,35 +20,35 @@ Extra system messages beyond `messages[0]` are **not** included in the tester's When the same rubric needs to run against multiple personality variants in a sim suite, give EACH `(rubric, personality)` pair its own scenario file with a uniquely descriptive name — even if the rubric content is identical across them. -**Why:** the dashboard's run-history view displays scenarios by `name`, NOT by which personality drove the test. If 4 sims share a scenario named `iForm Live Human Pickup Handling`, all 4 result entries show identically in the suite-run sidebar — you can't tell which test was the "quick" pickup vs the "self-id" pickup vs the "question" pickup vs the "ambiguous-short" pickup without drilling into each item to see the personality. This makes failure investigation painful: every flickering test looks like the same test. +**Why:** the dashboard's run-history view displays scenarios by `name`, NOT by which personality drove the test. If 4 sims share a scenario named `Acme Logistics Live Human Pickup Handling`, all 4 result entries show identically in the suite-run sidebar — you can't tell which test was the "quick" pickup vs the "self-id" pickup vs the "question" pickup vs the "ambiguous-short" pickup without drilling into each item to see the personality. This makes failure investigation painful: every flickering test looks like the same test. **Recommendation:** name each scenario as `--handling`, with a descriptive `name:` field that calls out the personality being tested. ```yaml -# resources//simulations/scenarios/iform-live-human-pickup-quick-handling.yml -name: iForm Live Human Pickup — Quick (bare hello) +# resources//simulations/scenarios/acme-live-human-pickup-quick-handling.yml +name: Acme Logistics Live Human Pickup — Quick (bare hello) evaluations: [...] ``` ```yaml -# resources//simulations/scenarios/iform-live-human-pickup-self-id-handling.yml -name: iForm Live Human Pickup — Self-ID (driver introduces themselves) +# resources//simulations/scenarios/acme-live-human-pickup-self-id-handling.yml +name: Acme Logistics Live Human Pickup — Self-ID (driver introduces themselves) evaluations: [...] # identical rubric content as above; only name differs ``` ```yaml -# resources//simulations/scenarios/iform-live-human-pickup-question-handling.yml -name: iForm Live Human Pickup — Question (skeptical "who's calling?") +# resources//simulations/scenarios/acme-live-human-pickup-question-handling.yml +name: Acme Logistics Live Human Pickup — Question (skeptical "who's calling?") evaluations: [...] # same ``` Each test (sim) file then references its variant-specific scenario: ```yaml -# resources//simulations/tests/iform-live-human-pickup-quick.yml -name: iForm Live Human Pickup - Quick +# resources//simulations/tests/acme-live-human-pickup-quick.yml +name: Acme Logistics Live Human Pickup - Quick personalityId: live-human-pickup-quick-bot -scenarioId: iform-live-human-pickup-quick-handling +scenarioId: acme-live-human-pickup-quick-handling ``` **Cost:** scenario file duplication — each variant is a copy of the same rubric content with a different `name:` field. Cheap. The duplication is mechanical (you can clone the source scenario file 4-6 times with a one-line `name:` change each). @@ -57,7 +57,7 @@ scenarioId: iform-live-human-pickup-quick-handling **Anti-pattern:** putting one shared scenario behind N personality variants in the same suite. The dashboard sidebar shows N rows with identical scenario names, only distinguishable by clicking into each item to see the personality. Sim iteration time inflates because every failure investigation starts with "wait, which one was this?" -Cross-reference: this convention surfaced as friction during the Mudflap iForm Voicemail Triage sim iteration (PRISM-481). Original suites shipped with one shared scenario per group (4 live-pickup tests sharing one scenario, 6 voicemail-edge-cases sharing one scenario); split into per-personality scenarios mid-iteration. Worth shipping new suites in the per-personality form from day one. +Cross-reference: this convention surfaced as friction during a customer voicemail-triage sim iteration. Original suites shipped with one shared scenario per group (4 live-pickup tests sharing one scenario, 6 voicemail-edge-cases sharing one scenario); split into per-personality scenarios mid-iteration. Worth shipping new suites in the per-personality form from day one. --- diff --git a/docs/learnings/squads.md b/docs/learnings/squads.md index bb0dc7a..52b10c7 100644 --- a/docs/learnings/squads.md +++ b/docs/learnings/squads.md @@ -107,7 +107,7 @@ For sim suites grading the destination's first-turn behavior, see [simulations.m ## Passing data between assistants -Cross-reference: [docs.vapi.ai/squads/passing-data-between-assistants](https://docs.vapi.ai/squads/passing-data-between-assistants). The trust-tier framing came out of Mudflap progressive-auth work (PRISM-528). +Cross-reference: [docs.vapi.ai/squads/passing-data-between-assistants](https://docs.vapi.ai/squads/passing-data-between-assistants). The trust-tier framing came out of progressive caller-ID auth work on a customer rollout. When a squad hands off mid-call, three approaches exist for getting data from one assistant to the next. They differ on trust level, latency, and determinism. diff --git a/docs/learnings/structured-outputs.md b/docs/learnings/structured-outputs.md index 2dfd1ef..3051761 100644 --- a/docs/learnings/structured-outputs.md +++ b/docs/learnings/structured-outputs.md @@ -32,6 +32,10 @@ evaluations.5.structuredOutput.Name must be between 1 and 40 characters Long, descriptive evaluator names like `assistant_left_voicemail_and_ended_call_promptly` (48 chars) or `assistant_detected_hostile_recording_and_ended_call` (51 chars) will silently exceed the limit until you POST. Keep names compact (`assistant_ended_call_after_message`, `assistant_handled_hostile_recording`) and put the descriptive nuance in the `description` field, which has no length cap. The constraint applies to the field on every structured output type — both standalone resources and inline evaluations within scenarios. +### Renaming a structured-output file is safe — the engine dedups by `name` + +Same dedup behavior as for tools: if you rename a structured-output file but keep its `name` field stable, the push pipeline detects the existing dashboard resource (by slugified `name` against state and the live dashboard list) and adopts its UUID instead of creating a duplicate. You'll see `🔁 Reusing existing structured output: ` in the push log. See [tools.md → "Renaming a tool file is safe"](tools.md#renaming-a-tool-file-is-safe--the-engine-dedups-by-functionname) for the full mechanism, ambiguity warning semantics, and `npm run cleanup` workflow — they're identical for SOs. + --- ## assistant_ids Must Be UUIDs diff --git a/docs/learnings/tools.md b/docs/learnings/tools.md index 3386a77..b670e54 100644 --- a/docs/learnings/tools.md +++ b/docs/learnings/tools.md @@ -37,7 +37,27 @@ Vapi enforces a hard **1000-character maximum** on `function.description` across ### `function.name` matches `^[A-Za-z0-9_-]+$` -Tool names are validated against this regex by Vapi. Spaces, dots, slashes, parentheses, or unicode characters cause a 400 at push time. Use snake_case or camelCase (e.g. `end_call_vapi_testing`, `handoffToiFormSales`). The name is what the LLM emits in its function call, so keep it stable across config changes — renaming a tool invalidates any prompt rule that mentions the old name. +Tool names are validated against this regex by Vapi. Spaces, dots, slashes, parentheses, or unicode characters cause a 400 at push time. Use snake_case or camelCase (e.g. `end_call_vapi_testing`, `handoffToAcmeSales`). The name is what the LLM emits in its function call, so keep it stable across config changes — renaming a tool invalidates any prompt rule that mentions the old name. + +### Renaming a tool file is safe — the engine dedups by `function.name` + +The push pipeline includes a name-based dedup safety net that prevents minting duplicate dashboard tools when: + +- You renamed the local file (e.g. `end-call.yml` → `intake-end-call.yml`) but kept `function.name` the same. +- Bootstrap pull stored the dashboard tool under a slug-suffixed state key (e.g. `end-call-67aea057`) and your assistant references the original local key. +- The tool exists on the dashboard but isn't yet in your local state file (e.g. fresh clone, partial pull). + +In all three cases the engine looks up the tool by slugified `function.name` against both state entries and the live dashboard tool list, then **adopts** the existing UUID instead of creating a new one. You'll see this log line: + +``` +🔁 Reusing existing tool: (matched via state|dashboard|both) +``` + +Adoption then routes through the standard PATCH path, so any local edits to the tool's payload are pushed normally with drift detection. Your old state-key entries are dropped automatically so the next full push doesn't orphan-delete the just-adopted dashboard tool. + +**When you see `⚠️ Multiple dashboard tools share the name "" — adopting (lex-smallest)`**, real duplicate dashboard resources exist (typically from before the dedup was added). Run `npm run cleanup -- ` to inspect and prune; the engine adopts the lex-smallest UUID deterministically so subsequent pushes stay stable. + +**What this does NOT do:** if you rename `function.name` (not just the file), that's a new logical tool — the engine creates a new dashboard resource. Function-name renames need an explicit `npm run cleanup` of the old one. --- @@ -335,7 +355,7 @@ Only `function` tools support `strict` mode. ## Tool Security and Data Visibility -Cross-reference: [docs.vapi.ai/tools/static-variables-and-aliases](https://docs.vapi.ai/tools/static-variables-and-aliases) and [docs.vapi.ai/tools/custom-tools](https://docs.vapi.ai/tools/custom-tools). The full data-flow / threat-model writeup that motivates this section came out of Mudflap progressive-auth work (PRISM-528). +Cross-reference: [docs.vapi.ai/tools/static-variables-and-aliases](https://docs.vapi.ai/tools/static-variables-and-aliases) and [docs.vapi.ai/tools/custom-tools](https://docs.vapi.ai/tools/custom-tools). The full data-flow / threat-model writeup that motivates this section came out of progressive caller-ID auth work on a customer rollout. ### Every tool result is in conversation history @@ -374,7 +394,7 @@ The dashboard renders these as "Parameters" (JSON schema editor) and "Static Bod | Legacy `assistant.model.functions[]` (deprecated) | ❌ — converter zeroes it out | | `code`, `handoff`, `transferCall`, `endCall`, others | ❌ | -#### Mudflap progressive caller-ID auth pattern (worked example) +#### Progressive caller-ID auth pattern (worked example) ```yaml type: apiRequest diff --git a/docs/learnings/voice-providers.md b/docs/learnings/voice-providers.md index fae89bc..e56cbca 100644 --- a/docs/learnings/voice-providers.md +++ b/docs/learnings/voice-providers.md @@ -102,7 +102,7 @@ If you find yourself reaching for a provider not in the table above, append a ro Pronunciation dictionaries do not share a field shape across voice providers. Same conceptual feature, three different surfaces. -> **Public-docs note:** As of 2026-05-08 the public Vapi docs state pronunciation dictionaries are "exclusive to ElevenLabs voices." This is out of date — Cartesia has been confirmed in production deployments and Vapi-voice schema-level support is in active rollout (PRISM-474). Treat this wiki as the more current source. +> **Public-docs note:** As of 2026-05-08 the public Vapi docs state pronunciation dictionaries are "exclusive to ElevenLabs voices." This is out of date — Cartesia has been confirmed in production deployments and Vapi-voice schema-level support is in active rollout. Treat this wiki as the more current source. ### Cartesia @@ -120,7 +120,7 @@ Pronunciation dictionaries do not share a field shape across voice providers. Sa ### Vapi voices - **Schema-level**: accepts pronunciation dictionary configs at the API. -- **Dashboard UI surface**: in active rollout (PRISM-474, Q2 2026). Schema acceptance does **not** guarantee runtime TTS engine honors the dictionary. +- **Dashboard UI surface**: in active rollout. Schema acceptance does **not** guarantee runtime TTS engine honors the dictionary. - **Recommendation**: verify runtime behavior with a call test before depending on it for production Vapi-voice deployments. ### Field shape gotcha diff --git a/improvements.md b/improvements.md index c9f078e..6ff0dfd 100644 --- a/improvements.md +++ b/improvements.md @@ -61,7 +61,7 @@ you which stack PR closes the row.** | 7 | Voice edits drop pronunciation-dictionary attachments | Silent regression on Cartesia + 11labs voice edits | #4 | RESOLVED 2026-04-30 (Stack G) | | 8 | Dashboard prompt edits can in-place duplicate the prompt | Two stacked prompt versions = stitched output | None | Partial — Stack D heuristic | | 9 | Provider-specific voice schema mismatch (push 400) | `voice.speed` vs `voice.generationConfig.speed` | None | RESOLVED 2026-04-30 (Stack D + A) | -| 10 | Targeted assistant push mints duplicate tools | Re-pushing assistant duplicates `end-call-*` tools | #4 | Partial | +| 10 | Targeted assistant push mints duplicate tools | Re-pushing assistant duplicates `end-call-*` tools | #4 | RESOLVED 2026-05-06 (#23) | | 11 | Bidirectional SO ↔ assistant lockstep has no validation | One-sided edits silently inconsistent | None | RESOLVED 2026-04-30 (Stack D) | | 12 | State file accumulates UUIDs without source files | Silent gitops drift | None | Partial | | 13 | `.agent/` and `.claude/handoffs/` not gitignored | `git add -A` sweeps PII handoff scratch | None | RESOLVED 2026-04-30 (Stack A) | @@ -533,6 +533,13 @@ cheat-sheet in `docs/learnings/voice-providers.md` (Stack A). ## 10. Targeted assistant pushes can auto-create duplicate tool dependencies +**[RESOLVED 2026-05-06] (#23)** — `ensureToolExists` / +`ensureStructuredOutputExists` now run a name-based dedup check (state-side +via `extractBaseSlug` + dashboard-side via lazy `/tool` list) between the +exact-key short-circuit and the create path. Adoption re-keys state to the +canonical UUID, drops stale duplicate state keys (orphan-deletion guard), +and routes through `applyTool` for the standard PATCH + drift-check flow. + **Discovered:** customer-fork log (Amazon3p #10, 2026-04-29) ### Problem @@ -578,9 +585,15 @@ Before re-pushing an assistant with local tool dependencies, inspect ### Status -**Partial.** `ensureToolExists()` blocks the most common path; the -state-renaming case remains. **Stack C dry-run** surfaces auto-apply -intent before mutation. +**Resolved 2026-05-06 (#23).** Name-based dedup check (state-side + +dashboard-side) added between the exact-key short-circuit and the create +path. Adoption re-keys state to the canonical UUID, removes stale duplicate +state keys (touched-marked so `mergeScoped` flushes the deletion), and +routes through `applyTool` so a PATCH + drift check fires with the local +payload (no fake `lastPushedHash` recorded). Dashboard-side dedup honors +dry-run by skipping the API call. Prior partial mitigation +(`ensureToolExists` exact-key check) remains as the fast path; the new +dedup is the second layer for the bootstrap-renamed case. --- diff --git a/src/api.ts b/src/api.ts index a6d81dc..3ac65f9 100644 --- a/src/api.ts +++ b/src/api.ts @@ -8,8 +8,8 @@ import type { VapiResponse } from "./types.ts"; // counted instead of executed. The end-of-run summary in push.ts reads // `getDryRunCounts()` to print "would create N, would update M, would delete K." // -// GETs always run — drift detection (Stack G) and dry-run preview both need -// to fetch current platform state. +// GETs always run — drift detection and dry-run preview both need to fetch +// current platform state. // ───────────────────────────────────────────────────────────────────────────── const DRY_RUN_COUNTS = { POST: 0, PATCH: 0, DELETE: 0 }; diff --git a/src/cleanup.ts b/src/cleanup.ts index 5b2d5ae..321c0b8 100644 --- a/src/cleanup.ts +++ b/src/cleanup.ts @@ -111,8 +111,8 @@ async function main(): Promise { } const state = loadState(); - // Stack F: state values are ResourceState objects, not bare UUIDs. Extract - // each .uuid for the orphan-detection set. + // State values are ResourceState objects, not bare UUIDs. Extract each + // .uuid for the orphan-detection set. const stateIds = new Set([ ...Object.values(state.assistants).map((e) => e.uuid), ...Object.values(state.tools).map((e) => e.uuid), diff --git a/src/dep-dedup.ts b/src/dep-dedup.ts new file mode 100644 index 0000000..bed096c --- /dev/null +++ b/src/dep-dedup.ts @@ -0,0 +1,151 @@ +// Dependency deduplication helpers for the push pipeline. +// +// On a targeted assistant push, `ensureToolExists` / `ensureStructuredOutputExists` +// previously skipped auto-create only when (a) the dep id was UUID-shaped, +// (b) `state.tools[depId]` was an exact key match, or (c) we'd already +// auto-applied this id in the current run. Bootstrap pull stores resources +// under `-` keys (e.g. `end-call-67aea057`), so a local file +// referencing `my-end-call` would miss the exact-key check and POST a +// duplicate dashboard tool. Repeated targeted pushes accumulated orphans +// on the dashboard. +// +// This module's helpers detect those collisions BEFORE create: +// - `findExistingResourceByName` — match local payload's canonical name +// against existing state entries (renamed/state-only keys) and the live +// dashboard list. Returns the UUID to adopt, plus an `ambiguous` flag +// when multiple distinct UUIDs share the same name (real on-dashboard +// duplicates from prior bug runs — the caller should warn and surface +// the loser UUIDs so a follow-up `npm run cleanup` can prune them). +// +// NOTE on duplication: `slugify` and `extractBaseSlug` here mirror the +// definitions in `src/pull.ts`. pull.ts imports config.ts, which calls +// `parseEnvironment()` at module load and `process.exit(1)`s without a +// CLI env arg — making it impossible to import in a unit test. This +// module imports ONLY from `./types.ts` so it stays testable in +// isolation. Five lines duplicated is the right tradeoff; do not "DRY" +// these back into pull.ts. + +import type { ResourceState } from "./types.ts"; + +export interface RemoteResource { + id: string; + name?: string; + function?: { name?: string }; +} + +export interface DedupMatch { + uuid: string; + source: "state" | "dashboard" | "both"; + ambiguous: boolean; + // Other distinct UUIDs we saw under the same canonical name. Empty when + // `ambiguous` is false. Caller should surface these in a warning so the + // user can run `npm run cleanup` to prune the duplicates. + duplicateUuids: string[]; +} + +export function slugify(name: string): string { + return name + .toLowerCase() + .replace(/[^a-z0-9]+/g, "-") + .replace(/^-+|-+$/g, "") + .replace(/-+/g, "-"); +} + +export function extractBaseSlug(resourceId: string): string { + const match = resourceId.match(/^(.*)-([a-f0-9]{8})$/i); + return match?.[1] ?? resourceId; +} + +// Minimal payload shape this module needs. Local resource files are loaded +// as `Record`, so the only fields we know exist are `name` +// (top-level, used by SOs / assistants / squads) and a nested `function.name` +// (used by tools). Everything else stays opaque — we narrow at use. +export type NameablePayload = { name?: unknown; function?: unknown }; + +// Pulls the canonical name from a tool / SO / assistant payload. +// For tools: `function.name` is the canonical name. +// For SOs / assistants / etc.: top-level `name`. Top-level wins when both +// are present. +export function extractResourceName( + payload: NameablePayload, +): string | undefined { + if (typeof payload.name === "string" && payload.name) return payload.name; + const fn = payload.function; + if ( + fn !== null && + typeof fn === "object" && + "name" in fn && + typeof fn.name === "string" && + fn.name + ) { + return fn.name; + } + return undefined; +} + +// Find an existing resource (in state or dashboard) whose canonical name +// matches the local payload's canonical name. Used by ensureToolExists / +// ensureStructuredOutputExists to avoid creating a duplicate when bootstrap +// pull has already stored the same dashboard resource under a different +// state key. +// +// `localResourceId` is the key the engine WANTS to use in state. If a state +// entry already exists under that exact key, the caller short-circuits BEFORE +// calling this — so we exclude it here as a safety belt. +// +// Tiebreaker for >1 distinct UUID matching the same slugified name (i.e. +// real on-dashboard duplicates from prior bug runs): pick lexically smallest +// UUID for stable, deterministic adoption. Set `ambiguous=true` and surface +// the loser UUIDs in `duplicateUuids` so the caller can warn. +export function findExistingResourceByName(args: { + localResourceId: string; + localPayload: NameablePayload; + stateSection: Record; + remoteList?: RemoteResource[]; +}): DedupMatch | undefined { + const localName = extractResourceName(args.localPayload); + if (!localName) return undefined; + const localSlug = slugify(localName); + + // uuid -> set of source labels (state:, dashboard:) + const matches = new Map>(); + + for (const [stateKey, entry] of Object.entries(args.stateSection)) { + if (stateKey === args.localResourceId) continue; + if (extractBaseSlug(stateKey) === localSlug) { + const set = matches.get(entry.uuid) ?? new Set(); + set.add(`state:${stateKey}`); + matches.set(entry.uuid, set); + } + } + + for (const remote of args.remoteList ?? []) { + const remoteName = + (typeof remote.name === "string" && remote.name) || remote.function?.name; + if (!remoteName) continue; + if (slugify(remoteName) === localSlug) { + const set = matches.get(remote.id) ?? new Set(); + set.add(`dashboard:${remote.id}`); + matches.set(remote.id, set); + } + } + + if (matches.size === 0) return undefined; + + const sorted = [...matches.keys()].sort(); + const winner = sorted[0]!; + const winnerSources = matches.get(winner)!; + const hasState = [...winnerSources].some((s) => s.startsWith("state:")); + const hasDashboard = [...winnerSources].some((s) => + s.startsWith("dashboard:"), + ); + const source: DedupMatch["source"] = + hasState && hasDashboard ? "both" : hasState ? "state" : "dashboard"; + + return { + uuid: winner, + source, + ambiguous: matches.size > 1, + duplicateUuids: sorted.slice(1), + }; +} diff --git a/src/drift.ts b/src/drift.ts index 53f88e5..b5a3026 100644 --- a/src/drift.ts +++ b/src/drift.ts @@ -1,14 +1,14 @@ // ───────────────────────────────────────────────────────────────────────────── -// Drift detection — Stack G +// Drift detection // // Before each PATCH, GET the current platform payload, hash it, and compare // to the `lastPulledHash` recorded in state. If the hashes differ, the // dashboard has drifted away from the version we last pulled — refuse to -// push without `--overwrite` (improvements.md #1, #2, #7). +// push without `--overwrite`. // // Behavior matrix: -// - No `lastPulledHash` (e.g., legacy state, first push after Stack F): -// log "drift unknown — proceeding" and continue. Don't block. +// - No `lastPulledHash` (e.g., legacy state, first push after schema +// migration): log "drift unknown — proceeding" and continue. Don't block. // - Hashes match: continue silently. // - Hashes differ + no --overwrite: refuse the push, return false. // - Hashes differ + --overwrite: log "overwriting drift" and continue. diff --git a/src/pull.ts b/src/pull.ts index 6af7c5c..786d269 100644 --- a/src/pull.ts +++ b/src/pull.ts @@ -830,10 +830,10 @@ export async function pullResourceType( if (isNew) created++; else updated++; - // Update state with new content hash + timestamp (Stack F). + // Update state with new content hash + timestamp. // Hashing the resolved-with-credentials payload (the form we will save // to disk) keeps `lastPulledHash` aligned with the source-of-truth diff - // basis used by drift detection in Stack G. + // basis used by drift detection. upsertState(newStateSection, resourceId, { uuid: resource.id, lastPulledHash: hashPayload(withCredNames), diff --git a/src/push.ts b/src/push.ts index d98ba26..57392b3 100644 --- a/src/push.ts +++ b/src/push.ts @@ -16,9 +16,14 @@ import { summarizeFindings, validateResources } from "./validate.ts"; import { checkDriftForUpdate } from "./drift.ts"; import { writeSnapshot } from "./snapshot.ts"; import { mergeScoped } from "./state-merge.ts"; +import { + extractResourceName, + findExistingResourceByName, + type RemoteResource, +} from "./dep-dedup.ts"; -// Map a resource label to its state-file key. Used for snapshotting (Stack H) -// — snapshot directories are keyed by the same names the state file uses. +// Map a resource label to its state-file key. Used for snapshotting — +// snapshot directories are keyed by the same names the state file uses. const RESOURCE_LABEL_TO_TYPE: Record = { tool: "tools", "structured output": "structuredOutputs", @@ -99,12 +104,12 @@ async function upsertResourceWithStateRecovery(options: { ` 🔄 Updating ${resourceLabel}: ${resourceId} (${existingUuid})`, ); - // Stack G — drift detection. Before PATCH, GET the current platform - // payload, hash it, and compare to lastPulledHash. Refuse to overwrite - // without --overwrite. Skipped in dry-run because the operator just - // wants to see what would happen, and skipped if no baseline hash. - // Stack H — when we successfully fetch the platform payload, snapshot - // it (and our outgoing payload) so `npm run rollback` has a target. + // Drift detection. Before PATCH, GET the current platform payload, hash + // it, and compare to lastPulledHash. Refuse to overwrite without + // --overwrite. Skipped in dry-run because the operator just wants to see + // what would happen, and skipped if no baseline hash. + // When we successfully fetch the platform payload, snapshot it (and our + // outgoing payload) so `npm run rollback` has a target. if (!DRY_RUN) { const stateEntry = stateSection[resourceId]; if (stateEntry) { @@ -778,11 +783,12 @@ function filterResourcesByPaths( return resources.filter((r) => matchingIds.has(r.resourceId)); } -// Stack J — track which resourceIds were actually written during this apply. -// On scoped push, the end-of-run save merges only these entries back into -// the on-disk state, leaving untouched entries alone. Without this, a scoped -// push (`npm run push -- assistants/foo.md`) sweeps in any pre-existing -// drift across the entire state file (improvements.md #15). +// Track which resourceIds were actually written during this apply. On +// scoped push, the end-of-run save merges only these entries back into +// the on-disk state, leaving untouched entries alone. Without this, a +// scoped push (`npm run push -- assistants/foo.md`) would sweep +// pre-existing drift across the entire state file into the commit-able +// diff. interface TouchedSets { tools: Set; structuredOutputs: Set; @@ -827,6 +833,96 @@ interface DependencyContext { autoApplied: Set; autoAppliedTools: ResourceFile>[]; autoAppliedStructuredOutputs: ResourceFile>[]; + // Lazy-fetched dashboard inventories — populated at most once per push, + // only when an auto-apply path needs to verify an existing dashboard + // resource isn't being shadowed by a renamed/state-only key. + // `undefined` = not yet fetched; `[]` = fetched but dashboard returned 0 + // (or the fetch failed and we degraded to state-only dedup). + // + // Simulations / personalities / scenarios / simulation suites are not + // listed here because they're not auto-applied as dependencies anywhere + // in the engine (they're top-level resources only). If we ever add a + // dependency-resolution path for them, mirror this pattern. + existingRemoteTools?: RemoteResource[]; + existingRemoteStructuredOutputs?: RemoteResource[]; + existingRemoteAssistants?: RemoteResource[]; + // Track which resourceIds we mutated so the scoped state-merge on save + // can flush only the touched section, leaving untouched on-disk state + // alone. Required for adoption: without it, a scoped push would lose + // the adopted UUID at end-of-run save. + touched: TouchedSets; +} + +// Lazy-fetch the live dashboard inventory for a given resource type. Used by +// the auto-apply path to detect existing dashboard resources before minting +// a duplicate POST. Honors dry-run by skipping the API call entirely +// (empty list → state-only dedup). +async function getExistingRemoteTools( + ctx: DependencyContext, +): Promise { + if (ctx.existingRemoteTools !== undefined) return ctx.existingRemoteTools; + if (DRY_RUN) { + ctx.existingRemoteTools = []; + return ctx.existingRemoteTools; + } + try { + const remote = await fetchAllResources("tools"); + ctx.existingRemoteTools = remote as unknown as RemoteResource[]; + } catch (err) { + console.warn( + ` ⚠️ Could not fetch dashboard tools for dedup check: ${ + err instanceof Error ? err.message : String(err) + }. Falling back to state-only dedup.`, + ); + ctx.existingRemoteTools = []; + } + return ctx.existingRemoteTools; +} + +async function getExistingRemoteStructuredOutputs( + ctx: DependencyContext, +): Promise { + if (ctx.existingRemoteStructuredOutputs !== undefined) + return ctx.existingRemoteStructuredOutputs; + if (DRY_RUN) { + ctx.existingRemoteStructuredOutputs = []; + return ctx.existingRemoteStructuredOutputs; + } + try { + const remote = await fetchAllResources("structuredOutputs"); + ctx.existingRemoteStructuredOutputs = remote as unknown as RemoteResource[]; + } catch (err) { + console.warn( + ` ⚠️ Could not fetch dashboard structured outputs for dedup check: ${ + err instanceof Error ? err.message : String(err) + }. Falling back to state-only dedup.`, + ); + ctx.existingRemoteStructuredOutputs = []; + } + return ctx.existingRemoteStructuredOutputs; +} + +async function getExistingRemoteAssistants( + ctx: DependencyContext, +): Promise { + if (ctx.existingRemoteAssistants !== undefined) + return ctx.existingRemoteAssistants; + if (DRY_RUN) { + ctx.existingRemoteAssistants = []; + return ctx.existingRemoteAssistants; + } + try { + const remote = await fetchAllResources("assistants"); + ctx.existingRemoteAssistants = remote as unknown as RemoteResource[]; + } catch (err) { + console.warn( + ` ⚠️ Could not fetch dashboard assistants for dedup check: ${ + err instanceof Error ? err.message : String(err) + }. Falling back to state-only dedup.`, + ); + ctx.existingRemoteAssistants = []; + } + return ctx.existingRemoteAssistants; } async function ensureToolExists( @@ -843,6 +939,69 @@ async function ensureToolExists( const tool = ctx.allTools.find((t) => t.resourceId === toolId); if (!tool) return; + // Before creating, check whether an existing state entry (under a + // different key — e.g., bootstrap-generated `end-call-`) or a + // live dashboard tool already represents this same logical tool. Adopt + // instead of minting a duplicate. + const remoteList = await getExistingRemoteTools(ctx); + const match = findExistingResourceByName({ + localResourceId: toolId, + localPayload: tool.data, + stateSection: ctx.state.tools, + remoteList, + }); + if (match) { + if (match.ambiguous) { + const displayName = extractResourceName(tool.data) ?? toolId; + console.warn( + ` ⚠️ Multiple dashboard tools share the name "${displayName}" — adopting ${match.uuid} (lex-smallest). Other UUIDs: ${match.duplicateUuids.join(", ")}. Run \`npm run cleanup -- ${VAPI_ENV}\` to prune duplicates.`, + ); + } + console.log( + ` 🔁 Reusing existing tool: ${toolId} → ${match.uuid} (matched via ${match.source})`, + ); + + // Re-key state to point at the adopted UUID under the local resourceId. + // No hash yet — applyTool below will PATCH with the local payload and + // record the post-PATCH hash, exercising the standard drift-check flow. + upsertState(ctx.state.tools, tool.resourceId, { uuid: match.uuid }); + + // Orphan-deletion guard — drop other state keys pointing at the SAME + // uuid so a subsequent full push doesn't see them as "tracked but no + // local file" and DELETE the dashboard resource we just adopted. Mark + // them touched so the scoped state-merge on save flushes the deletion. + // Entries pointing at `match.duplicateUuids` are SEPARATE dashboard + // duplicates — leave them alone; `npm run cleanup` handles those. + for (const [staleKey, entry] of Object.entries(ctx.state.tools)) { + if (staleKey !== tool.resourceId && entry.uuid === match.uuid) { + delete ctx.state.tools[staleKey]; + ctx.touched.tools.add(staleKey); + } + } + + // PATCH the dashboard with the local payload. `applyTool`'s + // `upsertResourceWithStateRecovery` branch picks PATCH because + // `state.tools` now has `existingUuid` set. Drift check fires + // (no-baseline → log + proceed when `lastPulledHash` is undefined; + // full check when it isn't). + try { + const uuid = await applyTool(tool, ctx.state); + ctx.autoApplied.add(`tools:${toolId}`); + if (!uuid) return; + upsertState(ctx.state.tools, tool.resourceId, { + uuid, + lastPushedHash: hashPayload(tool.data), + }); + ctx.applied.tools++; + ctx.autoAppliedTools.push(tool); + ctx.touched.tools.add(tool.resourceId); + } catch (error) { + console.error(formatApiError(toolId, error)); + throw error; + } + return; + } + console.log(` 📦 Auto-applying dependency → tool: ${toolId}`); try { const uuid = await applyTool(tool, ctx.state); @@ -854,6 +1013,7 @@ async function ensureToolExists( }); ctx.applied.tools++; ctx.autoAppliedTools.push(tool); + ctx.touched.tools.add(tool.resourceId); } catch (error) { console.error(formatApiError(toolId, error)); throw error; @@ -876,6 +1036,65 @@ async function ensureStructuredOutputExists( ); if (!output) return; + // Same dedup pattern as `ensureToolExists`, against the SO state section + // and live dashboard SO list. + const remoteList = await getExistingRemoteStructuredOutputs(ctx); + const match = findExistingResourceByName({ + localResourceId: outputId, + localPayload: output.data, + stateSection: ctx.state.structuredOutputs, + remoteList, + }); + if (match) { + if (match.ambiguous) { + const displayName = extractResourceName(output.data) ?? outputId; + console.warn( + ` ⚠️ Multiple dashboard structured outputs share the name "${displayName}" — adopting ${match.uuid} (lex-smallest). Other UUIDs: ${match.duplicateUuids.join(", ")}. Run \`npm run cleanup -- ${VAPI_ENV}\` to prune duplicates.`, + ); + } + console.log( + ` 🔁 Reusing existing structured output: ${outputId} → ${match.uuid} (matched via ${match.source})`, + ); + + // Re-key state to point at the adopted UUID under the local resourceId. + // No hash yet — applyStructuredOutput below will PATCH with the local + // payload and record the post-PATCH hash, exercising the standard + // drift-check flow. + upsertState(ctx.state.structuredOutputs, output.resourceId, { + uuid: match.uuid, + }); + + // Orphan-deletion guard — drop other state keys pointing at the SAME + // uuid so a subsequent full push doesn't see them as "tracked but no + // local file" and DELETE the dashboard resource we just adopted. Mark + // them touched so the scoped state-merge on save flushes the deletion. + for (const [staleKey, entry] of Object.entries(ctx.state.structuredOutputs)) { + if (staleKey !== output.resourceId && entry.uuid === match.uuid) { + delete ctx.state.structuredOutputs[staleKey]; + ctx.touched.structuredOutputs.add(staleKey); + } + } + + // PATCH via the standard apply path so drift detection fires and any + // local edits land on the dashboard. + try { + const uuid = await applyStructuredOutput(output, ctx.state); + ctx.autoApplied.add(`structuredOutputs:${outputId}`); + if (!uuid) return; + upsertState(ctx.state.structuredOutputs, output.resourceId, { + uuid, + lastPushedHash: hashPayload(output.data), + }); + ctx.applied.structuredOutputs++; + ctx.autoAppliedStructuredOutputs.push(output); + ctx.touched.structuredOutputs.add(output.resourceId); + } catch (error) { + console.error(formatApiError(outputId, error)); + throw error; + } + return; + } + console.log(` 📦 Auto-applying dependency → structured output: ${outputId}`); try { const uuid = await applyStructuredOutput(output, ctx.state); @@ -887,6 +1106,7 @@ async function ensureStructuredOutputExists( }); ctx.applied.structuredOutputs++; ctx.autoAppliedStructuredOutputs.push(output); + ctx.touched.structuredOutputs.add(output.resourceId); } catch (error) { console.error(formatApiError(outputId, error)); throw error; @@ -951,6 +1171,62 @@ async function ensureAssistantExists( const assistant = ctx.allAssistants.find((a) => a.resourceId === assistantId); if (!assistant) return; + // Same dedup pattern as `ensureToolExists` / `ensureStructuredOutputExists`, + // against the assistant state section and live dashboard list. Catches the + // case where bootstrap pull stored the same dashboard assistant under a + // `-` key and the squad references it by the original + // local key. + const remoteList = await getExistingRemoteAssistants(ctx); + const match = findExistingResourceByName({ + localResourceId: assistantId, + localPayload: assistant.data, + stateSection: ctx.state.assistants, + remoteList, + }); + if (match) { + if (match.ambiguous) { + const displayName = extractResourceName(assistant.data) ?? assistantId; + console.warn( + ` ⚠️ Multiple dashboard assistants share the name "${displayName}" — adopting ${match.uuid} (lex-smallest). Other UUIDs: ${match.duplicateUuids.join(", ")}. Run \`npm run cleanup -- ${VAPI_ENV}\` to prune duplicates.`, + ); + } + console.log( + ` 🔁 Reusing existing assistant: ${assistantId} → ${match.uuid} (matched via ${match.source})`, + ); + + upsertState(ctx.state.assistants, assistant.resourceId, { + uuid: match.uuid, + }); + + // Orphan-deletion guard — drop other state keys pointing at the SAME + // uuid so a subsequent full push doesn't see them as "tracked but no + // local file" and DELETE the dashboard resource we just adopted. + for (const [staleKey, entry] of Object.entries(ctx.state.assistants)) { + if (staleKey !== assistant.resourceId && entry.uuid === match.uuid) { + delete ctx.state.assistants[staleKey]; + ctx.touched.assistants.add(staleKey); + } + } + + // PATCH via the standard apply path so drift detection fires and any + // local edits land on the dashboard. + try { + const uuid = await applyAssistant(assistant, ctx.state); + ctx.autoApplied.add(`assistants:${assistantId}`); + if (!uuid) return; + upsertState(ctx.state.assistants, assistant.resourceId, { + uuid, + lastPushedHash: hashPayload(assistant.data), + }); + ctx.applied.assistants++; + ctx.touched.assistants.add(assistant.resourceId); + } catch (error) { + console.error(formatApiError(assistantId, error)); + throw error; + } + return; + } + console.log(` 📦 Auto-applying dependency → assistant: ${assistantId}`); try { const uuid = await applyAssistant(assistant, ctx.state); @@ -964,6 +1240,7 @@ async function ensureAssistantExists( }); ctx.applied.assistants++; ctx.autoApplied.add(`assistants:${assistantId}`); + ctx.touched.assistants.add(assistant.resourceId); } catch (error) { console.error(formatApiError(assistantId, error)); throw error; @@ -1001,8 +1278,8 @@ async function main(): Promise { // Load current state (needed for reference resolution even in partial apply) let state = loadState(); - // Stack J — track which resourceIds we actually mutate so the end-of-run - // save can merge into existing on-disk state instead of rewriting wholesale. + // Track which resourceIds we actually mutate so the end-of-run save can + // merge into existing on-disk state instead of rewriting wholesale. const touched: TouchedSets = emptyTouchedSets(); // Track what was applied for summary @@ -1178,6 +1455,7 @@ async function main(): Promise { autoApplied, autoAppliedTools, autoAppliedStructuredOutputs, + touched, }; // Determine which types to check for orphaned deletions @@ -1503,9 +1781,9 @@ async function main(): Promise { ); } else { try { - // Stack J — for scoped pushes, only persist entries we actually - // mutated. Re-load disk state and merge our touched entries on top - // so unrelated drift in untouched entries is left alone. A bare + // For scoped pushes, only persist entries we actually mutated. + // Re-load disk state and merge our touched entries on top so + // unrelated drift in untouched entries is left alone. A bare // (non-partial) push falls through to the wholesale save. const stateToWrite = partial ? mergeScoped(loadState(), state, touched) diff --git a/src/resolver.ts b/src/resolver.ts index 358dd57..f85b61e 100644 --- a/src/resolver.ts +++ b/src/resolver.ts @@ -11,9 +11,9 @@ function isUUID(value: string): boolean { return UUID_REGEX.test(value); } -// Check if a UUID is tracked in a state section (reverse lookup). After -// Stack F, sections store ResourceState entries — extract `.uuid` for the -// reverse-lookup membership check. +// Check if a UUID is tracked in a state section (reverse lookup). Sections +// store ResourceState entries — extract `.uuid` for the reverse-lookup +// membership check. function isKnownUUID(uuid: string, stateSection: Record): boolean { for (const entry of Object.values(stateSection)) { if (entry.uuid === uuid) return true; diff --git a/src/sim-cmd.ts b/src/sim-cmd.ts index 74e40e5..649d9cf 100644 --- a/src/sim-cmd.ts +++ b/src/sim-cmd.ts @@ -1,7 +1,7 @@ // CLI entry: `npm run sim -- --suite --target ` // -// Distinct from `npm run eval` (legacy /evals endpoint). See AGENTS.md and -// improvements.md #16 for the rationale. +// Distinct from `npm run eval` (legacy /evals endpoint). See AGENTS.md for +// the rationale. import { formatSummary, diff --git a/src/sim.ts b/src/sim.ts index acf05c8..c95f089 100644 --- a/src/sim.ts +++ b/src/sim.ts @@ -95,16 +95,16 @@ export function loadStateFile(env: string): StateFile { ); } const state = JSON.parse(readFileSync(stateFile, "utf-8")) as StateFile; - // Forward-compat: if a future state schema (Stack F) wraps strings as - // {uuid: string}, surface the .uuid field; otherwise treat values as the - // legacy bare string. The local function below handles both shapes. + // Forward-compat: if the state schema wraps strings as {uuid: string}, + // surface the .uuid field; otherwise treat values as the legacy bare + // string. The local function below handles both shapes. return state; } -// Resolve a local resource name → platform UUID. Stack F migrates state -// values to ResourceState, so this helper accepts both shapes (string OR -// {uuid: string}) and returns just the UUID. Until F lands, it short-circuits -// on the string case. +// Resolve a local resource name → platform UUID. The state schema may store +// values as either bare string UUIDs (legacy) or ResourceState objects +// ({uuid: string, ...}); this helper accepts both shapes and returns just +// the UUID. function stateValueToUuid(value: unknown): string | undefined { if (typeof value === "string") return value; if ( diff --git a/src/snapshot.ts b/src/snapshot.ts index fff2b9b..eb2e7d3 100644 --- a/src/snapshot.ts +++ b/src/snapshot.ts @@ -1,5 +1,5 @@ // ───────────────────────────────────────────────────────────────────────────── -// Snapshot-on-push — Stack H +// Snapshot-on-push // // Before each PATCH, write the *outgoing* (local) payload AND the *current // platform* payload to a per-push directory: @@ -10,7 +10,7 @@ // `platform` payload as a PATCH, restoring the dashboard to its state at // the moment of the snapshot. // -// Reuses Stack G's drift-fetch path: when drift detection ran for this +// Reuses the drift-detection fetch path: when drift detection ran for this // PATCH, the GET'd platform payload is passed in here so we don't pay a // second GET. Snapshots are local-operator state and are gitignored. // ───────────────────────────────────────────────────────────────────────────── diff --git a/src/state-merge.ts b/src/state-merge.ts index a88153d..5e4f441 100644 --- a/src/state-merge.ts +++ b/src/state-merge.ts @@ -1,4 +1,4 @@ -// Stack J — scoped state writes. +// Scoped state writes. // // On a scoped push (`npm run push -- assistants/foo.md`), the engine // previously rewrote the entire state file even when only one assistant diff --git a/src/state-serialize.ts b/src/state-serialize.ts index a262289..301cc46 100644 --- a/src/state-serialize.ts +++ b/src/state-serialize.ts @@ -29,8 +29,8 @@ export function sortedKeysReplacer(_key: string, value: unknown): unknown { // Canonicalize a value: sort object keys at every level, drop null/undefined // leaves recursively, leave array order intact. Produces a stable shape // regardless of insertion order or transient nullish leaves the API may -// emit. Used by Stack F (content hashes) and Stack G (drift detection) — -// kept here so the helpers stay co-located. +// emit. Used by content hashing and drift detection — kept here so the +// helpers stay co-located. export function canonicalize(value: unknown): unknown { if (value === null || value === undefined) return undefined; if (Array.isArray(value)) { @@ -51,7 +51,8 @@ export function canonicalize(value: unknown): unknown { } // Stable sha256 of a payload after canonicalization. Used for content drift -// detection (this stack populates the hashes; Stack G consumes them). +// detection — this helper populates the hashes; the drift detection layer +// consumes them. export function hashPayload(payload: unknown): string { const canonical = canonicalize(payload); return createHash("sha256") @@ -90,9 +91,9 @@ export function upsertState( }; } -// Pronunciation-dictionary drop check (improvements.md #7). Detects when a -// dictionary attachment disappears from the platform between pulls. Two -// shapes are supported because Vapi exposes a different field per provider: +// Pronunciation-dictionary drop check. Detects when a dictionary attachment +// disappears from the platform between pulls. Two shapes are supported +// because Vapi exposes a different field per provider: // // - 11labs (documented at // https://docs.vapi.ai/assistants/pronunciation-dictionaries): diff --git a/src/types.ts b/src/types.ts index c41ffc4..9c10746 100644 --- a/src/types.ts +++ b/src/types.ts @@ -2,14 +2,14 @@ // Types // ───────────────────────────────────────────────────────────────────────────── -// Per-resource state metadata. Stack F architectural pivot: state values -// changed from bare `string` (UUID) to a structured ResourceState carrying -// content hashes, timestamps, and (Stack I) optional platform version IDs. +// Per-resource state metadata. State values are a structured ResourceState +// carrying content hashes, timestamps, and an optional platform version ID +// for ETag-based optimistic concurrency. // // Backwards compatibility: legacy state files loaded with bare string values // are migrated at load time in `loadState()` — each string becomes // { uuid: } with no other fields. The first push or pull after -// migration populates the hash fields. Until then, drift detection (Stack G) +// migration populates the hash fields. Until then, drift detection // short-circuits cleanly because `lastPulledHash` is undefined. // // Why preserve backwards-compat instead of doing a flag-day migration: @@ -21,8 +21,8 @@ export interface ResourceState { uuid: string; // sha256 of the canonicalized platform payload at last pull. Set by - // `pull.ts` after `cleanResource()` + canonical sort. Used by Stack G - // drift detection. + // `pull.ts` after `cleanResource()` + canonical sort. Used by drift + // detection. lastPulledHash?: string; // ISO-8601 timestamp of the last pull. Useful for triage when investigating // "when did this drift?". @@ -30,14 +30,14 @@ export interface ResourceState { // sha256 of the canonicalized payload that was last sent on PATCH/POST. // Distinct from `lastPulledHash` because we may push without pulling. lastPushedHash?: string; - // Platform-provided ETag / version identifier (Stack I). Engine populates - // it from response headers when the platform exposes one. + // Platform-provided ETag / version identifier for optimistic concurrency. + // Engine populates it from response headers when the platform exposes one. platformVersionId?: string; } -// `StateFile` is the on-disk shape of `.vapi-state..json`. After -// Stack F it carries `Record` per section instead of -// bare strings. `loadState()` migrates legacy data automatically. +// `StateFile` is the on-disk shape of `.vapi-state..json`. Each section +// carries `Record` instead of bare strings. +// `loadState()` migrates legacy data automatically. export interface StateFile { credentials: Record; assistants: Record; diff --git a/tests/dep-dedup.test.ts b/tests/dep-dedup.test.ts new file mode 100644 index 0000000..15e8af2 --- /dev/null +++ b/tests/dep-dedup.test.ts @@ -0,0 +1,199 @@ +import test from "node:test"; +import assert from "node:assert/strict"; +import { + findExistingResourceByName, + slugify, + extractBaseSlug, + extractResourceName, +} from "../src/dep-dedup.ts"; + +// Dedup helper coverage. Verifies that bootstrap-renamed state entries and +// live dashboard duplicates are detected before the auto-apply path POSTs +// a fresh duplicate tool / SO / assistant. + +test("slugify lowercases and dashes", () => { + assert.equal(slugify("End Call Tool"), "end-call-tool"); + assert.equal(slugify("HandoffToSales"), "handofftosales"); +}); + +test("extractBaseSlug strips uuid8 suffix", () => { + assert.equal(extractBaseSlug("end-call-67aea057"), "end-call"); + assert.equal(extractBaseSlug("end-call"), "end-call"); // no suffix + assert.equal(extractBaseSlug("foo-bar-12345678"), "foo-bar"); +}); + +test("extractResourceName: tool uses function.name", () => { + assert.equal( + extractResourceName({ function: { name: "endCall" } }), + "endCall", + ); +}); + +test("extractResourceName: SO uses top-level name", () => { + assert.equal( + extractResourceName({ name: "post-call-summary" }), + "post-call-summary", + ); +}); + +test("extractResourceName: top-level name wins over function.name", () => { + assert.equal( + extractResourceName({ name: "outer", function: { name: "inner" } }), + "outer", + ); +}); + +test("findExistingResourceByName: assistant payload (top-level name only)", () => { + // Assistants use top-level `name` and have no nested `function`. The + // squad → assistant auto-apply path hits this shape via + // ensureAssistantExists. Bootstrap may store the same dashboard assistant + // under `-` (e.g., `support-bot-1234abcd`); a squad + // referencing the original local key (`support-bot`) must adopt rather + // than mint a duplicate. + const m = findExistingResourceByName({ + localResourceId: "support-bot", + localPayload: { name: "support-bot" }, + stateSection: { + "support-bot-1234abcd": { uuid: "uuid-asst-aaa" }, + }, + remoteList: [{ id: "uuid-asst-aaa", name: "support-bot" }], + }); + assert.equal(m?.uuid, "uuid-asst-aaa"); + assert.equal(m?.source, "both"); + assert.equal(m?.ambiguous, false); +}); + +test("findExistingResourceByName: state-only match", () => { + const m = findExistingResourceByName({ + localResourceId: "b2b-invoice-end-call", + localPayload: { function: { name: "end-call" } }, + stateSection: { + "end-call-67aea057": { uuid: "uuid-aaa" }, + }, + }); + assert.deepEqual(m, { + uuid: "uuid-aaa", + source: "state", + ambiguous: false, + duplicateUuids: [], + }); +}); + +test("findExistingResourceByName: dashboard-only match", () => { + const m = findExistingResourceByName({ + localResourceId: "end-call", + localPayload: { function: { name: "end-call" } }, + stateSection: {}, + remoteList: [{ id: "uuid-bbb", function: { name: "end-call" } }], + }); + assert.deepEqual(m, { + uuid: "uuid-bbb", + source: "dashboard", + ambiguous: false, + duplicateUuids: [], + }); +}); + +test("findExistingResourceByName: state and dashboard both match same uuid", () => { + const m = findExistingResourceByName({ + localResourceId: "end-call", + localPayload: { function: { name: "end-call" } }, + stateSection: { "end-call-67aea057": { uuid: "uuid-aaa" } }, + remoteList: [{ id: "uuid-aaa", function: { name: "end-call" } }], + }); + assert.equal(m?.source, "both"); + assert.equal(m?.ambiguous, false); +}); + +test("findExistingResourceByName: ambiguous → lex-smallest UUID + duplicates surfaced", () => { + const m = findExistingResourceByName({ + localResourceId: "end-call", + localPayload: { function: { name: "end-call" } }, + stateSection: { + "end-call-67aea057": { uuid: "uuid-zzz" }, + "end-call-16ff08ed": { uuid: "uuid-aaa" }, + }, + }); + assert.equal(m?.uuid, "uuid-aaa"); + assert.equal(m?.ambiguous, true); + assert.deepEqual(m?.duplicateUuids, ["uuid-zzz"]); +}); + +test("findExistingResourceByName: no name on payload → undefined", () => { + const m = findExistingResourceByName({ + localResourceId: "x", + localPayload: {}, + stateSection: { "end-call-67aea057": { uuid: "uuid-aaa" } }, + }); + assert.equal(m, undefined); +}); + +test("findExistingResourceByName: localResourceId exact match excluded (caller's job)", () => { + const m = findExistingResourceByName({ + localResourceId: "end-call-67aea057", + localPayload: { function: { name: "end-call" } }, + stateSection: { "end-call-67aea057": { uuid: "uuid-aaa" } }, + }); + // Helper excludes the exact-key, leaving no match. Caller should have + // short-circuited on the exact-key check before calling. + assert.equal(m, undefined); +}); + +test("findExistingResourceByName: no match → undefined", () => { + const m = findExistingResourceByName({ + localResourceId: "transfer-to-sales", + localPayload: { function: { name: "transfer-to-sales" } }, + stateSection: { "end-call-67aea057": { uuid: "uuid-aaa" } }, + remoteList: [{ id: "uuid-bbb", function: { name: "voicemail" } }], + }); + assert.equal(m, undefined); +}); + +test("findExistingResourceByName: ambiguous across state vs dashboard → lex-smallest, both surfaced", () => { + // State has one UUID under a bootstrap-renamed key, dashboard has a + // distinct UUID under the same canonical name (real on-dashboard + // duplicate from a prior bug run). Both must surface in duplicateUuids + // and the winner must be lex-smallest. Source must be "both" only when + // the SAME uuid appears in both — here the winner appears in only one, + // so source is whichever side it came from. + const m = findExistingResourceByName({ + localResourceId: "b2b-invoice-end-call", + localPayload: { function: { name: "end-call" } }, + stateSection: { "end-call-67aea057": { uuid: "uuid-zzz" } }, + remoteList: [{ id: "uuid-aaa", function: { name: "end-call" } }], + }); + assert.equal(m?.uuid, "uuid-aaa"); + assert.equal(m?.ambiguous, true); + assert.equal(m?.source, "dashboard"); + assert.deepEqual(m?.duplicateUuids, ["uuid-zzz"]); +}); + +test("findExistingResourceByName: empty stateSection AND empty remoteList → undefined", () => { + // Defensive: a fresh push with no prior state and no dashboard population + // must short-circuit to undefined so the auto-create path runs. + const m = findExistingResourceByName({ + localResourceId: "anything", + localPayload: { function: { name: "anything" } }, + stateSection: {}, + remoteList: [], + }); + assert.equal(m, undefined); +}); + +test("findExistingResourceByName: remote payload uses top-level `name` (not function.name)", () => { + // Pull-side / dashboard list payloads for non-tool resources (and some + // tool list endpoints) expose `name` directly instead of nested in + // `function.name`. The dedup helper must recognize both shapes. + const m = findExistingResourceByName({ + localResourceId: "post-call-summary", + localPayload: { name: "post-call-summary" }, + stateSection: {}, + remoteList: [{ id: "uuid-ccc", name: "post-call-summary" }], + }); + assert.deepEqual(m, { + uuid: "uuid-ccc", + source: "dashboard", + ambiguous: false, + duplicateUuids: [], + }); +});