feat: add facial voice presentation hooks by JOY · Pull Request #272 · DOS/Second-Spawn

JOY (JOY) · 2026-05-26T05:26:01Z

What does this PR do?

Adds the first custom facial-animation and scoped voice presentation path for prototype NPC speech. Unity now requests voice session material through the existing Nakama gateway, plays a server-provided clip when available, and falls back to a local prototype tone that drives mouth blendshapes by audio amplitude.
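
For reviewers unfamiliar with amplitude-driven lip sync, the idea can be sketched as follows. This is an illustration only, written in TypeScript rather than the Unity C# the presenter uses; the function names, the gain and smoothing defaults, and the 0 to 100 weight range are assumptions for the example and are not taken from the PR's code.

```typescript
// Hypothetical sketch: map one window of audio samples to a mouth-open weight.
// Samples are assumed to be normalized floats in [-1, 1].
function rmsAmplitude(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// gain and smoothing are illustrative tuning values, not values from the PR.
// smoothing = 0 snaps to the target; values near 1 move slowly toward it.
function mouthOpenWeight(
  rms: number,
  previousWeight = 0,
  gain = 4,
  smoothing = 0.5,
): number {
  const clamped = Math.min(1, Math.max(0, rms * gain));
  const target = clamped * 100; // assumed blendshape weight range 0..100
  return previousWeight + (target - previousWeight) * (1 - smoothing);
}
```

Calling mouthOpenWeight once per frame with the previous frame's result keeps the mouth from flickering on noisy audio; a silent window simply decays the weight back toward zero.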

Linked issue / ADR

Refs #262. Refs #25 for the later Ida Faber asset import validation pass.

Touched areas

Unity client (Unity/)
Dedicated server build flags / CI
Nakama runtime (backend/nakama/)
Supabase schema / RLS policies
DOS Chain integration / NFT contracts
AI agent runtime
Design docs (docs/design/) or ADRs (docs/adr/)
CI / project tooling

Test plan

PASS: git diff --cached --check
PASS: npx.cmd --yes markdownlint-cli2 docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md
BLOCKED: Unity MCP inspection, because no Unity editor instance was connected (mcpforunity://instances returned 0 instances).
BLOCKED: Unity 6000.5.0b9 batch import at Unity/, due to the existing Package Manager failure: The "path" argument must be of type string. Received undefined. No packages loaded. No C# compiler errors were emitted before package resolution stopped.
BLOCKED: npm.cmd run build in backend/nakama, due to existing TypeScript config failure TS5103: Invalid value for '--ignoreDeprecations' with ignoreDeprecations: "6.0".
BLOCKED: npm.cmd test in backend/nakama, because build/index.js is absent after the blocked build.

No Nakama runtime source was changed in this PR.

Server-authority check (mandatory if touching gameplay)

No new gameplay logic runs on the Unity client
No new API key embedded in the Unity client
LLM outputs are validated as intent server-side, never auto-applied
If this PR adds a new state mutation path, it goes through the Nakama runtime validator before any gameplay system consumes it

This PR only adds local speech presentation. Voice material is requested through Nakama/api.dos.ai session data and remains presentation only.

Reviewer pass

AI agent reviewer pass attached

Local code-review fallback verdict: APPROVED WITH VERIFICATION CAVEATS.

Review summary:

Unity/Nakama boundary is preserved: the new presenter uses SecondSpawnGatewayClient.GetVoiceSession for scoped voice material and does not embed provider keys.
No gameplay authority, inventory, TIME, SECOND, quest, combat, relationship, or memory mutation was added to Unity.
Facial animation is local presentation only and supports safe fallback for bodies without compatible blendshapes.
Main residual risk is runtime validation: Unity Play Mode could not run because package resolution fails before package load in this workspace.

JOY (JOY) · 2026-05-26T10:20:47Z

Follow-up implementation pushed.

Added:

Actor-specific presentation routing: prototype chat now resolves the active PrototypeAgentBrain by actor id before choosing speech bubble / voice presenter.
Voice/facial runtime diagnostics: active NPCs expose voice presentation mode, reason, facial target summary, resolved renderer, and mouth/blink target readiness.
Stable voice line ids: replaced process-random string.GetHashCode() with deterministic FNV-1a-style hashing.
Presentation lifecycle fix: voice presentation coroutine now clears its state when the presentation loop finishes.
Editor-only blendshape reporting: SecondSpawnFacialBlendshapeReportUtility reports selected character or generated visual prefab blendshape names so agents can verify imported Ida/ARKit-style targets before approving a lip-sync profile.
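
To illustrate the stable voice line id change: string.GetHashCode() in .NET can differ between processes, while an FNV-1a hash depends only on the input. A minimal 32-bit sketch in TypeScript follows. The voiceLineId helper and its actorId/text key format are hypothetical, and the loop hashes UTF-16 code units, so it matches the standard FNV-1a test vectors only for ASCII input.

```typescript
// 32-bit FNV-1a constants (offset basis and prime).
const FNV_OFFSET_BASIS_32 = 0x811c9dc5;
const FNV_PRIME_32 = 0x01000193;

// Hashes UTF-16 code units; identical to byte-wise FNV-1a for ASCII strings.
function fnv1a32(text: string): number {
  let hash = FNV_OFFSET_BASIS_32;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Math.imul keeps the multiply in 32-bit space; >>> 0 makes it unsigned.
    hash = Math.imul(hash, FNV_PRIME_32) >>> 0;
  }
  return hash;
}

// Hypothetical id format: the PR does not state how actor id and text are combined.
function voiceLineId(actorId: string, text: string): string {
  const hex = fnv1a32(actorId + "|" + text).toString(16);
  return ("00000000" + hex).slice(-8);
}
```

The same actor id and line text always produce the same id across runs and machines, which is what a cache key or log correlation id needs.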

Validation:

git diff --check passed.
npx --yes markdownlint-cli2 docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md passed.
Unity validation queue passed with D3D11: feat/unity-validation-workflow, fix/unity-console-warnings, feat/facial-voice-mvp.
Validation log: Forcing GfxDevice: Direct3D 11, return code 0.
Validation log had no error CS*, warning CS*, compile failure, package failure, or Unity crash matches.

Local reviewer fallback: APPROVED WITH RUNTIME SMOKE CAVEAT.

Server authority boundary remains intact: Unity only requests scoped session material through Nakama and never stores provider keys.
Changes are presentation/debug only and do not mutate gameplay state, quests, TIME, SECOND, inventory, combat, memory, or relationships.
Remaining caveat: Play Mode interaction/capture still depends on an agent-owned Unity Editor MCP smoke test, which can run once the validation workflow from PR #273 (chore: add Unity validation worktree workflow) is merged or used by the validation owner.

JOY (JOY) · 2026-05-26T11:24:55Z

Update: added LiveKit-ready realtime NPC voice input hook.

Implemented:

PrototypeNpcRealtimeVoiceClient for focused-dialogue push-to-talk microphone capture and text/audio realtime session submission.
Realtime voice DTOs and Nakama gateway RPC hooks for secondspawn_realtime_voice_session_request and secondspawn_realtime_voice_input.
Nearby NPC chat bridge so returned transcripts route through the existing player-to-NPC dialogue path, preserving Nakama/Fusion authority boundaries.
Chat panel Mic toggle with honest fallback when the realtime backend RPC is not deployed.
Design doc notes clarifying LiveKit as future media transport only, while Photon remains game networking and Nakama/Fusion remain authoritative.

Validation:

git diff --check: pass.

npx --yes markdownlint-cli2 docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md: pass.

Unity validation queue facial-voice-realtime-input with refs feat/unity-validation-workflow, fix/unity-console-warnings, feat/facial-voice-mvp: compile pass using D3D11.
Unity validation queue facial-voice-realtime-input-editmode with same refs: compile pass and EditMode test runner pass using D3D11.

Local code-review fallback verdict: approved. The Unity client still does not call model/voice providers directly or store provider keys. LLM/voice output remains presentation/dialogue only; gameplay state still requires Nakama/Fusion validation. Remaining gap: live Play Mode smoke for actual mic UX should run after this branch is merged into the integration Unity workspace, because root dev is currently dirty and this feature branch is validated in the dedicated validation worktree.

JOY (JOY) · 2026-05-26T11:49:30Z

Update: pushed real voice-turn wiring, not just the client hook.

Implemented now:

Windows Editor speak/listen fallback: Unity can use Windows Dictation for player microphone transcripts and Windows SAPI to synthesize NPC WAV playback locally when cloud voice is not configured.
PrototypeNpcVoicePresenter now prefers real Windows SAPI speech before falling back to the old prototype tone.
Nakama registers secondspawn_realtime_voice_session_request and secondspawn_realtime_voice_input.
Realtime voice input RPC can call api.dos.ai via DOS_AI_REALTIME_VOICE_ENABLED=true, DOS_AI_API_KEY, and DOS_AI_REALTIME_VOICE_URL; response stays dialogue/presentation only.
Tests cover disabled local fallback, text fallback, and configured realtime voice input with forbidden state mutation boundary.
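
As a rough sketch of the gating and the state mutation boundary described above: the three environment variable names and the voice_audio_base64 / voice_audio_format fields come from this PR, while the function names, the transcript and npc_text field names, and the plain env record are assumptions made for the example. The actual Nakama runtime code may be structured differently.

```typescript
interface RealtimeVoiceConfig {
  apiKey: string;
  url: string;
}

// Returns null when realtime voice is disabled or incompletely configured,
// so the caller can use the local/text fallback instead.
function resolveRealtimeVoiceConfig(
  env: Record<string, string | undefined>,
): RealtimeVoiceConfig | null {
  if (env["DOS_AI_REALTIME_VOICE_ENABLED"] !== "true") return null;
  const apiKey = env["DOS_AI_API_KEY"];
  const url = env["DOS_AI_REALTIME_VOICE_URL"];
  if (!apiKey || !url) return null;
  return { apiKey, url };
}

// Hypothetical allow-list: only dialogue/presentation fields pass through.
const PRESENTATION_FIELDS = [
  "transcript",
  "npc_text",
  "voice_audio_base64",
  "voice_audio_format",
];

// Drops every field that is not on the allow-list, so a provider response
// cannot smuggle gameplay state changes to the client.
function toPresentationOnly(
  response: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of PRESENTATION_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(response, key)) {
      out[key] = response[key];
    }
  }
  return out;
}
```

An allow-list is used rather than a deny-list so that any new field a provider adds is dropped by default.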

Validation:

backend/nakama: npm.cmd run build pass.
backend/nakama: npm.cmd test pass.
git diff --check pass.
markdownlint for voice design doc + Nakama README pass.
Unity validation queue facial-voice-live-turns: D3D11 compile pass.
Unity validation queue facial-voice-live-turns-editmode: D3D11 compile + EditMode pass.
Windows SAPI smoke generated a real WAV file: second-spawn-sapi-smoke.wav, 122384 bytes.
Unity MCP root console check found no script compile errors from this branch; only existing Funplay MCP port-bind errors. Root Editor Play Mode was not run because root dev is dirty and does not contain this branch yet.

Local code-review fallback verdict: approved. Unity still does not store provider keys or call Gemini/OpenAI directly. Cloud voice remains behind Nakama/api.dos.ai; local Windows speech is a development fallback only.
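
The resulting playback preference can be summarized in a small sketch. The ordering (server-provided clip, then Windows SAPI, then the prototype tone) follows the descriptions in this PR; the type and function names are hypothetical and the real presenter is Unity C#, not TypeScript.

```typescript
type VoiceSource = "server_clip" | "windows_sapi" | "prototype_tone";

interface VoiceAvailability {
  serverClipAvailable: boolean; // cloud voice material returned via Nakama
  sapiAvailable: boolean;       // local Windows speech synthesis usable
}

// Picks the first available source; the tone is the last-resort fallback.
function chooseVoiceSource(a: VoiceAvailability): VoiceSource {
  if (a.serverClipAvailable) return "server_clip";
  if (a.sapiAvailable) return "windows_sapi";
  return "prototype_tone";
}
```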

JOY (JOY) · 2026-05-26T16:25:17Z

Validation update for commit 2586604:

Added Unity playback for realtime NPC voice_audio_base64 responses.
Direct realtime provider turns now display the player transcript locally and present the returned NPC turn without triggering a duplicate text-chat LLM request.
Added decoding of raw PCM16 (pcm_s16le_<sample_rate>) and WAV PCM16 payloads into an AudioClip, reusing the existing audio-amplitude facial driver.
Nakama test now asserts passthrough for voice_audio_base64 and voice_audio_format.
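
For reference, the raw PCM16 path amounts to reading the sample rate from the format string and converting little-endian 16-bit samples to normalized floats. The sketch below shows that in TypeScript; the function names are hypothetical, base64 decoding is assumed to have already produced the byte array, and the Unity implementation itself is not shown in this thread.

```typescript
// Extracts the sample rate from a format tag such as "pcm_s16le_24000".
// Returns null for any other format (for example a WAV payload).
function parsePcmSampleRate(format: string): number | null {
  const match = /^pcm_s16le_(\d+)$/.exec(format);
  if (!match) return null;
  const rate = Number(match[1]);
  return rate > 0 ? rate : null;
}

// Converts little-endian signed 16-bit samples to floats in [-1, 1).
// A trailing odd byte, if any, is ignored.
function decodePcm16le(bytes: Uint8Array): Float32Array {
  const sampleCount = Math.floor(bytes.length / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

The float output is the shape an amplitude-based facial driver needs, and the parsed rate tells the audio clip how fast to play the samples back.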

Local validation:

git diff --check: pass
npx --yes markdownlint-cli2 docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md: pass
backend/nakama npm.cmd run build: pass
backend/nakama npm.cmd test: pass
Unity validation worktree queue facial-voice-direct-audio-r2: compile pass, EditMode pass
Root dev integration: merged origin/dev and origin/feat/facial-voice-mvp; backend/nakama build/test pass
Root Unity MCP Play Mode smoke: entered and exited Play Mode on Assets/_SecondSpawn/Scenes/ZoneTest_Hub.unity; no new compile errors. Existing editor/MCP noise remains: GameObjectInspector missing target, Funplay port 8768 in use, and VS/Unity UDP 56114 warning.

Local reviewer fallback verdict: approved with known Unity Editor/MCP noise. No provider API keys are exposed in Unity, and the new audio path remains presentation-only.

feat: add facial voice presentation hooks

528a93a

JOY (JOY) mentioned this pull request May 26, 2026

chore: add Unity validation worktree workflow #273

Draft

13 tasks

JOY (JOY) added 3 commits May 26, 2026 17:13

feat: improve NPC facial voice diagnostics

a9ee9d2

chore: add facial blendshape utility meta

7feff7d

fix: reset NPC voice presentation state

051f8ee

feat: add realtime NPC voice input hook

ef4b186

feat: wire realtime NPC voice turns

5fb3026

feat: play realtime NPC voice audio

2586604

JOY (JOY) marked this pull request as ready for review May 26, 2026 16:25

JOY (JOY) merged commit a09e202 into dev May 26, 2026
2 checks passed

JOY (JOY) merged 7 commits into dev from feat/facial-voice-mvp