
feat: add facial voice presentation hooks #272

Merged: JOY (JOY) merged 7 commits into dev from feat/facial-voice-mvp on May 26, 2026.

@JOY
Contributor

What does this PR do?

Adds the first custom facial-animation and scoped voice presentation path for prototype NPC speech. Unity now requests voice session material through the existing Nakama gateway, plays a server-provided clip when available, and falls back to a local prototype tone that drives mouth blendshapes by audio amplitude.
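A minimal sketch of the amplitude-driven mouth fallback, written in TypeScript for illustration (the real driver lives in the Unity C# client, which this PR does not show). It computes the RMS of each audio window, maps it between a noise floor and a full-open level onto a 0 to 100 blendshape weight, and smooths it frame to frame. The function names, thresholds, and smoothing factor are assumptions, not values from this PR.

```typescript
// Hypothetical sketch: names and default thresholds are illustrative assumptions.
interface MouthDriverOptions {
  noiseFloor: number; // RMS at or below this is treated as silence
  fullOpenRms: number; // RMS at or above this maps to a fully open mouth
  smoothing: number; // 0..1, share of the previous weight kept each frame
}

const DEFAULT_MOUTH_OPTIONS: MouthDriverOptions = {
  noiseFloor: 0.02,
  fullOpenRms: 0.3,
  smoothing: 0.6,
};

/** Root-mean-square amplitude of one window of samples in [-1, 1]. */
function rms(samples: ArrayLike<number>): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) sumSquares += samples[i] * samples[i];
  return Math.sqrt(sumSquares / samples.length);
}

/** Next mouth-open blendshape weight (0..100) from the previous weight and this window's RMS. */
function nextMouthWeight(
  prevWeight: number,
  windowRms: number,
  opts: MouthDriverOptions = DEFAULT_MOUTH_OPTIONS,
): number {
  // Normalize RMS between the noise floor and the full-open level, clamped to [0, 1].
  const span = opts.fullOpenRms - opts.noiseFloor;
  const normalized = Math.min(1, Math.max(0, (windowRms - opts.noiseFloor) / span));
  const target = normalized * 100;
  // Exponential smoothing so the mouth does not flicker from frame to frame.
  return opts.smoothing * prevWeight + (1 - opts.smoothing) * target;
}
```

Clamping below the noise floor keeps the mouth closed during silence, and the smoothing trades a little latency for stable motion; a real driver would sample the currently playing clip each frame.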

Linked issue / ADR

Refs #262. Refs #25 for the later Ida Faber asset import validation pass.

Touched areas

  • Unity client (Unity/)
  • Dedicated server build flags / CI
  • Nakama runtime (backend/nakama/)
  • Supabase schema / RLS policies
  • DOS Chain integration / NFT contracts
  • AI agent runtime
  • Design docs (docs/design/) or ADRs (docs/adr/)
  • CI / project tooling

Test plan

  • PASS: git diff --cached --check
  • PASS: npx.cmd --yes markdownlint-cli2 docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md
  • BLOCKED: Unity MCP inspection, because no Unity editor instance was connected (mcpforunity://instances returned 0 instances).
  • BLOCKED: Unity 6000.5.0b9 batch import at Unity/, due to the existing Package Manager failure: The "path" argument must be of type string. Received undefined. No packages were loaded, and no C# compiler errors were emitted before package resolution stopped.
  • BLOCKED: npm.cmd run build in backend/nakama, due to the existing TypeScript config failure TS5103: Invalid value for '--ignoreDeprecations', triggered by ignoreDeprecations: "6.0".
  • BLOCKED: npm.cmd test in backend/nakama, because build/index.js is absent after the blocked build.

No Nakama runtime source was changed in this PR.

Server-authority check (mandatory if touching gameplay)

  • No new gameplay logic runs on the Unity client
  • No new API key embedded in the Unity client
  • LLM outputs are validated as intent server-side, never auto-applied
  • If this PR adds a new state mutation path, it goes through the Nakama runtime validator before any gameplay system consumes it

This PR adds only local speech presentation. Voice material is requested as session data through Nakama/api.dos.ai and remains presentation-only.

Reviewer pass

  • AI agent reviewer pass attached

Local code-review fallback verdict: APPROVED WITH VERIFICATION CAVEATS.

Review summary:

  • Unity/Nakama boundary is preserved: the new presenter uses SecondSpawnGatewayClient.GetVoiceSession for scoped voice material and does not embed provider keys.
  • No gameplay authority, inventory, TIME, SECOND, quest, combat, relationship, or memory mutation was added to Unity.
  • Facial animation is local presentation only and supports safe fallback for bodies without compatible blendshapes.
  • Main residual risk is runtime validation: Unity Play Mode could not run because package resolution fails before package load in this workspace.

@JOY
Contributor Author

Follow-up implementation pushed.

Added:

  • Actor-specific presentation routing: prototype chat now resolves the active PrototypeAgentBrain by actor id before choosing speech bubble / voice presenter.
  • Voice/facial runtime diagnostics: active NPCs expose voice presentation mode, reason, facial target summary, resolved renderer, and mouth/blink target readiness.
  • Stable voice line ids: replaced process-random string.GetHashCode() with deterministic FNV-1a-style hashing.
  • Presentation lifecycle fix: voice presentation coroutine now clears its state when the presentation loop finishes.
  • Editor-only blendshape reporting: SecondSpawnFacialBlendshapeReportUtility reports selected character or generated visual prefab blendshape names so agents can verify imported Ida/ARKit-style targets before approving a lip-sync profile.
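For reference, a 32-bit FNV-1a hash can be sketched as below. It is written in TypeScript and hashes the UTF-8 bytes of the input; the PR only says "FNV-1a-style", so the byte encoding and the hypothetical voiceLineId key format (actor id plus hex digest) are assumptions. Unlike the process-random string.GetHashCode() it replaced, the output is identical across runs.

```typescript
// 32-bit FNV-1a constants (offset basis and prime).
const FNV32_OFFSET_BASIS = 0x811c9dc5;
const FNV32_PRIME = 0x01000193;

/** Encode a string as UTF-8 bytes without relying on TextEncoder. */
function utf8Bytes(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    let cp = text.charCodeAt(i);
    // Combine a UTF-16 surrogate pair into one code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < text.length) {
      const lo = text.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (lo - 0xdc00);
        i++;
      }
    }
    if (cp < 0x80) out.push(cp);
    else if (cp < 0x800) out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    else if (cp < 0x10000) out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    else out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
  }
  return out;
}

/** 32-bit FNV-1a over the UTF-8 bytes of `text`, returned as an unsigned integer. */
function fnv1a32(text: string): number {
  let hash = FNV32_OFFSET_BASIS;
  const bytes = utf8Bytes(text);
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i]; // FNV-1a: XOR the byte in first...
    hash = Math.imul(hash, FNV32_PRIME) >>> 0; // ...then multiply, keeping 32 unsigned bits
  }
  return hash >>> 0;
}

/** Hypothetical stable voice line id: actor id plus the 8-digit hex digest of the line text. */
function voiceLineId(actorId: string, lineText: string): string {
  const hex = ("00000000" + fnv1a32(lineText).toString(16)).slice(-8);
  return actorId + ":" + hex;
}
```

For example, voiceLineId("npc_01", "a") yields "npc_01:e40c292c" on every run, so cached clips and diagnostics can key on it safely.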

Validation:

  • git diff --check passed.
  • npx --yes markdownlint-cli2 docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md passed.
  • Unity validation queue passed with D3D11: feat/unity-validation-workflow, fix/unity-console-warnings, feat/facial-voice-mvp.
  • Validation log: Forcing GfxDevice: Direct3D 11, return code 0.
  • Validation log had no error CS*, warning CS*, compile failure, package failure, or Unity crash matches.
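The log check above can be approximated with a line-by-line pattern scan. This is a hypothetical TypeScript sketch; the regular expressions are assumptions that mirror the categories named in the bullet (error CS*, warning CS*, compile failure, package failure, crash), not the project's actual validation script.

```typescript
// Hypothetical patterns; no `g` flag, so RegExp.test keeps no lastIndex state between calls.
const FAILURE_PATTERNS: Array<[string, RegExp]> = [
  ["compiler error", /error CS\d{4}/],
  ["compiler warning", /warning CS\d{4}/],
  ["compile failure", /compil(?:e|ation) fail/i],
  ["package failure", /package.*fail|No packages loaded/i],
  ["crash", /crash/i],
];

/** Return one "label @ line N: text" entry per matching pattern per log line. */
function scanValidationLog(log: string): string[] {
  const hits: string[] = [];
  const lines = log.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    for (const [label, pattern] of FAILURE_PATTERNS) {
      if (pattern.test(lines[i])) hits.push(`${label} @ line ${i + 1}: ${lines[i].trim()}`);
    }
  }
  return hits;
}
```

The crash pattern is deliberately broad and may flag benign lines; a real script would likely use narrower signatures and also check the process return code.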

Local reviewer fallback: APPROVED WITH RUNTIME SMOKE CAVEAT.

  • Server authority boundary remains intact: Unity only requests scoped session material through Nakama and never stores provider keys.
  • Changes are presentation/debug only and do not mutate gameplay state, quests, TIME, SECOND, inventory, combat, memory, or relationships.
  • Remaining caveat: Play Mode interaction/capture still depends on an agent-owned Unity Editor MCP smoke test, which can run once the validation workflow from PR #273 (chore: add Unity validation worktree workflow) is merged or used by the validation owner.

@JOY
Contributor Author

Update: added LiveKit-ready realtime NPC voice input hook.

Implemented:

  • PrototypeNpcRealtimeVoiceClient for focused-dialogue push-to-talk microphone capture and text/audio realtime session submission.
  • Realtime voice DTOs and Nakama gateway RPC hooks for secondspawn_realtime_voice_session_request and secondspawn_realtime_voice_input.
  • Nearby NPC chat bridge so returned transcripts route through the existing player-to-NPC dialogue path, preserving Nakama/Fusion authority boundaries.
  • Chat panel Mic toggle with honest fallback when the realtime backend RPC is not deployed.
  • Design doc notes clarifying LiveKit as future media transport only, while Photon remains game networking and Nakama/Fusion remain authoritative.
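A hedged sketch of how a voice-turn result might be routed on the client: a transcript goes into the same player-to-NPC line path as typed chat, and an undeployed realtime RPC produces a visible notice instead of a fake reply. The result and action shapes are hypothetical TypeScript stand-ins; the real DTOs are C# types in the Unity client.

```typescript
// Hypothetical shapes; the real DTOs are not shown in this PR.
type RealtimeVoiceResult =
  | { kind: "transcript"; actorId: string; transcript: string }
  | { kind: "rpc_unavailable"; rpcId: string };

type VoiceTurnAction =
  | { action: "send_player_line"; actorId: string; text: string }
  | { action: "show_notice"; text: string }
  | { action: "ignore" };

function routeVoiceTurn(result: RealtimeVoiceResult): VoiceTurnAction {
  if (result.kind === "rpc_unavailable") {
    // Tell the player voice input is unavailable rather than pretending it worked.
    return { action: "show_notice", text: `Voice input unavailable: ${result.rpcId} is not deployed.` };
  }
  const text = result.transcript.trim();
  if (text.length === 0) return { action: "ignore" };
  // Reuse the typed-chat path, so the server still decides what the line means.
  return { action: "send_player_line", actorId: result.actorId, text };
}
```

Routing transcripts through the existing dialogue path means voice adds no new authority surface: the client never interprets a transcript as a gameplay command.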

Validation:

  • git diff --check: pass.
  • npx --yes markdownlint-cli2 docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md: pass.
  • Unity validation queue facial-voice-realtime-input with refs feat/unity-validation-workflow, fix/unity-console-warnings, feat/facial-voice-mvp: compile pass using D3D11.
  • Unity validation queue facial-voice-realtime-input-editmode with the same refs: compile pass and EditMode test runner pass using D3D11.

Local code-review fallback verdict: approved. The Unity client still does not call model/voice providers directly or store provider keys. LLM/voice output remains presentation/dialogue only; gameplay state still requires Nakama/Fusion validation. Remaining gap: live Play Mode smoke for actual mic UX should run after this branch is merged into the integration Unity workspace, because root dev is currently dirty and this feature branch is validated in the dedicated validation worktree.

@JOY
Contributor Author

Update: pushed real voice-turn wiring, not just the client hook.

Implemented now:

  • Windows Editor speak/listen fallback: Unity can use Windows Dictation for player microphone transcripts and Windows SAPI to synthesize NPC WAV playback locally when cloud voice is not configured.
  • PrototypeNpcVoicePresenter now prefers real Windows SAPI speech before falling back to the old prototype tone.
  • Nakama registers secondspawn_realtime_voice_session_request and secondspawn_realtime_voice_input.
  • Realtime voice input RPC can call api.dos.ai when configured with DOS_AI_REALTIME_VOICE_ENABLED=true, DOS_AI_API_KEY, and DOS_AI_REALTIME_VOICE_URL; the response stays dialogue/presentation only.
  • Tests cover disabled local fallback, text fallback, and configured realtime voice input with forbidden state mutation boundary.
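A minimal TypeScript sketch of the server-side gate and boundary, assuming the Nakama runtime exposes environment values as a string map: the provider is called only when DOS_AI_REALTIME_VOICE_ENABLED is "true" and both DOS_AI_API_KEY and DOS_AI_REALTIME_VOICE_URL are set, and the provider response is reduced to an allow-list of presentation fields before it reaches Unity. The env var names come from this PR and voice_audio_base64/voice_audio_format from its later Nakama test; npc_text and transcript are assumed field names.

```typescript
interface RealtimeVoiceConfig {
  enabled: boolean;
  apiKey?: string; // stays in the server runtime; never sent to Unity
  url?: string;
}

function readRealtimeVoiceConfig(env: Record<string, string | undefined>): RealtimeVoiceConfig {
  const enabled = env["DOS_AI_REALTIME_VOICE_ENABLED"] === "true";
  const apiKey = env["DOS_AI_API_KEY"];
  const url = env["DOS_AI_REALTIME_VOICE_URL"];
  // Only call the provider when the flag is on AND both key and URL are present;
  // otherwise the RPC takes the local/text fallback path.
  if (!enabled || !apiKey || !url) return { enabled: false };
  return { enabled: true, apiKey, url };
}

// Fields allowed back to the client (npc_text and transcript are assumed names).
const PRESENTATION_FIELDS = ["npc_text", "transcript", "voice_audio_base64", "voice_audio_format"];

/** Copy only presentation fields; anything else (e.g. a state mutation) is dropped. */
function toPresentationOnly(providerResponse: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of PRESENTATION_FIELDS) {
    if (key in providerResponse) out[key] = providerResponse[key];
  }
  return out;
}
```

An allow-list rather than a deny-list means any new provider field, such as a quest or inventory change, is dropped by default until someone explicitly classifies it as presentation data.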

Validation:

  • backend/nakama: npm.cmd run build pass.
  • backend/nakama: npm.cmd test pass.
  • git diff --check pass.
  • markdownlint for voice design doc + Nakama README pass.
  • Unity validation queue facial-voice-live-turns: D3D11 compile pass.
  • Unity validation queue facial-voice-live-turns-editmode: D3D11 compile + EditMode pass.
  • Windows SAPI smoke generated a real WAV file: second-spawn-sapi-smoke.wav, 122384 bytes.
  • Unity MCP root console check found no script compile errors from this branch; only existing Funplay MCP port-bind errors. Root Editor Play Mode was not run because root dev is dirty and does not contain this branch yet.

Local code-review fallback verdict: approved. Unity still does not store provider keys or call Gemini/OpenAI directly. Cloud voice remains behind Nakama/api.dos.ai; local Windows speech is a development fallback only.
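The presentation fallback order described across these updates (server clip first, then local Windows SAPI as a development fallback, then the prototype tone) can be expressed as a small priority function. This is a hypothetical TypeScript sketch; the context flags are assumptions, and the reason string mirrors the voice presentation mode/reason diagnostics added in the earlier follow-up.

```typescript
// Hypothetical sketch of the fallback order; flag names are assumptions.
type VoiceSource = "server_clip" | "windows_sapi" | "prototype_tone";

interface VoiceSourceContext {
  hasServerClip: boolean; // Nakama session material included playable audio
  isWindowsEditor: boolean; // local SAPI is a development-only fallback
  sapiAvailable: boolean; // local synthesis produced a usable WAV
}

function chooseVoiceSource(ctx: VoiceSourceContext): { source: VoiceSource; reason: string } {
  if (ctx.hasServerClip) return { source: "server_clip", reason: "server clip provided" };
  if (ctx.isWindowsEditor && ctx.sapiAvailable) {
    return { source: "windows_sapi", reason: "no server clip; using local SAPI" };
  }
  return { source: "prototype_tone", reason: "no server clip or local speech available" };
}
```

Returning the reason alongside the source keeps the diagnostics honest about why an NPC sounds the way it does in a given build.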

@JOY
Contributor Author

Validation update for commit 2586604:

  • Added Unity playback for realtime NPC voice_audio_base64 responses.
  • Direct realtime provider turns now display the player transcript locally and present the returned NPC turn without triggering a duplicate text-chat LLM request.
  • Added decoding of raw PCM16 (pcm_s16le_<sample_rate>) and WAV PCM16 payloads into AudioClip, reusing the existing audio-amplitude facial driver.
  • Nakama test now asserts passthrough for voice_audio_base64 and voice_audio_format.
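A sketch of the PCM16 decoding step, in TypeScript for illustration: raw pcm_s16le_<sample_rate> payloads are read as little-endian 16-bit samples, and WAV payloads are parsed by walking RIFF chunks to find fmt and data. It assumes base64 has already been decoded to bytes, that raw payloads are mono, and that WAV payloads are tagged with the format string "wav" (all assumptions); WAVE_FORMAT_EXTENSIBLE files are rejected. In Unity, the resulting float samples would then be written into a clip via AudioClip.Create and SetData.

```typescript
interface DecodedPcm {
  sampleRate: number;
  channels: number;
  samples: Float32Array; // interleaved, scaled to [-1, 1)
}

/** Read little-endian signed 16-bit samples and scale them to floats. */
function pcm16ToFloat(view: DataView, offset: number, byteLength: number): Float32Array {
  const count = Math.floor(byteLength / 2); // a trailing odd byte is ignored
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) out[i] = view.getInt16(offset + i * 2, true) / 32768;
  return out;
}

function fourcc(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset), view.getUint8(offset + 1), view.getUint8(offset + 2), view.getUint8(offset + 3));
}

function decodeVoiceAudio(bytes: Uint8Array, format: string): DecodedPcm {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const rawMatch = /^pcm_s16le_(\d+)$/.exec(format);
  if (rawMatch) {
    // Raw PCM16; mono is assumed because the format string carries no channel count.
    return { sampleRate: Number(rawMatch[1]), channels: 1, samples: pcm16ToFloat(view, 0, bytes.byteLength) };
  }
  if (format !== "wav") throw new Error(`unsupported voice_audio_format: ${format}`);
  if (bytes.byteLength < 12 || fourcc(view, 0) !== "RIFF" || fourcc(view, 8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE payload");
  }
  let channels = 0, sampleRate = 0, bits = 0;
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const id = fourcc(view, offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === "fmt ") {
      if (view.getUint16(body, true) !== 1) throw new Error("WAV is not integer PCM");
      channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      bits = view.getUint16(body + 14, true);
    } else if (id === "data") {
      // Requires fmt before data, as in standard WAV files.
      if (bits !== 16) throw new Error("only 16-bit PCM WAV is supported");
      const len = Math.min(size, bytes.byteLength - body);
      return { sampleRate, channels, samples: pcm16ToFloat(view, body, len) };
    }
    offset = body + size + (size % 2); // RIFF chunks are padded to an even length
  }
  throw new Error("WAV has no data chunk");
}
```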

Local validation:

  • git diff --check: pass
  • npx --yes markdownlint-cli2 docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md: pass
  • backend/nakama npm.cmd run build: pass
  • backend/nakama npm.cmd test: pass
  • Unity validation worktree queue facial-voice-direct-audio-r2: compile pass, EditMode pass
  • Root dev integration: merged origin/dev and origin/feat/facial-voice-mvp; backend/nakama build/test pass
  • Root Unity MCP Play Mode smoke: entered and exited Play Mode on Assets/_SecondSpawn/Scenes/ZoneTest_Hub.unity; no new compile errors. Existing editor/MCP noise remains: GameObjectInspector missing target, Funplay port 8768 in use, and VS/Unity UDP 56114 warning.

Local reviewer fallback verdict: approved with known Unity Editor/MCP noise. No provider API keys are exposed in Unity, and the new audio path remains presentation-only.

JOY (JOY) marked this pull request as ready for review on May 26, 2026 16:25.
JOY (JOY) merged commit a09e202 into dev on May 26, 2026.
2 checks passed