Refine timestamps in spans and recording alignment #982

toubatbrian · 2026-01-16T21:36:23Z

Summary

This PR ports the Python PR #4131 (AGT-2316) to TypeScript, refining timestamp accuracy for telemetry spans and improving recording alignment.

Changes

Telemetry Timestamp Accuracy

User speech timing: Calculate the actual speech start time by subtracting speechDuration from the detection time, rather than using the time at which VAD triggered
Agent speech timing: Track when audio playback actually starts (first frame captured) instead of when generation begins
Span start times: Added startTime parameter support to tracer.startSpan() to allow backdating spans
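
The backdating arithmetic for user speech can be sketched as follows. This is only an illustration: the helper name is hypothetical, and it assumes both the detection time and speechDuration are expressed in milliseconds, which the description does not state.

```typescript
// Hypothetical helper, not part of the PR. Assumes both inputs are in milliseconds.
function backdatedSpeechStart(detectedAtMs: number, speechDurationMs: number): number {
  // VAD fires only after it has already heard some speech, so the true start
  // lies speechDurationMs before the detection time. Negative durations are
  // treated as zero so the result never lands after the detection time.
  return detectedAtMs - Math.max(0, speechDurationMs);
}

const detectedAtMs = 1700000000500;
const speechStartMs = backdatedSpeechStart(detectedAtMs, 300);
console.log(speechStartMs); // 1700000000200
```

The resulting value is what would be passed as startTime when the span is created.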

Recording Alignment

recorder_io.ts: Added _lastSpeechEndTime and _lastSpeechStartTime tracking for proper audio alignment
Silence padding: takeBuf() now supports padSince parameter to prepend silence frames when needed
Recording start time: Now returns the minimum of input/output start times for accurate alignment
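
A rough sketch of the two calculations described above: how much silence to prepend, and which start time to report. All function and parameter names here are hypothetical, and the sample-rate handling is an assumption; only the "pad back to padSince" and "minimum of input/output start" ideas come from the description.

```typescript
// Hypothetical names throughout; only the arithmetic mirrors the description above.
function silencePaddingSamples(
  startedWallTimeMs: number, // wall-clock time of the first buffered input frame
  padSinceMs: number, // wall-clock time the recording should be padded back to
  sampleRate: number, // samples per second
): number {
  const paddingMs = Math.max(0, startedWallTimeMs - padSinceMs);
  return Math.round((paddingMs / 1000) * sampleRate);
}

// Report the earlier of the two start times; either may be unknown.
function recordingStartedAt(inputStartMs?: number, outputStartMs?: number): number | undefined {
  const known = [inputStartMs, outputStartMs].filter((t): t is number => t !== undefined);
  return known.length > 0 ? Math.min(...known) : undefined;
}

// 250 ms of missing audio at 48 kHz corresponds to 12000 zero-valued samples per channel.
const pad = new Int16Array(silencePaddingSamples(10250, 10000, 48000));
console.log(pad.length, recordingStartedAt(10250, 10100));
```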

Event Propagation

Added PlaybackStartedEvent interface and EVENT_PLAYBACK_STARTED constant to io.ts
ParticipantAudioOutput now emits playbackStarted event when first audio frame is captured
generation.ts listens for playback events to resolve firstFrameFut with accurate timestamp
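
The once-per-segment behavior can be illustrated with a small stand-in class. This is a sketch under assumptions: the class and method names are hypothetical and do not reproduce ParticipantAudioOutput or the event types in io.ts; it only shows a first-frame guard that is re-armed on flush.

```typescript
// Hypothetical stand-in for an audio output; the real classes in voice/io.ts may differ.
type PlaybackStartedListener = (createdAt: number) => void;

class FirstFrameNotifier {
  private firstFrameEmitted = false;
  private listeners: PlaybackStartedListener[] = [];

  addPlaybackStartedListener(listener: PlaybackStartedListener): void {
    this.listeners.push(listener);
  }

  // Called once per audio frame; only the first frame of a segment notifies.
  captureFrame(now: number = Date.now()): void {
    if (this.firstFrameEmitted) return;
    this.firstFrameEmitted = true;
    for (const listener of this.listeners) listener(now);
  }

  // Re-arm so the next playback segment reports its own first frame.
  flush(): void {
    this.firstFrameEmitted = false;
  }
}

const out = new FirstFrameNotifier();
const seen: number[] = [];
out.addPlaybackStartedListener((createdAt) => {
  seen.push(createdAt);
});
out.captureFrame(100);
out.captureFrame(120); // ignored: same segment
out.flush();
out.captureFrame(500);
console.log(seen); // [ 100, 500 ]
```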

OTel Context Propagation

Added _agentTurnContext to SpeechHandle to maintain proper span hierarchy
Agent state updates now pass OTel context for correct parent-child relationships

Bug Fix: Duplicate Tool Calls

Fixed duplicate FunctionCall entries in session history by filtering toolsMessages to only add FunctionCallOutput items (since FunctionCall items are already added by onToolExecutionStarted)
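
The filtering idea can be sketched like this. The item shapes and the `outputsOnly` helper are hypothetical simplifications, not the actual chat item types; the sketch only shows why dropping FunctionCall items at this point avoids the duplicate.

```typescript
// Hypothetical item shapes; the real chat item types may differ.
type FunctionCall = { type: 'function_call'; callId: string; name: string };
type FunctionCallOutput = { type: 'function_call_output'; callId: string; output: string };
type ToolItem = FunctionCall | FunctionCallOutput;

// Keep only the outputs; the calls were already recorded when execution started.
function outputsOnly(toolsMessages: ToolItem[]): FunctionCallOutput[] {
  return toolsMessages.filter(
    (item): item is FunctionCallOutput => item.type === 'function_call_output',
  );
}

// History already holds the call, as if added when tool execution started.
const history: ToolItem[] = [{ type: 'function_call', callId: 'c1', name: 'lookup' }];
const toolsMessages: ToolItem[] = [
  { type: 'function_call', callId: 'c1', name: 'lookup' },
  { type: 'function_call_output', callId: 'c1', output: 'ok' },
];
history.push(...outputsOnly(toolsMessages));
console.log(history.map((item) => item.type)); // [ 'function_call', 'function_call_output' ]
```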

Utilities

Added rejected property to Future class to check if a future was rejected
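
For illustration, a simplified Future with a rejected getter might look like the following. This is a stand-in written for this note, not the class in utils.ts; the field names and the swallowed rejection in the constructor are assumptions made to keep the sketch self-contained.

```typescript
// Simplified stand-in for the Future class in utils.ts; the real class may differ.
class Future<T> {
  private _rejected = false;
  private _resolve!: (value: T) => void;
  private _reject!: (error: Error) => void;
  readonly promise: Promise<T>;

  constructor() {
    // The executor runs synchronously, so both callbacks are set before use.
    this.promise = new Promise<T>((resolve, reject) => {
      this._resolve = resolve;
      this._reject = reject;
    });
    // Sketch-only choice: avoid an unhandled-rejection error when nobody awaits the future.
    this.promise.catch(() => undefined);
  }

  get rejected(): boolean {
    return this._rejected;
  }

  resolve(value: T): void {
    this._resolve(value);
  }

  reject(error: Error): void {
    this._rejected = true; // set synchronously, so callers can check right away
    this._reject(error);
  }
}

const firstFrame = new Future<number>();
const before = firstFrame.rejected;
firstFrame.reject(new Error('interrupted'));
const after = firstFrame.rejected;
console.log(before, after); // false true
```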

Files Changed

| File | Changes |
| --- | --- |
| `telemetry/traces.ts` | Added `startTime` to `StartSpanOptions`, pass directly to OTel SDK |
| `voice/io.ts` | Added `PlaybackStartedEvent`, `EVENT_PLAYBACK_STARTED`, `onPlaybackStarted()` |
| `voice/room_io/_output.ts` | Emit `playbackStarted` on first frame capture |
| `voice/generation.ts` | Listen for `playbackStarted`, resolve `firstFrameFut` with timestamp |
| `voice/audio_recognition.ts` | Calculate accurate speech start time with `speechDuration` |
| `voice/agent_session.ts` | Pass `startTime` and `otelContext` to state update methods |
| `voice/agent_activity.ts` | Propagate timestamps, set `_agentTurnContext`, fix duplicate tool calls |
| `voice/speech_handle.ts` | Added `_agentTurnContext` property |
| `voice/recorder_io/recorder_io.ts` | Added speech timing tracking, silence padding, aligned recording start |
| `utils.ts` | Added `rejected` getter to `Future` class |

Testing

Verified telemetry spans now have accurate start times
Confirmed no duplicate function calls in Agent Insights transcript
All existing tests pass

Summary by CodeRabbit

Enhancements
- More accurate voice interaction timing and synchronization (better speech start/end alignment).
- Precise detection of audio playback start for improved first-frame timing and playback indicators.
- Improved recording/playback alignment with smarter silence padding to reduce glitches.
- Better context and timing propagation across voice workflows for more reliable voice responses.
Other
- Exposed rejection status for internal async operations (improves error visibility).

changeset-bot · 2026-01-16T21:36:27Z

🦋 Changeset detected

Latest commit: 8f38e2c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages

| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-xai | Patch |

coderabbitai · 2026-01-16T21:36:35Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds explicit span startTime support and OpenTelemetry context propagation across voice workflows (including speech handling and span creation); implements event-driven first-frame playback timestamps, silence padding and timing alignment in recorder IO, and rejection-state tracking on Future.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Telemetry `agents/src/telemetry/traces.ts` | Added optional `startTime?: number` to `StartSpanOptions` to allow passing explicit span start timestamps into tracer.startSpan. |
| Futures / Utilities `agents/src/utils.ts` | Added private `#rejected` flag and public `rejected` getter to `Future<T>`; `reject()` now sets the flag. |
| OTEL Context & Speech Flow `agents/src/voice/agent_activity.ts`, `agents/src/voice/agent_session.ts`, `agents/src/voice/speech_handle.ts`, `agents/src/voice/audio_recognition.ts` | Propagates OpenTelemetry context through speech tasks (`_agentTurnContext`), computes and passes speech start timestamps (from VAD durations) into `_updateUserState` / `_updateAgentState`, and uses provided `startTime` when creating user_turn and agent_speaking spans. Updated `onStartOfSpeech` signature and related first-frame callback flows. |
| Audio Playback First-Frame Timing `agents/src/voice/io.ts`, `agents/src/voice/generation.ts` | Introduced `AudioOutput.EVENT_PLAYBACK_STARTED` and `PlaybackStartedEvent`; added `onPlaybackStarted(createdAt)` handler and event forwarding. Changed `_AudioOut.firstFrameFut` to `Future<number>` and resolve it with an event-driven timestamp captured on playback start. |
| Room & Avatar Output First-Frame Emission `agents/src/voice/room_io/_output.ts`, `agents/src/voice/avatar/datastream_io.ts` | Track first-frame emission (`firstFrameEmitted`) and call `onPlaybackStarted(Date.now())` on the first emitted frame; reset flag on playout/flush to re-arm for next session. |
| Recorder IO: Buffering & Silence Padding `agents/src/voice/recorder_io/recorder_io.ts` | Passes last-speech-end timestamps into input buffering (`takeBuf(padSince?)`), pads input with silence when needed, computes `recordingStartedAt` from min(input/output), tracks `_lastSpeechStartTime`/`_lastSpeechEndTime`, updates playback timing on finish, and logs padding/playback events. |
| Minor / Supporting `agents/src/voice/generation.ts` (imports), `agents/src/voice/*` (first-frame future usage) | Adjusted imports to mix value/type imports where needed and updated callsites to use `Future<number>` semantics for first-frame timestamps. |

Sequence Diagram

sequenceDiagram
    participant VAD as VoiceDetector
    participant AA as AgentActivity
    participant SH as SpeechHandle
    participant AS as AgentSession
    participant Gen as Generation
    participant AO as AudioOutput
    participant TR as Tracer

    VAD->>VAD: Detect start_of_speech (has speechDuration)
    VAD->>TR: startSpan("user_turn", { startTime: now - speechDuration })
    VAD->>AA: onStartOfSpeech(VADEvent)
    AA->>SH: set _agentTurnContext = otelContext.active()
    AA->>AS: _updateUserState('speaking', speechStartTime)
    AS->>TR: startSpan("user_speaking",{ startTime: speechStartTime, context: otelContext })

    AA->>Gen: begin generation (carry otelContext)
    Gen->>Gen: create firstFrameFut: Future<number>
    Gen->>AO: forward audio frames
    AO->>AO: on first emitted frame -> emit EVENT_PLAYBACK_STARTED(createdAt)
    AO->>Gen: playbackStarted(createdAt)
    Gen->>Gen: resolve firstFrameFut with createdAt

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰
I thump a beat when spans align,
First frames chime in timely rhyme,
Context carried, timestamps bright,
Silence padded, playback right—
A hopping patch of timing, oh so prime!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main objective of the PR (refining timestamp accuracy in spans and fixing recording alignment) and is directly supported by the extensive changes across telemetry and voice modules. |

🧹 Recent nitpick comments

.changeset/lazy-spies-worry.md (1)

1-5: Verify version bump type and consider a more detailed description.

The PR introduces new public APIs (startTime parameter in spans, PlaybackStartedEvent, rejected getter) alongside bug fixes. Per semantic versioning, backward-compatible additions to the public API typically warrant a minor version bump rather than a patch. Consider whether this should be minor instead.

Additionally, the description "refine timestamps in spans and recording alignment" is quite terse. Consider expanding it to mention key features for changelog readers, such as:

- Span startTime support for backdated spans
- Playback start event tracking
- Recording silence padding (padSince)
- OpenTelemetry context propagation
- Duplicate function call fix

agents/src/voice/avatar/datastream_io.ts (1)

180-185: Consider resetting firstFrameEmitted inside the finally callback.

Resetting firstFrameEmitted synchronously after initiating the async close() creates a narrow race window: if captureFrame() is called before the finally() callback runs, onPlaybackStarted could fire while still writing to the closing stream.

Moving the reset into finally alongside this.streamWriter = undefined would ensure both states transition atomically:
Suggested change
     this.streamWriter.close().finally(() => {
       this.streamWriter = undefined;
+      this.firstFrameEmitted = false;
     });
-
-    this.firstFrameEmitted = false;

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2eb8d02 and 8f38e2c.

📒 Files selected for processing (2)

.changeset/lazy-spies-worry.md
agents/src/voice/avatar/datastream_io.ts

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

agents/src/voice/avatar/datastream_io.ts

**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

agents/src/voice/avatar/datastream_io.ts

**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

agents/src/voice/avatar/datastream_io.ts

🔇 Additional comments (1)

agents/src/voice/avatar/datastream_io.ts (1)

150-154: LGTM! First-frame tracking correctly emits playback started event.

The logic properly triggers onPlaybackStarted exactly once per playback session by guarding with the firstFrameEmitted flag.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 markdownlint-cli2 (0.18.1)

.changeset/lazy-spies-worry.md

markdownlint-cli2 v0.18.1 (markdownlint v0.38.0)
Finding: .changeset/lazy-spies-worry.md
Linting: 1 file(s)
Summary: 0 error(s)
Error: EACCES: permission denied, open '/markdownlint-cli2-results.json'
at async open (node:internal/fs/promises:640:25)
at async Object.writeFile (node:internal/fs/promises:1214:14)
at async Promise.all (index 0)
at async outputSummary (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:877:5)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:1053:25)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:12:22 {
errno: -13,
code: 'EACCES',
syscall: 'open',
path: '/markdownlint-cli2-results.json'
}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2eb8d02b56

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

agents/src/voice/generation.ts

toubatbrian · 2026-01-16T21:44:34Z

@codex

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f38e2c44b

chatgpt-codex-connector · 2026-01-16T21:58:36Z

agents/src/telemetry/traces.ts

  /** Whether to end the span when the function exits (default: true) */
  endOnExit?: boolean;
+  /** Optional start time for the span in milliseconds (Date.now() format) */
+  startTime?: number;


Honor startTime in startActiveSpan options

StartSpanOptions now advertises a startTime, but DynamicTracer.startActiveSpan/startActiveSpanSync still build SpanOptions without forwarding that field. As a result, callers who pass startTime to startActiveSpan (e.g., to backdate spans the same way startSpan now does) will silently get spans starting at “now,” which defeats the new timestamp accuracy and causes inconsistent telemetry. Consider passing startTime through when constructing SpanOptions for active spans.
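
One way to picture the suggestion is a shared helper that builds the span options for both code paths, so startTime cannot be dropped in one of them. The sketch below uses local stand-in types rather than the OpenTelemetry API; `SpanOptionsLike`, `toSpanOptions`, and every field other than startTime are assumptions for illustration.

```typescript
// Local stand-ins for the option types; field names other than startTime are assumptions.
interface StartSpanOptions {
  name: string;
  attributes?: Record<string, string | number | boolean>;
  startTime?: number; // milliseconds since epoch, as produced by Date.now()
}

interface SpanOptionsLike {
  attributes?: Record<string, string | number | boolean>;
  startTime?: number;
}

// Hypothetical shared helper so startSpan and startActiveSpan build options the same way.
function toSpanOptions(options: StartSpanOptions): SpanOptionsLike {
  const spanOptions: SpanOptionsLike = {};
  if (options.attributes !== undefined) spanOptions.attributes = options.attributes;
  // Forward the explicit start time; omitting it leaves the tracer to use "now".
  if (options.startTime !== undefined) spanOptions.startTime = options.startTime;
  return spanOptions;
}

const backdated = toSpanOptions({ name: 'user_turn', startTime: 1700000000200 });
console.log(backdated.startTime); // 1700000000200
```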

lukasIO · 2026-01-19T09:11:29Z

agents/src/voice/recorder_io/recorder_io.ts

+      !this._padded &&
+      frames.length > 0
+    ) {
+      const padding = (this._startedWallTime - padSince) / 1000; // Convert ms to seconds


we're starting to mix seconds and ms without explicit naming which makes it harder to reason about which one we're dealing with. suggestion: treat ms as the default and use *InS for everything that's converted to seconds
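
As an illustration of the suggested convention (unsuffixed names are milliseconds, anything converted to seconds carries an InS suffix), the padding calculation could be written roughly like this. The helper and constant names are hypothetical and this is not the code in recorder_io.ts.

```typescript
// Hypothetical helper illustrating the naming convention; not code from the PR.
const MS_PER_S = 1000;

function paddingInS(startedWallTime: number, padSince: number): number {
  // Unsuffixed names are milliseconds; only the return value is in seconds.
  const padding = Math.max(0, startedWallTime - padSince);
  return padding / MS_PER_S;
}

console.log(paddingInS(10250, 10000)); // 0.25
```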

lukasIO · 2026-01-19T09:11:54Z

agents/src/voice/recorder_io/recorder_io.ts

  onPlaybackFinished(options: PlaybackFinishedEvent): void {
-    const finishTime = Date.now();
+    const finishTime = this.currentPauseStart ?? Date.now();
+    const trailingSilenceDuration = Math.max(0, Date.now() - finishTime) / 1000; // Convert to seconds


same as above

lukasIO · 2026-01-19T09:12:35Z

agents/src/voice/recorder_io/recorder_io.ts

-
-    const pauseEvents: Array<[number, number]> = [];
+    const pauseEvents: Array<[number, number]> = []; // (position, duration) in seconds
+    let playbackStartTime = finishTime - playbackPosition * 1000; // Convert seconds to ms


same as above

additionally I'm confused here as it appears that playbackPosition has already been converted to seconds in L580?

lukasIO · 2026-01-19T09:14:25Z

agents/src/voice/recorder_io/recorder_io.ts

      );
-      // Convert playbackPosition from seconds to milliseconds for wall time calculations
-      const playbackStartTime = finishTime - playbackPosition * 1000 - totalPauseDuration;
+      playbackStartTime = finishTime - playbackPosition * 1000 - totalPauseDuration;


I'll just flag all of these
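
Applying the same naming idea to the playback start calculation from the diff above might look like the sketch below. The function signature is hypothetical, and it assumes playbackPosition is in seconds while the other values are in milliseconds, which is exactly the point under question in this thread.

```typescript
// Hypothetical signature; only the formula is taken from the diff above.
function playbackStartTime(
  finishTime: number, // ms, wall clock
  playbackPositionInS: number, // seconds of audio actually played (assumed unit)
  totalPauseDuration: number, // ms spent paused
): number {
  return finishTime - playbackPositionInS * 1000 - totalPauseDuration;
}

console.log(playbackStartTime(20000, 2.5, 500)); // 17000
```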

lukasIO · 2026-01-19T09:17:02Z