-
Notifications
You must be signed in to change notification settings - Fork 212
Refine timestamps in spans and recording alignment #982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: 8f38e2c The changes in this PR will be included in the next version bump. This PR includes changesets to release 17 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds explicit span startTime support and OpenTelemetry context propagation across voice workflows; implements event-driven first-frame playback timestamps, silence-padding and timing alignment in recorder IO, tracks Future rejection state, and carries OTEL context through speech handling and span creation. Changes
Sequence DiagramsequenceDiagram
participant VAD as VoiceDetector
participant AA as AgentActivity
participant SH as SpeechHandle
participant AS as AgentSession
participant Gen as Generation
participant AO as AudioOutput
participant TR as Tracer
VAD->>VAD: Detect start_of_speech (has speechDuration)
VAD->>TR: startSpan("user_turn", { startTime: now - speechDuration })
VAD->>AA: onStartOfSpeech(VADEvent)
AA->>SH: set _agentTurnContext = otelContext.active()
AA->>AS: _updateUserState('speaking', speechStartTime)
AS->>TR: startSpan("user_speaking",{ startTime: speechStartTime, context: otelContext })
AA->>Gen: begin generation (carry otelContext)
Gen->>Gen: create firstFrameFut: Future<number>
Gen->>AO: forward audio frames
AO->>AO: on first emitted frame -> emit EVENT_PLAYBACK_STARTED(createdAt)
AO->>Gen: playbackStarted(createdAt)
Gen->>Gen: resolve firstFrameFut with createdAt
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧹 Recent nitpick comments
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used📓 Path-based instructions (3)**/*.{ts,tsx,js,jsx}📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Files:
**/*.{ts,tsx}?(test|example|spec)📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Files:
**/*.{ts,tsx}?(test|example)📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Files:
🔇 Additional comments (1)
✏️ Tip: You can disable this entire section by setting Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 markdownlint-cli2 (0.18.1).changeset/lazy-spies-worry.mdmarkdownlint-cli2 v0.18.1 (markdownlint v0.38.0) Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2eb8d02b56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8f38e2c44b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| /** Whether to end the span when the function exits (default: true) */ | ||
| endOnExit?: boolean; | ||
| /** Optional start time for the span in milliseconds (Date.now() format) */ | ||
| startTime?: number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honor startTime in startActiveSpan options
StartSpanOptions now advertises a startTime, but DynamicTracer.startActiveSpan/startActiveSpanSync still build SpanOptions without forwarding that field. As a result, callers who pass startTime to startActiveSpan (e.g., to backdate spans the same way startSpan now does) will silently get spans starting at “now,” which defeats the new timestamp accuracy and causes inconsistent telemetry. Consider passing startTime through when constructing SpanOptions for active spans.
Useful? React with 👍 / 👎.
| !this._padded && | ||
| frames.length > 0 | ||
| ) { | ||
| const padding = (this._startedWallTime - padSince) / 1000; // Convert ms to seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're starting to mix seconds and ms without explicit naming which makes it harder to reason about which one we're dealing with. suggestion: treat ms as the default and use *InS for everything that's converted to seconds
| onPlaybackFinished(options: PlaybackFinishedEvent): void { | ||
| const finishTime = Date.now(); | ||
| const finishTime = this.currentPauseStart ?? Date.now(); | ||
| const trailingSilenceDuration = Math.max(0, Date.now() - finishTime) / 1000; // Convert to seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
|
|
||
| const pauseEvents: Array<[number, number]> = []; | ||
| const pauseEvents: Array<[number, number]> = []; // (position, duration) in seconds | ||
| let playbackStartTime = finishTime - playbackPosition * 1000; // Convert seconds to ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additionally I'm confused here as it appears that playbackPosition has already been converted to seconds in L580?
| ); | ||
| // Convert playbackPosition from seconds to milliseconds for wall time calculations | ||
| const playbackStartTime = finishTime - playbackPosition * 1000 - totalPauseDuration; | ||
| playbackStartTime = finishTime - playbackPosition * 1000 - totalPauseDuration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just flag all of these
|
|
||
| const onFirstFrame = () => { | ||
| this.agentSession._updateAgentState('speaking'); | ||
| // Ref: python agent_activity.py - _on_first_frame callback (lines 1651-1668) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: referencing line numbers without a commit hash will quickly be outdated/misleading info
| if (!audioOutput) { | ||
| if (textOut) { | ||
| textOut.firstTextFut.await.finally(onFirstFrame); | ||
| textOut.firstTextFut.await.then(() => onFirstFrame()).catch(() => {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we at least want to log here on rejects?
| audioOut = _audioOut; | ||
| } | ||
| audioOut.firstFrameFut.await.finally(onFirstFrame); | ||
| audioOut.firstFrameFut.await.then((ts) => onFirstFrame(ts)).catch(() => {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question about logging rejections
|
|
||
| const onFirstFrame = () => { | ||
| this.agentSession._updateAgentState('speaking'); | ||
| // Ref: python agent_activity.py - _on_first_frame callback (lines 1935-1952) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same line number nit
| } | ||
| } else { | ||
| textOut?.firstTextFut.await.finally(onFirstFrame); | ||
| textOut?.firstTextFut.await.then(() => onFirstFrame()).catch(() => {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again the logging question for rejections (this and above)
Summary
This PR ports the Python PR #4131 (AGT-2316) to TypeScript, refining timestamp accuracy for telemetry spans and improving recording alignment.
Changes
Telemetry Timestamp Accuracy
speechDurationfrom detection time, rather than recording when VAD triggeredstartTimeparameter support totracer.startSpan()to allow backdating spansRecording Alignment
recorder_io.ts: Added_lastSpeechEndTimeand_lastSpeechStartTimetracking for proper audio alignmenttakeBuf()now supportspadSinceparameter to prepend silence frames when neededEvent Propagation
PlaybackStartedEventinterface andEVENT_PLAYBACK_STARTEDconstant toio.tsParticipantAudioOutputnow emitsplaybackStartedevent when first audio frame is capturedgeneration.tslistens for playback events to resolvefirstFrameFutwith accurate timestampOTel Context Propagation
_agentTurnContexttoSpeechHandleto maintain proper span hierarchyBug Fix: Duplicate Tool Calls
FunctionCallentries in session history by filteringtoolsMessagesto only addFunctionCallOutputitems (sinceFunctionCallitems are already added byonToolExecutionStarted)Utilities
rejectedproperty toFutureclass to check if a future was rejectedFiles Changed
telemetry/traces.tsstartTimetoStartSpanOptions, pass directly to OTel SDKvoice/io.tsPlaybackStartedEvent,EVENT_PLAYBACK_STARTED,onPlaybackStarted()voice/room_io/_output.tsplaybackStartedon first frame capturevoice/generation.tsplaybackStarted, resolvefirstFrameFutwith timestampvoice/audio_recognition.tsspeechDurationvoice/agent_session.tsstartTimeandotelContextto state update methodsvoice/agent_activity.ts_agentTurnContext, fix duplicate tool callsvoice/speech_handle.ts_agentTurnContextpropertyvoice/recorder_io/recorder_io.tsutils.tsrejectedgetter toFutureclassTesting
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.