Conversation

@chenghao-mou
Member

@chenghao-mou chenghao-mou commented Jan 23, 2026

This should close #4596

Summary by CodeRabbit

  • New Features

    • Speech recognition usage is now tracked incrementally during streaming with the Google Speech-to-Text plugin, providing real-time consumption metrics per audio stream.
  • Documentation

    • Updated documentation for audio duration field to clarify it represents incremental usage in seconds.

@chenghao-mou chenghao-mou requested a review from a team January 23, 2026 10:19
@coderabbitai
Contributor

coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

Adds per-stream usage tracking to the Google STT plugin by introducing helper functions to extract audio duration and request IDs from streaming responses, then emits RECOGNITION_USAGE events. Also documents the audio_duration field in the STT base module.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation<br/>livekit-agents/livekit/agents/stt/stt.py | Added a docstring to the RecognitionUsage.audio_duration field describing it as incremental audio duration in seconds. |
| Google STT Usage Tracking<br/>livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py | Introduced the _get_audio_duration() helper to compute the incremental audio duration since the last usage event from v1/v2 responses; introduced the _get_request_id() helper to extract request IDs; integrated per-stream tracking via last_usage_event_time to emit RECOGNITION_USAGE events during stream processing. |

Sequence Diagram(s)

sequenceDiagram
    participant GCP as Google Cloud<br/>Speech API
    participant Plugin as Google STT<br/>Plugin
    participant Framework as Agent<br/>Framework

    GCP->>Plugin: StreamingRecognizeResponse
    Plugin->>Plugin: _get_audio_duration(response,<br/>last_usage_event_time)
    Plugin->>Plugin: _get_request_id(response)
    Plugin->>Plugin: Create SpeechEvent<br/>(type: RECOGNITION_USAGE)
    Plugin->>Framework: Emit SpeechEvent<br/>(duration, request_id)
    Plugin->>Plugin: Update last_usage_event_time

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Our Google STT now counts each spoken word,
Duration tracked, request IDs heard!
Per-stream metrics flow like morning dew,
Usage insights—now we monitor true! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Linked Issues check | ⚠️ Warning | The PR partially addresses issue #4596. It adds per-stream usage tracking with audio duration calculation and RECOGNITION_USAGE events, but does not fully implement all required STTMetrics fields (duration, request IDs, streaming status) mentioned in the linked issue. | Ensure complete STTMetrics implementation including all required fields (duration, audio duration, request IDs, streaming status) as specified in issue #4596 before merging. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'add STT metrics for google' is concise and accurately summarizes the main change: adding STT metrics support to the Google STT plugin. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to adding STT metrics for the Google plugin. The docstring addition to the base STT module and the new metric tracking in the Google plugin are both in scope. |

🧹 Recent nitpick comments
livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py (1)

869-887: Potential edge case: negative duration when speech_event_offset/speech_event_time is zero or unset.

If a response has no speech event (e.g., certain interim results), the offset/time fields may be zero or unset, causing _get_audio_duration to return a negative value (since last_usage_event_time could be non-zero). The > 0 check at the call site handles this, but it means no usage event is emitted for such responses—which may be the intended behavior.

Consider adding a brief inline comment clarifying that negative/zero returns are expected for non-speech-event responses, to aid future maintainers.

📝 Suggested documentation
 def _get_audio_duration(
     resp: cloud_speech_v2.StreamingRecognizeResponse | cloud_speech_v1.StreamingRecognizeResponse,
     last_usage_event_time: float,
 ) -> float:
     """Calculate the audio duration from the response.
 
+    Returns the incremental audio duration since the last usage event.
+    May return zero or negative values for responses without timing data,
+    which should be filtered out by the caller.
+
     References:
         - https://docs.cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.StreamingRecognizeResponse
         - https://docs.cloud.google.com/speech-to-text/docs/reference/rest/v2/StreamingRecognitionResult
     """
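
To make the edge case above concrete, here is a minimal sketch of how a negative value can arise. It does not reproduce the PR's `_get_audio_duration`; the `FakeV1Response` and `FakeV2Response` classes and the `incremental_duration` function are hypothetical stand-ins, and the sketch assumes the timing fields behave like `datetime.timedelta` values that default to zero when unset.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta


@dataclass
class FakeV1Response:
    """Hypothetical stand-in for a v1 response; only the timing field is modeled."""

    speech_event_time: timedelta = timedelta(0)


@dataclass
class FakeV2Response:
    """Hypothetical stand-in for a v2 response; only the timing field is modeled."""

    speech_event_offset: timedelta = timedelta(0)


def incremental_duration(
    resp: FakeV1Response | FakeV2Response, last_usage_event_time: float
) -> float:
    """Return seconds of audio since the last usage event (may be <= 0)."""
    if isinstance(resp, FakeV2Response):
        offset = resp.speech_event_offset
    else:
        offset = resp.speech_event_time
    # An unset offset is zero, so a non-zero tracker yields a negative result.
    return offset.total_seconds() - last_usage_event_time


# A response with timing data produces a positive increment.
with_timing = incremental_duration(FakeV2Response(timedelta(seconds=2.5)), 1.0)
# A response without timing data produces a negative value the caller must filter.
without_timing = incremental_duration(FakeV1Response(), 3.0)
```

Filtering at the call site keeps the helper a pure computation, which is the behavior the suggested docstring describes.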
📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fe642d and b27c81b.

📒 Files selected for processing (2)
  • livekit-agents/livekit/agents/stt/stt.py
  • livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py
  • livekit-agents/livekit/agents/stt/stt.py
🧠 Learnings (1)
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.

Applied to files:

  • livekit-agents/livekit/agents/stt/stt.py
🧬 Code graph analysis (1)
livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py (1)
livekit-agents/livekit/agents/stt/stt.py (3)
  • SpeechEvent (71-75)
  • SpeechEventType (32-49)
  • RecognitionUsage (65-67)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: livekit-plugins-deepgram
  • GitHub Check: livekit-plugins-openai
  • GitHub Check: livekit-plugins-groq
  • GitHub Check: livekit-plugins-inworld
  • GitHub Check: livekit-plugins-cartesia
  • GitHub Check: type-check (3.9)
  • GitHub Check: unit-tests
  • GitHub Check: type-check (3.13)
🔇 Additional comments (3)
livekit-agents/livekit/agents/stt/stt.py (1)

64-68: LGTM!

The added docstring clearly documents the audio_duration field's purpose and units, which aligns with the incremental usage tracking implemented in the Google STT plugin.

livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py (2)

609-609: LGTM! Usage tracking logic is sound.

The incremental tracking via last_usage_event_time correctly accumulates processed audio duration, and the > 0 check avoids emitting spurious zero-duration events. The variable is appropriately scoped to process_stream, resetting on reconnects.

Also applies to: 676-684
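
As a rough illustration of the tracking described in this comment, the sketch below walks a sequence of cumulative offsets and yields only positive increments. The `usage_increments` name and the plain float offsets are assumptions made for the example; the actual plugin works on streaming responses and emits `SpeechEvent` objects, and how it updates the tracker internally is not shown in this review.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def usage_increments(offsets: Iterable[float]) -> Iterator[float]:
    """Yield positive per-response increments from cumulative offsets in seconds."""
    # Scoped to one call, mirroring a tracker that resets on each reconnect.
    last_usage_event_time = 0.0
    for offset in offsets:
        duration = offset - last_usage_event_time
        # Skip zero or negative values so no spurious usage event is produced.
        if duration > 0:
            yield duration
            last_usage_event_time = offset


# Offsets of 0.0 (no timing data) and a repeated 2.5 are both filtered out.
increments = list(usage_increments([1.0, 0.0, 2.5, 2.5, 4.0]))
total = sum(increments)
```

Because each emitted value is a delta, summing the increments recovers the last observed offset, which is what makes the usage events safe to aggregate downstream.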


890-895: LGTM! Clean version-specific request ID extraction.

Verified that the function correctly handles both v1 and v2 response structures:

  • v1: request_id is int64, safely converted to string
  • v2: request_id accessed via metadata as documented

All lines comply with the 100-character limit, and type hints are Python 3.9+ compatible.
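
The version split noted above could look roughly like the following. This is a sketch under stated assumptions, not the PR's `_get_request_id`: the `FakeV1Response`, `FakeV2Metadata`, and `FakeV2Response` classes and the `request_id_of` function are hypothetical, and they model only the fields mentioned in this comment (an integer `request_id` on v1, a string `request_id` under `metadata` on v2).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeV1Response:
    """Hypothetical stand-in for a v1 response with an integer request ID."""

    request_id: int = 0


@dataclass
class FakeV2Metadata:
    """Hypothetical stand-in for v2 response metadata."""

    request_id: str = ""


@dataclass
class FakeV2Response:
    """Hypothetical stand-in for a v2 response carrying metadata."""

    metadata: FakeV2Metadata = field(default_factory=FakeV2Metadata)


def request_id_of(resp: FakeV1Response | FakeV2Response) -> str:
    """Return the request ID as a string for either response version."""
    if isinstance(resp, FakeV2Response):
        return resp.metadata.request_id
    # v1 exposes an integer, so normalize it to a string for a uniform type.
    return str(resp.request_id)


v1_id = request_id_of(FakeV1Response(request_id=123))
v2_id = request_id_of(FakeV2Response(FakeV2Metadata(request_id="abc")))
```

Normalizing to `str` in one place means callers that attach the ID to a usage event do not need to know which API version produced the response.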

Development

Successfully merging this pull request may close these issues.

Add STTMetrics emission to Google STT plugin

2 participants