add STT metrics for google #4599

chenghao-mou · 2026-01-23T10:19:45Z

This should close #4596

Summary by CodeRabbit

New Features
- Speech recognition usage is now tracked incrementally during streaming with the Google Speech-to-Text plugin, providing real-time consumption metrics per audio stream.

Documentation
- Updated documentation for audio duration field to clarify it represents incremental usage in seconds.


coderabbitai · 2026-01-23T10:20:09Z

Walkthrough

Adds per-stream usage tracking to the Google STT plugin by introducing helper functions to extract audio duration and request IDs from streaming responses, then emits RECOGNITION_USAGE events. Also documents the audio_duration field in the STT base module.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation `livekit-agents/livekit/agents/stt/stt.py` | Added docstring to `RecognitionUsage.audio_duration` field describing incremental audio duration in seconds. |
| Google STT Usage Tracking `livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py` | Introduced `_get_audio_duration()` helper to compute the incremental audio duration since the last usage event from v1/v2 responses; introduced `_get_request_id()` helper to extract request IDs; integrated per-stream tracking with `last_usage_event_time` to emit `RECOGNITION_USAGE` events during streaming processing. |

Sequence Diagram(s)

sequenceDiagram
    participant GCP as Google Cloud<br/>Speech API
    participant Plugin as Google STT<br/>Plugin
    participant Framework as Agent<br/>Framework

    GCP->>Plugin: StreamingRecognizeResponse
    Plugin->>Plugin: _get_audio_duration(response,<br/>last_usage_event_time)
    Plugin->>Plugin: _get_request_id(response)
    Plugin->>Plugin: Create SpeechEvent<br/>(type: RECOGNITION_USAGE)
    Plugin->>Framework: Emit SpeechEvent<br/>(duration, request_id)
    Plugin->>Plugin: Update last_usage_event_time
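
The flow in the diagram can be sketched in plain Python. This is a minimal illustration and not the plugin's code: `FakeResponse`, `UsageEvent`, and `usage_events` are hypothetical stand-ins for the Google response types and the `RECOGNITION_USAGE` `SpeechEvent`, and the stream position is assumed to already be a float in seconds.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a streaming response (not the Google type)."""

    offset_s: float  # position in the stream in seconds, 0.0 when unset
    request_id: str


@dataclass
class UsageEvent:
    """Hypothetical stand-in for a RECOGNITION_USAGE SpeechEvent."""

    request_id: str
    audio_duration: float  # incremental seconds since the previous event


def usage_events(responses: Iterable[FakeResponse]) -> Iterator[UsageEvent]:
    # Scoped to one stream, so a reconnect starts again from zero.
    last_usage_event_time = 0.0
    for resp in responses:
        duration = resp.offset_s - last_usage_event_time
        if duration > 0:  # skip zero or negative increments
            yield UsageEvent(resp.request_id, duration)
            last_usage_event_time = resp.offset_s


events = list(
    usage_events(
        [FakeResponse(1.5, "a"), FakeResponse(0.0, "a"), FakeResponse(4.0, "a")]
    )
)
```

Because each event carries only the increment, summing `audio_duration` over a stream gives the total audio processed so far, and the response with an unset position produces no event.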

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Our Google STT now counts each spoken word,
Duration tracked, request IDs heard!
Per-stream metrics flow like morning dew,
Usage insights—now we monitor true! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | The PR partially addresses issue `#4596`. It adds per-stream usage tracking with audio duration calculation and RECOGNITION_USAGE events, but does not fully implement all required STTMetrics fields (duration, request IDs, streaming status) mentioned in the linked issue. | Ensure complete STTMetrics implementation including all required fields (duration, audio duration, request IDs, streaming status) as specified in issue `#4596` before merging. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'add STT metrics for google' is concise and accurately summarizes the main change: adding STT metrics support to the Google STT plugin. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to adding STT metrics for the Google plugin. The docstring addition to the base STT module and the new metric tracking in the Google plugin are both in scope. |



🧹 Recent nitpick comments

livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py (1)
869-887: Potential edge case: negative duration when speech_event_offset/speech_event_time is zero or unset.

If a response has no speech event (e.g., certain interim results), the offset/time fields may be zero or unset, causing _get_audio_duration to return a negative value (since last_usage_event_time could be non-zero). The > 0 check at the call site handles this, but it means no usage event is emitted for such responses—which may be the intended behavior.

Consider adding a brief inline comment clarifying that negative/zero returns are expected for non-speech-event responses, to aid future maintainers.
📝 Suggested documentation
 def _get_audio_duration(
     resp: cloud_speech_v2.StreamingRecognizeResponse | cloud_speech_v1.StreamingRecognizeResponse,
     last_usage_event_time: float,
 ) -> float:
     """Calculate the audio duration from the response.
 
+    Returns the incremental audio duration since the last usage event.
+    May return zero or negative values for responses without timing data,
+    which should be filtered out by the caller.
+
     References:
         - https://docs.cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.StreamingRecognizeResponse
         - https://docs.cloud.google.com/speech-to-text/docs/reference/rest/v2/StreamingRecognitionResult
     """
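
To make the edge case concrete, here is a small sketch of how a helper of this shape can go negative. It is an assumption-laden illustration, not the PR's implementation: `V1Like` and `V2Like` are hypothetical stand-ins for the two response types, the timing fields are modelled as `datetime.timedelta` values that default to zero when unset, and `get_audio_duration` is an illustrative name.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta


@dataclass
class V1Like:
    """Hypothetical stand-in for a v1 response; field defaults to zero."""

    speech_event_time: timedelta = timedelta()


@dataclass
class V2Like:
    """Hypothetical stand-in for a v2 response; field defaults to zero."""

    speech_event_offset: timedelta = timedelta()


def get_audio_duration(resp: V1Like | V2Like, last_usage_event_time: float) -> float:
    """Return seconds elapsed since the last usage event.

    May be zero or negative when the response carries no timing data;
    the caller is expected to filter those values out.
    """
    if isinstance(resp, V2Like):
        position = resp.speech_event_offset
    else:
        position = resp.speech_event_time
    return position.total_seconds() - last_usage_event_time


with_timing = get_audio_duration(V2Like(timedelta(seconds=3)), 1.0)
without_timing = get_audio_duration(V1Like(), 1.0)
```

Here `with_timing` is a positive increment, while `without_timing` is negative because the unset field reads as zero and `last_usage_event_time` is already past it, which is why a `> 0` check at the call site matters.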

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fe642d and b27c81b.

📒 Files selected for processing (2)

livekit-agents/livekit/agents/stt/stt.py
livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py
livekit-agents/livekit/agents/stt/stt.py

🧠 Learnings (1)

📚 Learning: 2026-01-22T03:28:16.289Z

Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.

Applied to files:

livekit-agents/livekit/agents/stt/stt.py

🧬 Code graph analysis (1)

livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py (1)

livekit-agents/livekit/agents/stt/stt.py (3)

SpeechEvent (71-75)

SpeechEventType (32-49)

RecognitionUsage (65-67)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: livekit-plugins-deepgram
GitHub Check: livekit-plugins-openai
GitHub Check: livekit-plugins-groq
GitHub Check: livekit-plugins-inworld
GitHub Check: livekit-plugins-cartesia
GitHub Check: type-check (3.9)
GitHub Check: unit-tests
GitHub Check: type-check (3.13)

🔇 Additional comments (3)

livekit-agents/livekit/agents/stt/stt.py (1)

64-68: LGTM!

The added docstring clearly documents the audio_duration field's purpose and units, which aligns with the incremental usage tracking implemented in the Google STT plugin.

livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py (2)

609-609: LGTM! Usage tracking logic is sound.

The incremental tracking via last_usage_event_time correctly accumulates processed audio duration, and the > 0 check avoids emitting spurious zero-duration events. The variable is appropriately scoped to process_stream, resetting on reconnects.

Also applies to: 676-684

890-895: LGTM! Clean version-specific request ID extraction.

Verified that the function correctly handles both v1 and v2 response structures:

- v1: request_id is int64, safely converted to string
- v2: request_id accessed via metadata as documented

All lines comply with the 100-character limit, and type hints are Python 3.9+ compatible.
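
The version split described above might look roughly like the following. This is a sketch under stated assumptions rather than the code under review: `V1Like`, `V2Like`, and `MetadataLike` are hypothetical stand-ins that mirror only the fields named in this comment, and `get_request_id` is an illustrative name.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class V1Like:
    """Hypothetical v1-style response: the request ID is an integer."""

    request_id: int = 0


@dataclass
class MetadataLike:
    """Hypothetical v2-style metadata holding a string request ID."""

    request_id: str = ""


@dataclass
class V2Like:
    """Hypothetical v2-style response: the request ID lives under metadata."""

    metadata: MetadataLike = field(default_factory=MetadataLike)


def get_request_id(resp: V1Like | V2Like) -> str:
    """Return the request ID as a string for either response shape."""
    if isinstance(resp, V2Like):
        return resp.metadata.request_id
    return str(resp.request_id)  # int64 in v1, normalised to str


v1_id = get_request_id(V1Like(request_id=1234567890123))
v2_id = get_request_id(V2Like(MetadataLike("req-abc")))
```

Normalising to `str` in one place means downstream consumers of the usage event do not need to know which API version produced it.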


add stt metrics for google

b27c81b

chenghao-mou requested a review from a team January 23, 2026 10:19
