🔴 Required Information
Describe the Bug:
ADK's OpenTelemetry tracing does not export thoughts_token_count to Cloud Trace span attributes. When using Gemini models with ThinkingConfig, the usage_metadata in LlmResponse correctly contains thoughts_token_count (verified via Event.usage_metadata), but this field is never written to the OpenTelemetry span. Only gen_ai.usage.input_tokens (from prompt_token_count) and gen_ai.usage.output_tokens (from candidates_token_count) are exported.
This makes it impossible to monitor or analyze thinking token consumption via Cloud Trace, Cloud Monitoring, or any observability pipeline that relies on span attributes.
The gap is in two locations in google/adk/telemetry/tracing.py:
- `trace_call_llm()` (lines ~329-339) — exports `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` but not thinking tokens
- `trace_generate_content_result()` (lines ~591-599) — same: only `GEN_AI_USAGE_INPUT_TOKENS` and `GEN_AI_USAGE_OUTPUT_TOKENS`
Steps to Reproduce:
- Install `google-adk==1.26.0` and `google-genai>=1.65.0`
- Create an agent with `ThinkingConfig(thinking_level="high")` and enable Cloud Trace export
- Run a query and inspect the resulting Cloud Trace spans
- Observe that `thoughts_token_count`/`thinking_token_count` is absent from span attributes
- For comparison, inspect `Event.usage_metadata.thoughts_token_count` directly — it is non-zero
Expected Behavior:
When usage_metadata.thoughts_token_count is non-None and non-zero in the model response, ADK should export it as a span attribute (e.g., gen_ai.usage.thinking_tokens or similar) alongside the existing gen_ai.usage.input_tokens and gen_ai.usage.output_tokens.
Observed Behavior:
thoughts_token_count is present in Event.usage_metadata (verified programmatically) but is never exported to OpenTelemetry spans. Cloud Trace shows gen_ai.usage.input_tokens and gen_ai.usage.output_tokens but no thinking token attribute.
Environment Details:
- ADK Library Version: `google-adk==1.26.0`
- Desktop OS: macOS
- Python Version: 3.11.7
Model Information:
- Are you using LiteLLM: No
- Which model is being used: `gemini-2.5-flash` (also reproducible with other Gemini models that support thinking)
🟡 Optional Information
Regression:
Unknown — this appears to have never been implemented rather than being a regression.
Logs:
```
# Event.usage_metadata shows thinking tokens correctly:
Event author=weather_agent: prompt=53 candidates=6 thoughts=50 total=109
Event author=weather_agent: prompt=121 candidates=17 thoughts=None total=138

# But Cloud Trace span only has:
# gen_ai.usage.input_tokens = 53
# gen_ai.usage.output_tokens = 6
# No thinking_token attribute exists on the span.
```
Additional Context:
The root cause is visible in google/adk/telemetry/tracing.py. Both trace_call_llm() and trace_generate_content_result() extract and export only two token metrics from usage_metadata:
```python
# In trace_call_llm() (~line 329):
if llm_response.usage_metadata is not None:
  if llm_response.usage_metadata.prompt_token_count is not None:
    span.set_attribute('gen_ai.usage.input_tokens', ...)
  if llm_response.usage_metadata.candidates_token_count is not None:
    span.set_attribute('gen_ai.usage.output_tokens', ...)
  # thoughts_token_count is NOT exported here
```

A fix would add:

```python
if llm_response.usage_metadata.thoughts_token_count is not None:
  span.set_attribute(
      'gen_ai.usage.thinking_tokens',
      llm_response.usage_metadata.thoughts_token_count,
  )
```

(The attribute name `gen_ai.usage.thinking_tokens` is a suggestion — the OpenTelemetry GenAI semantic conventions may not yet define a standard name for this, but a vendor-prefixed alternative like `gcp.vertex.agent.usage.thinking_tokens` would also work.)

The same addition is needed in `trace_generate_content_result()` (~line 591).
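Since the fix would then duplicate the same three-attribute export in two places, one way to structure it is a small shared helper called from both `trace_call_llm()` and `trace_generate_content_result()`. The sketch below is illustrative only: `set_usage_attributes` is a hypothetical name, and `StubSpan` stands in for an OpenTelemetry span's `set_attribute` interface.

```python
# Illustrative sketch, not ADK code: `set_usage_attributes` is a hypothetical
# helper name, and `StubSpan` stands in for opentelemetry.trace.Span.

# Suggested attribute name; the OTel GenAI semantic conventions do not yet
# standardize one for thinking/reasoning tokens.
GEN_AI_USAGE_THINKING_TOKENS = 'gen_ai.usage.thinking_tokens'


def set_usage_attributes(span, usage_metadata) -> None:
  """Exports all token counts from usage_metadata, skipping None fields."""
  if usage_metadata is None:
    return
  if usage_metadata.prompt_token_count is not None:
    span.set_attribute('gen_ai.usage.input_tokens',
                       usage_metadata.prompt_token_count)
  if usage_metadata.candidates_token_count is not None:
    span.set_attribute('gen_ai.usage.output_tokens',
                       usage_metadata.candidates_token_count)
  if usage_metadata.thoughts_token_count is not None:
    span.set_attribute(GEN_AI_USAGE_THINKING_TOKENS,
                       usage_metadata.thoughts_token_count)


class StubSpan:
  """Records attributes in a dict, mimicking Span.set_attribute."""

  def __init__(self):
    self.attributes = {}

  def set_attribute(self, key, value):
    self.attributes[key] = value
```

The `None` check on `thoughts_token_count` matters: as the logs above show, non-thinking responses report `thoughts=None`, and the attribute should simply be omitted in that case rather than exported as zero.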
Minimal Reproduction Code:
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "google-adk>=1.26.0",
# "google-genai>=1.65.0",
# ]
# ///
"""Minimal reproduction: thoughts_token_count present in Event but missing from traces.
Run with:
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT=<your-project>
uv run repro_thinking_trace.py
"""
import asyncio
from google.adk import Runner
from google.adk.agents import Agent
from google.adk.memory import InMemoryMemoryService
from google.adk.sessions import InMemorySessionService
from google.genai import types
def get_weather(city: str) -> dict:
"""Get current weather for a city."""
return {"city": city, "temp_f": 72, "condition": "sunny"}
async def main():
agent = Agent(
model="gemini-2.5-flash",
name="weather_agent",
instruction="You are a weather assistant. Always use the get_weather tool.",
tools=[get_weather],
generate_content_config=types.GenerateContentConfig(
thinking_config=types.ThinkingConfig(thinking_budget=2048),
),
disallow_transfer_to_parent=True,
disallow_transfer_to_peers=True,
)
session_service = InMemorySessionService()
runner = Runner(
agent=agent,
app_name="repro",
session_service=session_service,
memory_service=InMemoryMemoryService(),
)
session = await session_service.create_session(
user_id="test", app_name="repro"
)
message = types.Content(
parts=[types.Part(text="What's the weather in San Francisco?")]
)
async for event in runner.run_async(
new_message=message, user_id="test", session_id=session.id
):
usage = getattr(event, "usage_metadata", None)
if usage is not None:
thoughts = getattr(usage, "thoughts_token_count", None)
print(
f"Event author={event.author}: "
f"prompt={usage.prompt_token_count} "
f"candidates={usage.candidates_token_count} "
f"thoughts={thoughts} "
f"total={usage.total_token_count}"
)
if thoughts is not None and thoughts > 0:
print(
" ^^^ thoughts_token_count is non-zero in Event, "
"but will NOT appear in Cloud Trace span attributes"
)
if __name__ == "__main__":
asyncio.run(main())How often has this issue occurred?:
- Always (100%) — `thoughts_token_count` is never exported to spans.