
Thinking tokens not in traces #4829

@sjhatfield

Description


🔴 Required Information

Describe the Bug:

ADK's OpenTelemetry tracing does not export thoughts_token_count to Cloud Trace span attributes. When using Gemini models with ThinkingConfig, the usage_metadata in LlmResponse correctly contains thoughts_token_count (verified via Event.usage_metadata), but this field is never written to the OpenTelemetry span. Only gen_ai.usage.input_tokens (from prompt_token_count) and gen_ai.usage.output_tokens (from candidates_token_count) are exported.

This makes it impossible to monitor or analyze thinking token consumption via Cloud Trace, Cloud Monitoring, or any observability pipeline that relies on span attributes.

The gap is in two locations in google/adk/telemetry/tracing.py:

  1. trace_call_llm() (line ~329-339) exports gen_ai.usage.input_tokens and gen_ai.usage.output_tokens, but not thinking tokens
  2. trace_generate_content_result() (line ~591-599) likewise exports only GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS

Steps to Reproduce:

  1. Install google-adk==1.26.0 and google-genai>=1.65.0
  2. Create an agent with ThinkingConfig(thinking_level="high") and enable Cloud Trace export
  3. Run a query and inspect the resulting Cloud Trace spans
  4. Observe that thoughts_token_count / thinking_token_count is absent from span attributes
  5. For comparison, inspect Event.usage_metadata.thoughts_token_count directly — it is non-zero

Expected Behavior:

When usage_metadata.thoughts_token_count is non-None and non-zero in the model response, ADK should export it as a span attribute (e.g., gen_ai.usage.thinking_tokens or similar) alongside the existing gen_ai.usage.input_tokens and gen_ai.usage.output_tokens.

Observed Behavior:

thoughts_token_count is present in Event.usage_metadata (verified programmatically) but is never exported to OpenTelemetry spans. Cloud Trace shows gen_ai.usage.input_tokens and gen_ai.usage.output_tokens but no thinking token attribute.

Environment Details:

  • ADK Library Version: google-adk==1.26.0
  • Desktop OS: macOS
  • Python Version: 3.11.7

Model Information:

  • Are you using LiteLLM: No
  • Which model is being used: gemini-2.5-flash (also reproducible with other Gemini models that support thinking)

🟡 Optional Information

Regression:

Unknown — this appears to have never been implemented rather than being a regression.

Logs:

# Event.usage_metadata shows thinking tokens correctly:
Event author=weather_agent: prompt=53 candidates=6 thoughts=50 total=109
Event author=weather_agent: prompt=121 candidates=17 thoughts=None total=138

# But Cloud Trace span only has:
#   gen_ai.usage.input_tokens = 53
#   gen_ai.usage.output_tokens = 6
# No thinking_token attribute exists on the span.
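
The asymmetry can be illustrated with plain Python, using dictionaries as stand-ins for the real objects (illustrative only; the actual objects are the genai usage-metadata type and an OpenTelemetry span):

```python
# Illustrative stand-ins only; the real objects are the google-genai
# usage-metadata object and an OTel Span.
usage_metadata = {
    "prompt_token_count": 53,
    "candidates_token_count": 6,
    "thoughts_token_count": 50,  # present on the Event
}

span_attributes = {}
# Mirrors the current tracing.py logic: only two fields are copied.
if usage_metadata["prompt_token_count"] is not None:
    span_attributes["gen_ai.usage.input_tokens"] = usage_metadata["prompt_token_count"]
if usage_metadata["candidates_token_count"] is not None:
    span_attributes["gen_ai.usage.output_tokens"] = usage_metadata["candidates_token_count"]

print(span_attributes)
# thoughts_token_count never reaches the span.
```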

Additional Context:

The root cause is visible in google/adk/telemetry/tracing.py. Both trace_call_llm() and trace_generate_content_result() extract and export only two token metrics from usage_metadata:

# In trace_call_llm() (~line 329):
if llm_response.usage_metadata is not None:
    if llm_response.usage_metadata.prompt_token_count is not None:
        span.set_attribute('gen_ai.usage.input_tokens', ...)
    if llm_response.usage_metadata.candidates_token_count is not None:
        span.set_attribute('gen_ai.usage.output_tokens', ...)
    # thoughts_token_count is NOT exported here

A fix would add:

    if llm_response.usage_metadata.thoughts_token_count is not None:
        span.set_attribute(
            'gen_ai.usage.thinking_tokens',
            llm_response.usage_metadata.thoughts_token_count,
        )

(The attribute name gen_ai.usage.thinking_tokens is a suggestion — the OpenTelemetry GenAI semantic conventions may not yet define a standard name for this, but a vendor-prefixed alternative like gcp.vertex.agent.usage.thinking_tokens would also work.)

The same addition is needed in trace_generate_content_result() (~line 591).
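
One way to keep the two call sites in sync would be a small shared helper. The sketch below is hypothetical (it is not existing ADK code), and gen_ai.usage.thinking_tokens is the same suggested attribute name as above:

```python
# Hypothetical helper, not actual ADK code: maps usage_metadata fields to
# span attributes in one place so both trace sites export the same set.
_USAGE_ATTRS = {
    "prompt_token_count": "gen_ai.usage.input_tokens",
    "candidates_token_count": "gen_ai.usage.output_tokens",
    "thoughts_token_count": "gen_ai.usage.thinking_tokens",  # suggested name
}


def set_usage_attributes(span, usage_metadata) -> None:
    """Copy every non-None token count from usage_metadata onto the span."""
    if usage_metadata is None:
        return
    for field, attr in _USAGE_ATTRS.items():
        value = getattr(usage_metadata, field, None)
        if value is not None:
            span.set_attribute(attr, value)
```

Both trace_call_llm() and trace_generate_content_result() could then call set_usage_attributes(span, llm_response.usage_metadata), so any future token field only needs to be added in one place.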

Minimal Reproduction Code:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "google-adk>=1.26.0",
#     "google-genai>=1.65.0",
# ]
# ///
"""Minimal reproduction: thoughts_token_count present in Event but missing from traces.

Run with:
  export GOOGLE_GENAI_USE_VERTEXAI=true
  export GOOGLE_CLOUD_PROJECT=<your-project>
  uv run repro_thinking_trace.py
"""
import asyncio

from google.adk import Runner
from google.adk.agents import Agent
from google.adk.memory import InMemoryMemoryService
from google.adk.sessions import InMemorySessionService
from google.genai import types


def get_weather(city: str) -> dict:
    """Get current weather for a city."""
    return {"city": city, "temp_f": 72, "condition": "sunny"}


async def main():
    agent = Agent(
        model="gemini-2.5-flash",
        name="weather_agent",
        instruction="You are a weather assistant. Always use the get_weather tool.",
        tools=[get_weather],
        generate_content_config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=2048),
        ),
        disallow_transfer_to_parent=True,
        disallow_transfer_to_peers=True,
    )

    session_service = InMemorySessionService()
    runner = Runner(
        agent=agent,
        app_name="repro",
        session_service=session_service,
        memory_service=InMemoryMemoryService(),
    )
    session = await session_service.create_session(
        user_id="test", app_name="repro"
    )

    message = types.Content(
        role="user",
        parts=[types.Part(text="What's the weather in San Francisco?")],
    )

    async for event in runner.run_async(
        new_message=message, user_id="test", session_id=session.id
    ):
        usage = getattr(event, "usage_metadata", None)
        if usage is not None:
            thoughts = getattr(usage, "thoughts_token_count", None)
            print(
                f"Event author={event.author}: "
                f"prompt={usage.prompt_token_count} "
                f"candidates={usage.candidates_token_count} "
                f"thoughts={thoughts} "
                f"total={usage.total_token_count}"
            )
            if thoughts is not None and thoughts > 0:
                print(
                    "  ^^^ thoughts_token_count is non-zero in Event, "
                    "but will NOT appear in Cloud Trace span attributes"
                )


if __name__ == "__main__":
    asyncio.run(main())

How often has this issue occurred?:

  • Always (100%) — thoughts_token_count is never exported to spans.

Metadata

Labels

tracing ([Component] This issue is related to OpenTelemetry tracing)
