
Thinking tokens not in traces #4829

@sjhatfield

Description


🔴 Required Information

Describe the Bug:

ADK's OpenTelemetry tracing does not export thoughts_token_count to Cloud Trace span attributes. When using Gemini models with ThinkingConfig, the usage_metadata in LlmResponse correctly contains thoughts_token_count (verified via Event.usage_metadata), but this field is never written to the OpenTelemetry span. Only gen_ai.usage.input_tokens (from prompt_token_count) and gen_ai.usage.output_tokens (from candidates_token_count) are exported.

This makes it impossible to monitor or analyze thinking token consumption via Cloud Trace, Cloud Monitoring, or any observability pipeline that relies on span attributes.

The gap is in two locations in google/adk/telemetry/tracing.py:

  1. trace_call_llm() (line ~329-339) exports gen_ai.usage.input_tokens and gen_ai.usage.output_tokens, but not thinking tokens
  2. trace_generate_content_result() (line ~591-599) likewise exports only GEN_AI_USAGE_INPUT_TOKENS and GEN_AI_USAGE_OUTPUT_TOKENS

Steps to Reproduce:

  1. Install google-adk==1.26.0 and google-genai>=1.65.0
  2. Create an agent with ThinkingConfig(thinking_level="high") and enable Cloud Trace export
  3. Run a query and inspect the resulting Cloud Trace spans
  4. Observe that thoughts_token_count / thinking_token_count is absent from span attributes
  5. For comparison, inspect Event.usage_metadata.thoughts_token_count directly — it is non-zero

Expected Behavior:

When usage_metadata.thoughts_token_count is non-None and non-zero in the model response, ADK should export it as a span attribute (e.g., gen_ai.usage.thinking_tokens or similar) alongside the existing gen_ai.usage.input_tokens and gen_ai.usage.output_tokens.

Observed Behavior:

thoughts_token_count is present in Event.usage_metadata (verified programmatically) but is never exported to OpenTelemetry spans. Cloud Trace shows gen_ai.usage.input_tokens and gen_ai.usage.output_tokens but no thinking token attribute.

Environment Details:

  • ADK Library Version: google-adk==1.26.0
  • Desktop OS: macOS
  • Python Version: 3.11.7

Model Information:

  • Are you using LiteLLM: No
  • Which model is being used: gemini-2.5-flash (also reproducible with other Gemini models that support thinking)

🟡 Optional Information

Regression:

Unknown — this appears to have never been implemented rather than being a regression.

Logs:

# Event.usage_metadata shows thinking tokens correctly:
Event author=weather_agent: prompt=53 candidates=6 thoughts=50 total=109
Event author=weather_agent: prompt=121 candidates=17 thoughts=None total=138

# But Cloud Trace span only has:
#   gen_ai.usage.input_tokens = 53
#   gen_ai.usage.output_tokens = 6
# No thinking_token attribute exists on the span.
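
The asymmetry can be illustrated with plain Python, using dictionaries as stand-ins for the real objects (illustrative only; the actual objects are the genai usage-metadata type and an OpenTelemetry span):

```python
# Illustrative stand-ins only; the real objects are the google-genai
# usage-metadata object and an OTel Span.
usage_metadata = {
    "prompt_token_count": 53,
    "candidates_token_count": 6,
    "thoughts_token_count": 50,  # present on the Event
}

span_attributes = {}
# Mirrors the current tracing.py logic: only two fields are copied.
if usage_metadata["prompt_token_count"] is not None:
    span_attributes["gen_ai.usage.input_tokens"] = usage_metadata["prompt_token_count"]
if usage_metadata["candidates_token_count"] is not None:
    span_attributes["gen_ai.usage.output_tokens"] = usage_metadata["candidates_token_count"]

print(span_attributes)
# thoughts_token_count never reaches the span.
```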

Additional Context:

The root cause is visible in google/adk/telemetry/tracing.py. Both trace_call_llm() and trace_generate_content_result() extract and export only two token metrics from usage_metadata:

# In trace_call_llm() (~line 329):
if llm_response.usage_metadata is not None:
    if llm_response.usage_metadata.prompt_token_count is not None:
        span.set_attribute('gen_ai.usage.input_tokens', ...)
    if llm_response.usage_metadata.candidates_token_count is not None:
        span.set_attribute('gen_ai.usage.output_tokens', ...)
    # thoughts_token_count is NOT exported here

A fix would add:

    if llm_response.usage_metadata.thoughts_token_count is not None:
        span.set_attribute(
            'gen_ai.usage.thinking_tokens',
            llm_response.usage_metadata.thoughts_token_count,
        )

(The attribute name gen_ai.usage.thinking_tokens is a suggestion — the OpenTelemetry GenAI semantic conventions may not yet define a standard name for this, but a vendor-prefixed alternative like gcp.vertex.agent.usage.thinking_tokens would also work.)

The same addition is needed in trace_generate_content_result() (~line 591).
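
One way to keep the two call sites in sync would be a small shared helper. The sketch below is hypothetical (it is not existing ADK code), and gen_ai.usage.thinking_tokens is the same suggested attribute name as above:

```python
# Hypothetical helper, not actual ADK code: maps usage_metadata fields to
# span attributes in one place so both trace sites export the same set.
_USAGE_ATTRS = {
    "prompt_token_count": "gen_ai.usage.input_tokens",
    "candidates_token_count": "gen_ai.usage.output_tokens",
    "thoughts_token_count": "gen_ai.usage.thinking_tokens",  # suggested name
}


def set_usage_attributes(span, usage_metadata) -> None:
    """Copy every non-None token count from usage_metadata onto the span."""
    if usage_metadata is None:
        return
    for field, attr in _USAGE_ATTRS.items():
        value = getattr(usage_metadata, field, None)
        if value is not None:
            span.set_attribute(attr, value)
```

Both trace_call_llm() and trace_generate_content_result() could then call set_usage_attributes(span, llm_response.usage_metadata), so any future token field only needs to be added in one place.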

Minimal Reproduction Code:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "google-adk>=1.26.0",
#     "google-genai>=1.65.0",
# ]
# ///
"""Minimal reproduction: thoughts_token_count present in Event but missing from traces.

Run with:
  export GOOGLE_GENAI_USE_VERTEXAI=true
  export GOOGLE_CLOUD_PROJECT=<your-project>
  uv run repro_thinking_trace.py
"""
import asyncio

from google.adk import Runner
from google.adk.agents import Agent
from google.adk.memory import InMemoryMemoryService
from google.adk.sessions import InMemorySessionService
from google.genai import types


def get_weather(city: str) -> dict:
    """Get current weather for a city."""
    return {"city": city, "temp_f": 72, "condition": "sunny"}


async def main():
    agent = Agent(
        model="gemini-2.5-flash",
        name="weather_agent",
        instruction="You are a weather assistant. Always use the get_weather tool.",
        tools=[get_weather],
        generate_content_config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=2048),
        ),
        disallow_transfer_to_parent=True,
        disallow_transfer_to_peers=True,
    )

    session_service = InMemorySessionService()
    runner = Runner(
        agent=agent,
        app_name="repro",
        session_service=session_service,
        memory_service=InMemoryMemoryService(),
    )
    session = await session_service.create_session(
        user_id="test", app_name="repro"
    )

    message = types.Content(
        role="user",
        parts=[types.Part(text="What's the weather in San Francisco?")],
    )

    async for event in runner.run_async(
        new_message=message, user_id="test", session_id=session.id
    ):
        usage = getattr(event, "usage_metadata", None)
        if usage is not None:
            thoughts = getattr(usage, "thoughts_token_count", None)
            print(
                f"Event author={event.author}: "
                f"prompt={usage.prompt_token_count} "
                f"candidates={usage.candidates_token_count} "
                f"thoughts={thoughts} "
                f"total={usage.total_token_count}"
            )
            if thoughts is not None and thoughts > 0:
                print(
                    "  ^^^ thoughts_token_count is non-zero in Event, "
                    "but will NOT appear in Cloud Trace span attributes"
                )


if __name__ == "__main__":
    asyncio.run(main())

How often has this issue occurred?:

  • Always (100%) — thoughts_token_count is never exported to spans.

Metadata

Labels

tracing ([Component] This issue is related to OpenTelemetry tracing)
