feat(runtime): expose per-turn usage and cost in the after_llm_call hook payload #2994

Draft
kimizuka wants to merge 4 commits into docker:main from kimizuka:feat/after-llm-call-usage-cost

@kimizuka kimizuka commented Jun 3, 2026

Summary

Adds per-turn token usage and computed USD cost to the after_llm_call hook payload (hooks.Input), so a sidecar cost ledger can record per-call spend from the hook payload alone — without subscribing to the runtime event channel. This is the primitive-first half of #2948.

model_id was already populated by #2911, so the remaining scope here is just usage and cost.

This implements what was discussed in #2948. The one design decision worth a second look is the cost JSON encoding, covered under Wire contract below.

Closes #2948

Wire contract

hooks.Input is the struct shared by every hook event and serialized to JSON for handlers, so the additions are deliberately conservative:

Usage *chat.Usage `json:"usage,omitempty"`
Cost  *float64    `json:"cost,omitempty"`

For a native model call, cost has three meaningful states:

| Go value | JSON | Meaning |
| --- | --- | --- |
| nil | key absent | unpriced: no pricing table, or no usage reported |
| &0 | "cost": 0 | a priced call that was genuinely free |
| &N | "cost": N | the priced USD cost of the response |

omitempty on a pointer drops only nil, never a pointer to 0, so a free call still emits an explicit "cost": 0. A present cost is therefore authoritative and an absent one means "unpriced", with no need to cross-check usage. Both fields are populated only for after_llm_call; they are nil (and thus absent) on every other event, so no other event's payload changes.
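
The three states can be checked with a small standalone program. This is a minimal sketch, not the PR's code: `payload` is a stand-in that carries only the `cost` field, and `encode` and `ptr` are hypothetical helpers introduced for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload is a stand-in for hooks.Input that keeps only the cost field.
// The tag matches the one proposed in this PR.
type payload struct {
	Cost *float64 `json:"cost,omitempty"`
}

// encode marshals a payload holding the given cost and returns the JSON text.
func encode(cost *float64) string {
	b, err := json.Marshal(payload{Cost: cost})
	if err != nil {
		panic(err)
	}
	return string(b)
}

// ptr returns a pointer to v (hypothetical helper).
func ptr(v float64) *float64 { return &v }

func main() {
	fmt.Println(encode(nil))         // unpriced: {}
	fmt.Println(encode(ptr(0)))      // priced and free: {"cost":0}
	fmt.Println(encode(ptr(0.0125))) // priced: {"cost":0.0125}
}
```

A handler written in Go can decode into the same pointer type and test `Cost == nil` to recover the "unpriced" state.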

Note on the schema (deviation from my #2948 comment)

In #2948 I suggested json:"cost" (no omitempty, explicit cost: null for unpriced). I switched to omitempty here because hooks.Input is shared by all events: without omitempty, every non-after_llm_call event (before_llm_call, session_end, …) would start emitting "cost": null, which both pollutes unrelated payloads and breaks the struct's all-omitempty convention. The pointer + omitempty form keeps the same three-way distinction within after_llm_call (absent / 0 / N) while leaving other events untouched. An explicit null instead of omitempty is a one-line change if the team prefers it.
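
To make the trade-off concrete, the sketch below encodes an event that has no cost with both tag variants. The struct names and the `event` field are hypothetical and exist only for this comparison; the real hooks.Input has a different shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withNull uses the tag first suggested in #2948: the key is always emitted.
type withNull struct {
	Event string   `json:"event"` // hypothetical field name
	Cost  *float64 `json:"cost"`
}

// withOmit uses the tag adopted in this PR: a nil pointer drops the key.
type withOmit struct {
	Event string   `json:"event"` // hypothetical field name
	Cost  *float64 `json:"cost,omitempty"`
}

// mustJSON marshals v and panics on error (hypothetical helper).
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A non-after_llm_call event never sets Cost.
	fmt.Println(mustJSON(withNull{Event: "session_end"})) // {"event":"session_end","cost":null}
	fmt.Println(mustJSON(withOmit{Event: "session_end"})) // {"event":"session_end"}
}
```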

Cost is computed once and equals the session's recorded cost

The per-turn cost is computed once in runTurn via a new computeMessageCost(usage, m) *float64 helper and threaded into both the hook payload and recordAssistantMessage. The previous inline arithmetic in recordAssistantMessage is replaced by this single source (the m *modelsdev.Model param becomes the precomputed cost *float64), so the cost a handler sees is exactly the cost the session bills for the turn. The persisted message cost is unchanged (nil records as 0, matching prior behavior).
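
The shape of that flow is roughly the following. This is an illustrative sketch only: `usage` and `pricing` are hypothetical stand-ins for chat.Usage and the pricing data on modelsdev.Model, the per-million-token unit is an assumption, and the real helper covers more token classes than the two shown here.

```go
package main

import "fmt"

// usage and pricing are hypothetical stand-ins; field names are assumptions.
type usage struct {
	InputTokens, OutputTokens int64
}

type pricing struct {
	InputPerMTok, OutputPerMTok float64 // assumed: USD per million tokens
}

// computeCost returns nil when the call cannot be priced, and a pointer
// (possibly to 0) when it can.
func computeCost(u *usage, p *pricing) *float64 {
	if u == nil || p == nil {
		return nil
	}
	cost := (float64(u.InputTokens)*p.InputPerMTok +
		float64(u.OutputTokens)*p.OutputPerMTok) / 1e6
	return &cost
}

// recorded mirrors "nil records as 0" for the persisted message cost.
func recorded(cost *float64) float64 {
	if cost == nil {
		return 0
	}
	return *cost
}

func main() {
	p := &pricing{InputPerMTok: 3, OutputPerMTok: 15}

	// Computed once; the same pointer would feed the hook payload and the record.
	cost := computeCost(&usage{InputTokens: 1000, OutputTokens: 200}, p)
	fmt.Printf("hook=%.4f recorded=%.4f\n", *cost, recorded(cost))

	// No usage reported: the hook sees nil, the record stores 0.
	none := computeCost(nil, p)
	fmt.Println(none == nil, recorded(none))
}
```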

Harness agents

For harness agents, cost is the harness's own reported total rather than a computed price. The harness library defaults the total to 0 when the harness output omits a cost (e.g. the codex harness reports token counts but no cost), which is indistinguishable from a genuinely free call — so to avoid telling a ledger that a billed turn was free, cost is surfaced only when the harness reported a non-zero value (otherwise it is nil/unpriced).
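
As a sketch of that rule, assuming the harness total arrives as a plain float64 (the function name `harnessCost` is hypothetical):

```go
package main

import "fmt"

// harnessCost surfaces a harness-reported total only when it is non-zero.
// A zero total is ambiguous (omitted vs. genuinely free), so it maps to nil,
// which the payload encodes as an absent "cost" key.
func harnessCost(reportedTotal float64) *float64 {
	if reportedTotal == 0 {
		return nil
	}
	return &reportedTotal
}

func main() {
	fmt.Println(harnessCost(0) == nil) // true: treated as unpriced
	if c := harnessCost(0.42); c != nil {
		fmt.Println(*c) // 0.42
	}
}
```

The cost of this choice is that a harness turn that really was free is reported as unpriced rather than as an explicit 0.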

Sub-sessions

after_llm_call fires for every model call, including those inside sub-sessions (transferred tasks, background agents, skills), each with the sub-session's own session_id. Summing cost across after_llm_call events therefore captures all spend — including sub-sessions whose cost may never reach the session store — which is the motivating case in #2948. The change does not touch sub-session persistence in any way.

Example: a cost-ledger sidecar

A command hook can append one CSV row per model call straight from the payload — no event-channel subscription needed. has("cost") distinguishes an unpriced call (key absent) from a priced free one (cost: 0):

after_llm_call:
  - type: command
    command: |
      cat | jq -r '[
        (now | todateiso8601), .session_id, .model_id,
        (.usage.input_tokens  // 0), (.usage.output_tokens // 0),
        (if has("cost") then (.cost | tostring) else "unpriced" end)
      ] | @csv' >> /tmp/cost-ledger.csv

A runnable version is wired into the canonical examples/hooks.yaml. Because after_llm_call fires for sub-session turns too (each with its own session_id), summing the numeric values in the cost column gives the full priced spend for the run; rows marked "unpriced" contribute nothing to that sum.

Out of scope (follow-ups)

  • Fallback-model pricing: on a turn that fell back to a secondary model, cost/model_id reflect the primary model. This is pre-existing behavior in recordAssistantMessage and is preserved here (hook cost == recorded cost); attributing cost to the model that actually ran is a separate change.
  • Coverage of compaction sub-runtimes and the chatserver / a2a / acp paths.

Testing

pkg/runtime/after_llm_call_test.go:

  • priced call → usage + non-nil cost, and *cost == sess.OwnCost()
  • unpriced model → usage present, cost nil
  • JSON wire contract (absent / explicit 0 / N)
  • harness path (codex) → usage present, cost nil
  • computeMessageCost unit tests (every nil branch + all token classes)

docs/configuration/hooks/index.md updated for the new fields and the sub-session / harness caveats; examples/hooks.yaml demonstrates a cost-ledger sidecar consuming the payload.

Signed off per DCO.

kimizuka added 3 commits June 3, 2026 23:45
Forward the per-call token usage and computed USD cost to the
after_llm_call hook payload so sidecar cost ledgers can record
per-call spend from the payload alone, without subscribing to the
runtime event channel.

Cost is a *float64 so the wire contract can distinguish an unpriced
model (nil, key absent) from a priced free call (pointer to 0). The
per-turn cost is computed once in computeMessageCost and threaded into
both the hook payload and the recorded assistant message, so the two
can never disagree. For harness agents the cost is surfaced only when
the harness reported a non-zero value, avoiding reporting a billed
turn as free when a harness omits its cost (e.g. codex).

Signed-off-by: kimizuka <f.kimizuka@gmail.com>

Verify that after_llm_call populates usage and cost, that cost is nil
when the model is unpriced, the nil-vs-zero JSON contract, harness
usage with no cost surfacing as unpriced, and computeMessageCost.

Signed-off-by: kimizuka <f.kimizuka@gmail.com>

Describe the new usage and cost fields, the priced/unpriced/free
semantics and the harness caveat, and add a per-call cost-ledger
example to examples/hooks.yaml.

Signed-off-by: kimizuka <f.kimizuka@gmail.com>
@kimizuka kimizuka force-pushed the feat/after-llm-call-usage-cost branch from 0d587c8 to 8a4eb0b Compare June 3, 2026 14:45
@aheritier aheritier added area/agent For work that has to do with the general agent loop/agentic features of the app kind/feat PR adds a new feature (maps to feat: commit prefix) labels Jun 3, 2026
@dgageot dgageot marked this pull request as ready for review June 3, 2026 15:42
@dgageot dgageot requested a review from a team as a code owner June 3, 2026 15:42
@dgageot dgageot marked this pull request as draft June 3, 2026 15:42
Add the empty line embeddedstructfieldcheck wants between the embedded
ModelStore and the cost field, and switch the float equality assertions
to assert.InDelta to satisfy testifylint's float-compare rule.
Development

Successfully merging this pull request may close these issues.

Expose per-turn token usage and cost in the after_llm_call hook payload