docker · kimizuka · Jun 3, 2026 · Jun 4, 2026
@@ -259,7 +259,7 @@ In addition to the common fields, each event ships its own payload:
 | `turn_start`                | _none_ (just the common fields)                                                                                       |
 | `turn_end`                  | `agent_name`, `reason` — one of `normal`, `continue`, `steered`, `error`, `canceled`, `hook_blocked`, `loop_detected` |
 | `before_llm_call`           | `iteration` — 1-based run-loop iteration counter (the model call this hook is gating), `model_id`                    |
-| `after_llm_call`            | `agent_name`, `stop_response`, `last_user_message`, `model_id`                                                       |
+| `after_llm_call`            | `agent_name`, `stop_response`, `last_user_message`, `model_id`, `usage`, `cost`                                       |
 | `session_end`               | `reason` — one of `clear`, `logout`, `prompt_input_exit`, `other`                                                     |
 | `pre_compact`               | `source` — one of `manual`, `auto`, `overflow`, `tool_overflow`                                                       |
 | `before_compaction`         | `input_tokens`, `output_tokens`, `context_limit`, `compaction_reason` (one of `threshold`/`overflow`/`manual`)        |
@@ -281,6 +281,9 @@ Notes:
 - `prompt` is only populated for `user_prompt_submit`. Sub-sessions (transferred tasks, background agents, skills) do **not** fire this event because their kick-off message is synthesised by the runtime, not authored by the user.
 - `stop_response` carries the model's final assistant text for `stop`, `after_llm_call`, and `subagent_stop`. `last_user_message` carries the latest user message at dispatch time.
 - `model_id` is populated for `after_llm_call` (and `before_llm_call`) in the canonical `<provider>/<model>` form (e.g. `anthropic/claude-sonnet-4-5`). For harness agents, `model_id` is the harness label (e.g. `claude-code`) rather than a canonical model name — see [Coding Harnesses]({{ '/features/harnesses/' | relative_url }}).
+- `usage` and `cost` are populated for `after_llm_call` only. `usage` is the per-call token usage object with the fields `input_tokens`, `output_tokens`, `cached_input_tokens`, `cached_write_tokens`, and `reasoning_tokens` (the last is omitted for non-reasoning models); the whole object is absent when the provider reported no usage. `cost` is the USD price of that one model response. For a **native model call** it is computed from `usage` and the model's pricing table, and equals the cost the session records for the turn. It is **absent** when the response is unpriced (no pricing data on file, or no usage) and an explicit `0` when a priced call was free, so a present `cost` is authoritative and an absent one means "unpriced", with no need to cross-check `usage`. For harness agents the meaning differs; see the next note. A cost ledger can therefore record per-call spend from the payload alone, without subscribing to the runtime event channel.
+- For [harness agents]({{ '/features/harnesses/' | relative_url }}), `cost` is the harness's own reported total for the call rather than a computed price, and it is present only when the harness reported a non-zero cost. Some harnesses (e.g. `codex`) report token counts but no cost; those turns carry `usage` with `cost` absent, even though the recorded message stores `0`.
+- `after_llm_call` fires for **every** successful model call, including calls made inside sub-sessions (transferred tasks, background agents, skills). For those, `session_id` is the sub-session's id. Summing `cost` across `after_llm_call` events therefore captures **all** spend, including sub-sessions (even ones that error before their cost is persisted). Do **not** add a separately queried session cost total on top: the runtime's own total already recurses into completed sub-sessions and includes their spend, so combining the two double-counts. Pick one source, the summed hook costs, as the authoritative ledger.
 - `context_limit` is `0` when the model definition is unavailable (treat `0` as "unknown", not as a real limit).
 - `approval_decision` is one of `allow`, `deny`, `canceled`. `approval_source` is a stable classifier of which step decided (e.g. `yolo`, `session_permissions_allow`, `session_permissions_deny`, `team_permissions_allow`, `team_permissions_deny`, `pre_tool_use_hook_allow`, `pre_tool_use_hook_deny`, `readonly_hint`, `user_approved`, `user_approved_session`, `user_approved_tool`, `user_rejected`, `context_canceled`).
 
@@ -552,7 +555,7 @@ The `reason` field classifies the exit:
 
 `before_llm_call` fires immediately before every model call (after `turn_start` has assembled the messages). It cannot contribute context — use `turn_start` for that — but it can **stop the run** by returning `decision: block` (or exit code 2). The built-in `max_iterations` hook implements a hard cap on top of this event.
 
-`after_llm_call` fires immediately after each successful model call, before the response is recorded into the session and tool calls are dispatched. The assistant text is in `stop_response`. Use it for response auditing, redaction logging, or quality metrics. Failed model calls fire `on_error` instead.
+`after_llm_call` fires immediately after each successful model call, before the response is recorded into the session and tool calls are dispatched. The assistant text is in `stop_response`, and `usage` and `cost` carry the call's token usage and USD cost (see the field notes above). Use it for response auditing, redaction logging, quality metrics, or a sidecar cost ledger that records per-call spend without subscribing to the runtime event channel. Failed model calls fire `on_error` instead.
 
 ### Before/After-Compaction: structured compaction control
 

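The summing rule in the field notes above can be sketched in Go. This is a minimal illustration, not the runtime's code: the `afterLLMCall` struct and the `ledger` type are hypothetical, and only the JSON keys `session_id`, `model_id`, and `cost` come from the documented payload. A `*float64` is used on the decoding side so that an absent `cost` (unpriced) stays distinguishable from an explicit `0` (priced but free).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// afterLLMCall mirrors only the payload fields used here. The JSON keys
// follow the documented payload; the Go type itself is hypothetical.
type afterLLMCall struct {
	SessionID string   `json:"session_id"`
	ModelID   string   `json:"model_id"`
	Cost      *float64 `json:"cost"` // nil when the "cost" key is absent
}

// ledger accumulates spend per session from after_llm_call payloads.
type ledger struct {
	bySession map[string]float64
	unpriced  int // calls that carried no cost at all
}

func newLedger() *ledger {
	return &ledger{bySession: map[string]float64{}}
}

// record decodes one payload and adds its cost when one is present.
func (l *ledger) record(payload []byte) error {
	var in afterLLMCall
	if err := json.Unmarshal(payload, &in); err != nil {
		return err
	}
	if in.Cost == nil {
		l.unpriced++ // absent cost: unpriced, so nothing to add
		return nil
	}
	l.bySession[in.SessionID] += *in.Cost
	return nil
}

// total sums every session seen, sub-sessions included. No separately
// queried session total is added on top, which would double-count.
func (l *ledger) total() float64 {
	var sum float64
	for _, c := range l.bySession {
		sum += c
	}
	return sum
}

func main() {
	l := newLedger()
	payloads := []string{
		`{"session_id":"main","model_id":"anthropic/claude-sonnet-4-5","cost":0.5}`,
		`{"session_id":"sub-1","model_id":"anthropic/claude-sonnet-4-5","cost":0.25}`,
		`{"session_id":"sub-1","model_id":"anthropic/claude-sonnet-4-5","cost":0}`,
		`{"session_id":"main","model_id":"claude-code"}`,
	}
	for _, p := range payloads {
		if err := l.record([]byte(p)); err != nil {
			fmt.Println("decode error:", err)
		}
	}
	fmt.Printf("total=%.2f unpriced=%d\n", l.total(), l.unpriced)
	// prints "total=0.75 unpriced=1"
}
```

Keying the map by `session_id` keeps sub-session spend visible on its own while `total` still yields the single authoritative figure for the run.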
@@ -65,6 +65,7 @@
 #   /tmp/agent-session.log         (session_start, session_end)
 #   /tmp/agent-prompts.log         (user_prompt_submit)
 #   /tmp/agent-llm-calls.log       (before_llm_call, after_llm_call)
+#   /tmp/agent-cost-ledger.csv     (after_llm_call: per-call token usage + cost)
 #   /tmp/agent-turns.log           (turn_end)
 #   /tmp/agent-tool-results.log    (post_tool_use)
 #   /tmp/agent-permissions.log     (permission_request)
@@ -277,6 +278,14 @@ agents:
       # assistant text content arrives via stop_response (matching the
       # stop event's payload). Failed calls fire on_error instead and
       # skip this event.
+      #
+      # The payload also carries this call's token usage in .usage and its
+      # USD cost in .cost. The cost key is ABSENT when the call is unpriced
+      # (test with `has("cost")`) and an explicit 0 for a priced free call,
+      # so a present cost is authoritative without checking usage. That is
+      # everything a sidecar cost ledger needs, with no event-channel wiring.
+      # after_llm_call also fires for sub-session turns (each with its own
+      # session_id), so summing .cost gives the full spend for the run.
       # ====================================================================
       after_llm_call:
         - type: command
@@ -286,6 +295,12 @@ agents:
             SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "unknown"')
             LEN=$(echo "$INPUT" | jq -r '.stop_response // ""' | wc -c | tr -d ' ')
             echo "[$(date)] [←] $SESSION_ID llm call complete, content=$LEN chars" >> /tmp/agent-llm-calls.log
+            # Per-call cost ledger: timestamp, session, model, tokens, cost.
+            echo "$INPUT" | jq -r '[
+              (now | todateiso8601), .session_id, .model_id,
+              (.usage.input_tokens // 0), (.usage.output_tokens // 0),
+              (if has("cost") then (.cost | tostring) else "unpriced" end)
+            ] | @csv' >> /tmp/agent-cost-ledger.csv
 
       # ====================================================================
       # SESSION-END - cleanup when the session terminates.

@@ -68,6 +68,12 @@ const (
 	EventBeforeLLMCall EventType = "before_llm_call"
 	// EventAfterLLMCall fires immediately after a successful model call,
 	// before the response is recorded. Failed calls fire EventOnError.
+	// The Input carries the response text in [Input.StopResponse]
+	// (matching the stop event), the model that produced it in
+	// [Input.ModelID], and per-call billing data in [Input.Usage] and
+	// [Input.Cost] so sidecar cost ledgers can record per-call spend
+	// from the payload alone, without subscribing to the runtime event
+	// channel.
 	EventAfterLLMCall EventType = "after_llm_call"
 	// EventSessionEnd fires when a session terminates.
 	EventSessionEnd EventType = "session_end"
@@ -293,6 +299,36 @@ type Input struct {
 	ApprovalDecision string `json:"approval_decision,omitempty"`
 	ApprovalSource   string `json:"approval_source,omitempty"`
 
+	// AfterLLMCall specific: per-call token usage and the computed USD
+	// cost of the model response the runtime just received. Both are
+	// populated only for [EventAfterLLMCall] and are nil for every
+	// other event. They are the hook-side counterpart of the runtime's
+	// internal TokenUsageEvent and let sidecar cost ledgers record
+	// per-call spend from the payload alone.
+	//
+	// Usage is a pointer so a handler can distinguish "the provider
+	// reported no usage" (nil) from "usage was zero".
+	//
+	// Cost is a *float64 with three meaningful states, mirroring the
+	// runtime's own pricing gate (usage present AND a model definition
+	// with a pricing table):
+	//   - nil   → unpriced: the model has no pricing data on file
+	//             (unknown model ID, custom endpoint without cost
+	//             config) or the provider reported no usage. With
+	//             omitempty the "cost" key is absent on the wire.
+	//   - 0     → a priced model whose computed cost is genuinely zero
+	//             (a free call). Emitted as "cost": 0, NOT elided —
+	//             omitempty on a pointer drops only nil, never a
+	//             non-nil pointer to the zero value.
+	//   - non-0 → the priced USD cost of this single response.
+	// A handler therefore reads a present "cost" as authoritative and
+	// an absent one as "unpriced", with no need to cross-check usage.
+	// (This is deliberately a *float64, unlike [chat.Message.Cost],
+	// which is a plain float64 with omitempty and so cannot distinguish
+	// a free priced call from an unpriced one on the wire.)
+	Usage *chat.Usage `json:"usage,omitempty"`
+	Cost  *float64    `json:"cost,omitempty"`
+
 	// Compaction fields (BeforeCompaction, AfterCompaction).
 	InputTokens  int64 `json:"input_tokens,omitempty"`
 	OutputTokens int64 `json:"output_tokens,omitempty"`
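The three wire states described in the `Cost` comment follow from how `encoding/json` treats `omitempty`. The standalone check below shows this; the `pointerCost` and `plainCost` structs are illustrative stand-ins, not types from this change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pointerCost uses the *float64 shape chosen for the hook payload.
type pointerCost struct {
	Cost *float64 `json:"cost,omitempty"`
}

// plainCost uses a plain float64 with omitempty, for comparison.
type plainCost struct {
	Cost float64 `json:"cost,omitempty"`
}

// encode marshals v and returns the JSON text, or the error message.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	zero := 0.0
	fmt.Println(encode(pointerCost{}))            // {}          nil pointer: key dropped
	fmt.Println(encode(pointerCost{Cost: &zero})) // {"cost":0}  non-nil zero: key kept
	fmt.Println(encode(plainCost{}))              // {}          zero value: key dropped
}
```

With the plain `float64`, a free priced call and an unpriced call both encode as `{}`, which is the ambiguity the pointer type avoids.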