Draft
7 changes: 5 additions & 2 deletions docs/configuration/hooks/index.md
@@ -259,7 +259,7 @@ In addition to the common fields, each event ships its own payload:
| `turn_start` | _none_ (just the common fields) |
| `turn_end` | `agent_name`, `reason` — one of `normal`, `continue`, `steered`, `error`, `canceled`, `hook_blocked`, `loop_detected` |
| `before_llm_call` | `iteration` — 1-based run-loop iteration counter (the model call this hook is gating), `model_id` |
| `after_llm_call` | `agent_name`, `stop_response`, `last_user_message`, `model_id` |
| `after_llm_call` | `agent_name`, `stop_response`, `last_user_message`, `model_id`, `usage`, `cost` |
| `session_end` | `reason` — one of `clear`, `logout`, `prompt_input_exit`, `other` |
| `pre_compact` | `source` — one of `manual`, `auto`, `overflow`, `tool_overflow` |
| `before_compaction` | `input_tokens`, `output_tokens`, `context_limit`, `compaction_reason` (one of `threshold`/`overflow`/`manual`) |
@@ -281,6 +281,9 @@ Notes:
- `prompt` is only populated for `user_prompt_submit`. Sub-sessions (transferred tasks, background agents, skills) do **not** fire this event because their kick-off message is synthesised by the runtime, not authored by the user.
- `stop_response` carries the model's final assistant text for `stop`, `after_llm_call`, and `subagent_stop`. `last_user_message` carries the latest user message at dispatch time.
- `model_id` is populated for `after_llm_call` (and `before_llm_call`) in the canonical `<provider>/<model>` form (e.g. `anthropic/claude-sonnet-4-5`). For harness agents, `model_id` is the harness label (e.g. `claude-code`) rather than a canonical model name — see [Coding Harnesses]({{ '/features/harnesses/' | relative_url }}).
- `usage` and `cost` are populated for `after_llm_call` only. `usage` is the per-call token usage object (`input_tokens`, `output_tokens`, `cached_input_tokens`, `cached_write_tokens`, and `reasoning_tokens` — the last is itself omitted for non-reasoning models); the whole object is absent when the provider reported no usage. `cost` is the USD price of that one model response. For a **native model call** it is the price computed from `usage` and the model's pricing table, and equals the cost the session records for the turn: it is **absent** when the response is unpriced (no pricing data on file, or no usage) and an explicit `0` for a priced call that was free — so a present `cost` is authoritative and an absent one means "unpriced", with no need to cross-check `usage`. (For harness agents the meaning differs — see the next note.) A cost ledger can therefore record per-call spend from the payload alone, without subscribing to the runtime event channel.
- For [harness agents]({{ '/features/harnesses/' | relative_url }}), `cost` is the harness's own reported total for the call rather than a computed price, and is present only when the harness reported a non-zero cost (some harnesses, e.g. `codex`, report token counts but no cost — those turns carry `usage` with `cost` absent, even though the recorded message stores `0`).
- `after_llm_call` fires for **every** model call, including calls made inside sub-sessions (transferred tasks, background agents, skills). For those, `session_id` is the sub-session's id. Summing `cost` across `after_llm_call` events therefore captures **all** spend, including sub-sessions (and even sub-sessions that error before their cost is persisted). Do **not** add a separately-queried session cost total on top: the runtime's own total already recurses into and includes completed sub-session spend, so combining the two double-counts. Pick one source — the summed hook costs — as the authoritative ledger.
- `context_limit` is `0` when the model definition is unavailable (treat `0` as "unknown", not as a real limit).
- `approval_decision` is one of `allow`, `deny`, `canceled`. `approval_source` is a stable classifier of which step decided (e.g. `yolo`, `session_permissions_allow`, `session_permissions_deny`, `team_permissions_allow`, `team_permissions_deny`, `pre_tool_use_hook_allow`, `pre_tool_use_hook_deny`, `readonly_hint`, `user_approved`, `user_approved_session`, `user_approved_tool`, `user_rejected`, `context_canceled`).
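The `cost` rules in the notes above can be sketched from the handler's side. This is a minimal illustration, not the runtime's own code: the `afterLLMCall` struct and `classifyCost` helper are hypothetical names, the sample payloads and prices are made up, and only the JSON keys (`session_id`, `model_id`, `usage`, `cost`) come from the notes. Decoding `cost` into a pointer is what lets a handler tell an absent key from an explicit `0`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// afterLLMCall is a hypothetical handler-side view of the payload.
// Only the JSON key names are taken from the field notes.
type afterLLMCall struct {
	SessionID string `json:"session_id"`
	ModelID   string `json:"model_id"`
	Usage     *struct {
		InputTokens  int64 `json:"input_tokens"`
		OutputTokens int64 `json:"output_tokens"`
	} `json:"usage"`
	// A pointer keeps "key absent" (nil) distinct from "cost is 0".
	Cost *float64 `json:"cost"`
}

// classifyCost maps the wire states of "cost" to a label:
// absent -> "unpriced", explicit 0 -> "free", otherwise the USD amount.
func classifyCost(payload []byte) (string, error) {
	var in afterLLMCall
	if err := json.Unmarshal(payload, &in); err != nil {
		return "", err
	}
	switch {
	case in.Cost == nil:
		return "unpriced", nil
	case *in.Cost == 0:
		return "free", nil
	default:
		return fmt.Sprintf("%.6f USD", *in.Cost), nil
	}
}

func main() {
	// Made-up sample payloads; "example/free-model" is a placeholder ID.
	payloads := []string{
		`{"session_id":"s1","model_id":"anthropic/claude-sonnet-4-5","usage":{"input_tokens":1200,"output_tokens":300},"cost":0.0081}`,
		`{"session_id":"s1","model_id":"example/free-model","usage":{"input_tokens":10,"output_tokens":5},"cost":0}`,
		`{"session_id":"s2","model_id":"claude-code","usage":{"input_tokens":10,"output_tokens":5}}`,
	}
	for _, p := range payloads {
		label, err := classifyCost([]byte(p))
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println(label)
	}
}
```

For harness agents an absent `cost` means the harness reported none, so a real ledger would probably label those rows by `model_id` rather than lumping them in with unpriced native calls.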

@@ -552,7 +555,7 @@ The `reason` field classifies the exit:

`before_llm_call` fires immediately before every model call (after `turn_start` has assembled the messages). It cannot contribute context — use `turn_start` for that — but it can **stop the run** by returning `decision: block` (or exit code 2). The built-in `max_iterations` hook implements a hard cap on top of this event.

`after_llm_call` fires immediately after each successful model call, before the response is recorded into the session and tool calls are dispatched. The assistant text is in `stop_response`. Use it for response auditing, redaction logging, or quality metrics. Failed model calls fire `on_error` instead.
`after_llm_call` fires immediately after each successful model call, before the response is recorded into the session and tool calls are dispatched. The assistant text is in `stop_response`, and the call's `usage` and `cost` carry the per-turn token usage and computed USD spend (see the field notes above). Use it for response auditing, redaction logging, quality metrics, or a sidecar cost ledger that records per-call spend without subscribing to the runtime event channel. Failed model calls fire `on_error` instead.

### Before/After-Compaction: structured compaction control

15 changes: 15 additions & 0 deletions examples/hooks.yaml
@@ -65,6 +65,7 @@
# /tmp/agent-session.log (session_start, session_end)
# /tmp/agent-prompts.log (user_prompt_submit)
# /tmp/agent-llm-calls.log (before_llm_call, after_llm_call)
# /tmp/agent-cost-ledger.csv (after_llm_call: per-call token usage + cost)
# /tmp/agent-turns.log (turn_end)
# /tmp/agent-tool-results.log (post_tool_use)
# /tmp/agent-permissions.log (permission_request)
@@ -277,6 +278,14 @@ agents:
# assistant text content arrives via stop_response (matching the
# stop event's payload). Failed calls fire on_error instead and
# skip this event.
#
# The payload also carries this call's token usage in .usage and its
# computed USD cost in .cost. .cost is ABSENT for an unpriced model
# (test with `has("cost")`) and an explicit 0 for a priced free call,
# so a present cost is authoritative without checking usage. That is
# everything a sidecar cost ledger needs — no event-channel wiring.
# after_llm_call also fires for sub-session turns (each with its own
# session_id), so summing .cost is the full spend for the run.
# ====================================================================
after_llm_call:
- type: command
@@ -286,6 +295,12 @@ agents:
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "unknown"')
# Count characters with jq; `wc -c` would count bytes plus jq's trailing newline.
LEN=$(echo "$INPUT" | jq -r '(.stop_response // "") | length')
echo "[$(date)] [←] $SESSION_ID llm call complete, content=$LEN chars" >> /tmp/agent-llm-calls.log
# Per-call cost ledger: timestamp, session, model, tokens, cost.
echo "$INPUT" | jq -r '[
(now | todateiso8601), .session_id, .model_id,
(.usage.input_tokens // 0), (.usage.output_tokens // 0),
(if has("cost") then (.cost | tostring) else "unpriced" end)
] | @csv' >> /tmp/agent-cost-ledger.csv

# ====================================================================
# SESSION-END - cleanup when the session terminates.
36 changes: 36 additions & 0 deletions pkg/hooks/types.go
@@ -68,6 +68,12 @@ const (
EventBeforeLLMCall EventType = "before_llm_call"
// EventAfterLLMCall fires immediately after a successful model call,
// before the response is recorded. Failed calls fire EventOnError.
// The Input carries the response text in [Input.StopResponse]
// (matching the stop event), the model that produced it in
// [Input.ModelID], and per-turn billing data in [Input.Usage] and
// [Input.Cost] so sidecar cost ledgers can record per-call spend
// from the payload alone, without subscribing to the runtime event
// channel.
EventAfterLLMCall EventType = "after_llm_call"
// EventSessionEnd fires when a session terminates.
EventSessionEnd EventType = "session_end"
@@ -293,6 +299,36 @@ type Input struct {
ApprovalDecision string `json:"approval_decision,omitempty"`
ApprovalSource string `json:"approval_source,omitempty"`

// AfterLLMCall specific: per-turn token usage and the computed USD
// cost of the model response the runtime just received. Both are
// populated only for [EventAfterLLMCall] and are nil for every
// other event. They are the hook-side counterpart of the runtime's
// internal TokenUsageEvent and let sidecar cost ledgers record
// per-call spend from the payload alone.
//
// Usage is a pointer so a handler can distinguish "the provider
// reported no usage" (nil) from "usage was zero".
//
// Cost is a *float64 with three meaningful states, mirroring the
// runtime's own pricing gate (usage present AND a model definition
// with a pricing table):
// - nil → unpriced: the model has no pricing data on file
// (unknown model ID, custom endpoint without cost
// config) or the provider reported no usage. With
// omitempty the "cost" key is absent on the wire.
// - 0 → a priced model whose computed cost is genuinely zero
// (a free call). Emitted as "cost": 0, NOT elided —
// omitempty on a pointer drops only nil, never a
// non-nil pointer to the zero value.
// - non-0 → the priced USD cost of this single response.
// A handler therefore reads a present "cost" as authoritative and
// an absent one as "unpriced", with no need to cross-check usage.
// (This is deliberately a *float64, unlike [chat.Message.Cost],
// which is a plain float64 with omitempty and so cannot distinguish
// a free priced call from an unpriced one on the wire.)
Usage *chat.Usage `json:"usage,omitempty"`
Cost *float64 `json:"cost,omitempty"`

// Compaction fields (BeforeCompaction, AfterCompaction).
InputTokens int64 `json:"input_tokens,omitempty"`
OutputTokens int64 `json:"output_tokens,omitempty"`
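The `omitempty` behaviour that the `Cost` field comment relies on can be checked in isolation with the standard `encoding/json` package. The sketch below uses two stand-in structs (`pointerCost` and `plainCost` are hypothetical names, not types from this repository) to compare a `*float64` field with a plain `float64` field under the same tag.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pointerCost mirrors the shape of the new hook field: a pointer with omitempty.
type pointerCost struct {
	Cost *float64 `json:"cost,omitempty"`
}

// plainCost mirrors a plain float64 with omitempty, for contrast.
type plainCost struct {
	Cost float64 `json:"cost,omitempty"`
}

// encode marshals v and returns the JSON text, or the error text on failure.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	zero := 0.0
	price := 0.0125 // made-up value

	fmt.Println(encode(pointerCost{}))             // nil pointer: key dropped
	fmt.Println(encode(pointerCost{Cost: &zero}))  // pointer to 0: key kept
	fmt.Println(encode(pointerCost{Cost: &price})) // priced call
	fmt.Println(encode(plainCost{}))               // plain 0: key dropped
}
```

With the plain `float64`, a free priced call and an unpriced call both encode as `{}`, which is the ambiguity the pointer type avoids.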