feat(runtime): expose per-turn usage and cost in the after_llm_call hook payload #2994
Draft
kimizuka wants to merge 4 commits into
Forward the per-call token usage and computed USD cost to the after_llm_call hook payload so sidecar cost ledgers can record per-call spend from the payload alone, without subscribing to the runtime event channel. Cost is a *float64 so the wire contract can distinguish an unpriced model (nil, key absent) from a priced free call (pointer to 0). The per-turn cost is computed once in computeMessageCost and threaded into both the hook payload and the recorded assistant message, so the two can never disagree. For harness agents the cost is surfaced only when the harness reported a non-zero value, avoiding reporting a billed turn as free when a harness omits its cost (e.g. codex). Signed-off-by: kimizuka <f.kimizuka@gmail.com>
Verify that after_llm_call populates usage and cost, that cost is nil when the model is unpriced, the nil-vs-zero JSON contract, harness usage with no cost surfacing as unpriced, and computeMessageCost. Signed-off-by: kimizuka <f.kimizuka@gmail.com>
Describe the new usage and cost fields, the priced/unpriced/free semantics and the harness caveat, and add a per-call cost-ledger example to examples/hooks.yaml. Signed-off-by: kimizuka <f.kimizuka@gmail.com>
Add the empty line embeddedstructfieldcheck wants between the embedded ModelStore and the cost field, and switch the float equality assertions to assert.InDelta to satisfy testifylint's float-compare rule.
Summary
Adds per-turn token `usage` and computed USD `cost` to the `after_llm_call` hook payload (`hooks.Input`), so a sidecar cost ledger can record per-call spend from the hook payload alone, without subscribing to the runtime event channel. This is the primitive-first half of #2948. `model_id` was already populated by #2911, so the remaining scope here is just `usage` and `cost`.

This implements what was discussed in #2948. The one design decision worth a second look is the `cost` JSON encoding, covered under Wire contract below.

Closes #2948
Wire contract
`hooks.Input` is the struct shared by every hook event and serialized to JSON for handlers, so the additions are deliberately conservative.

For a native model call, `cost` has three meaningful states:

| State | Go value | JSON |
| --- | --- | --- |
| Unpriced model | `nil` | key absent |
| Priced, free call | `&0` | `"cost": 0` |
| Priced call | `&N` | `"cost": N` |

`omitempty` on a pointer drops only `nil`, never a pointer to `0`, so a free call still emits an explicit `"cost": 0`. A present `cost` is therefore authoritative and an absent one means "unpriced", with no need to cross-check `usage`. Both fields are populated only for `after_llm_call`; they are `nil` (and thus absent) on every other event, so no other event's payload changes.

Note on the schema (deviation from my #2948 comment)
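To make the encoding choice in this note easy to check, here is a minimal sketch using only the standard `encoding/json` package. The `payload` and `payloadExplicitNull` structs, the `event` key, and the `encode` and `ptr` helpers are hypothetical stand-ins for illustration; they are not the real `hooks.Input`. Only the two `cost` tags mirror what this PR discusses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload is a hypothetical, trimmed-down stand-in for hooks.Input.
// Only the cost tag (pointer + omitempty) mirrors the form used in this PR.
type payload struct {
	Event string   `json:"event"`
	Cost  *float64 `json:"cost,omitempty"`
}

// payloadExplicitNull models the alternative from #2948: no omitempty,
// so a nil pointer is emitted as an explicit null.
type payloadExplicitNull struct {
	Event string   `json:"event"`
	Cost  *float64 `json:"cost"`
}

// encode marshals v and returns the JSON text; it panics on error,
// which cannot happen for the plain structs used here.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// ptr returns a pointer to f, so that a zero cost can be represented.
func ptr(f float64) *float64 { return &f }

func main() {
	// Unpriced model: nil pointer, key absent.
	fmt.Println(encode(payload{Event: "after_llm_call"}))
	// Priced but free call: pointer to 0, key present.
	fmt.Println(encode(payload{Event: "after_llm_call", Cost: ptr(0)}))
	// Priced call: pointer to N.
	fmt.Println(encode(payload{Event: "after_llm_call", Cost: ptr(0.0125)}))
	// Without omitempty, an unrelated event would also carry "cost": null.
	fmt.Println(encode(payloadExplicitNull{Event: "session_end"}))
}
```

The last line shows the side effect that motivates `omitempty`: with a shared struct, the explicit-null form adds a `cost` key to events that have nothing to do with model calls.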
In #2948 I suggested `json:"cost"` (no `omitempty`, explicit `cost: null` for unpriced). I switched to `omitempty` here because `hooks.Input` is shared by all events: without `omitempty`, every non-`after_llm_call` event (`before_llm_call`, `session_end`, …) would start emitting `"cost": null`, which both pollutes unrelated payloads and breaks the struct's all-`omitempty` convention. The pointer + `omitempty` form keeps the same three-way distinction within `after_llm_call` (absent / `0` / `N`) while leaving other events untouched. An explicit `null` instead of `omitempty` is a one-line change if the team prefers it.

Cost is computed once and equals the session's recorded cost
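The single-source idea in this section can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `usage` and `pricing` types, their fields, the per-million-token unit, and the `computeCost` and `recordedCost` names are hypothetical, and the real `computeMessageCost` takes the runtime's own usage type and a `*modelsdev.Model` whose fields this PR does not show. The sketch only demonstrates the shape of the contract: one computation, `nil` for unpriced, and `nil` recorded as `0`.

```go
package main

import "fmt"

// usage and pricing are hypothetical stand-ins for the runtime's types.
type usage struct{ InputTokens, OutputTokens int64 }

// pricing holds assumed USD prices per million tokens.
type pricing struct{ InputPerMTok, OutputPerMTok float64 }

// computeCost returns nil when usage or pricing is unknown (unpriced),
// and otherwise a pointer to the computed cost, which may be 0.
func computeCost(u *usage, p *pricing) *float64 {
	if u == nil || p == nil {
		return nil
	}
	c := (float64(u.InputTokens)*p.InputPerMTok +
		float64(u.OutputTokens)*p.OutputPerMTok) / 1e6
	return &c
}

// recordedCost mirrors the "nil records as 0" rule for the persisted message.
func recordedCost(c *float64) float64 {
	if c == nil {
		return 0
	}
	return *c
}

func main() {
	// Compute once, then hand the same value to both consumers.
	cost := computeCost(&usage{InputTokens: 1000, OutputTokens: 500},
		&pricing{InputPerMTok: 3, OutputPerMTok: 15})
	hookCost := cost // what the hook payload would carry
	fmt.Println(*hookCost == recordedCost(cost))

	// An unpriced model yields nil for the hook and 0 for the record.
	unpriced := computeCost(&usage{InputTokens: 1000, OutputTokens: 500}, nil)
	fmt.Println(unpriced == nil, recordedCost(unpriced))
}
```

Because both consumers read the same precomputed value, there is no second arithmetic path that could drift from the first.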
The per-turn cost is computed once in `runTurn` via a new `computeMessageCost(usage, m) *float64` helper and threaded into both the hook payload and `recordAssistantMessage`. The previous inline arithmetic in `recordAssistantMessage` is replaced by this single source (the `m *modelsdev.Model` param becomes the precomputed `cost *float64`), so the cost a handler sees is exactly the cost the session bills for the turn. The persisted message cost is unchanged (`nil` records as `0`, matching prior behavior).

Harness agents
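The harness rule in this section comes down to a single check, sketched below. The `harnessCost` name and its signature are hypothetical; the sketch only encodes the stated behavior that a zero total is treated as "not reported" and surfaces as `nil`.

```go
package main

import "fmt"

// harnessCost converts a harness-reported total into the payload's *float64.
// A zero total cannot be told apart from an omitted cost, so it maps to nil
// (unpriced) rather than to a pointer to 0.
func harnessCost(reportedTotal float64) *float64 {
	if reportedTotal == 0 {
		return nil
	}
	return &reportedTotal
}

func main() {
	for _, total := range []float64{0, 0.042} {
		if c := harnessCost(total); c == nil {
			fmt.Println("unpriced (cost key absent)")
		} else {
			fmt.Printf("cost=%v\n", *c)
		}
	}
}
```

The trade-off is that a harness turn that really was free is reported as unpriced, which errs on the side of not understating spend.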
For harness agents, `cost` is the harness's own reported total rather than a computed price. The harness library defaults the total to `0` when the harness output omits a cost (e.g. the `codex` harness reports token counts but no cost), which is indistinguishable from a genuinely free call. To avoid telling a ledger that a billed turn was free, `cost` is surfaced only when the harness reported a non-zero value (otherwise it is `nil`/unpriced).

Sub-sessions
`after_llm_call` fires for every model call, including those inside sub-sessions (transferred tasks, background agents, skills), each with the sub-session's own `session_id`. Summing `cost` across `after_llm_call` events therefore captures all spend, including sub-sessions whose cost may never reach the session store, which is the motivating case in #2948. The change does not touch sub-session persistence in any way.

Example: a cost-ledger sidecar
A `command` hook can append one CSV row per model call straight from the payload, with no event-channel subscription needed. `has("cost")` distinguishes an unpriced call (key absent) from a priced free one (`cost: 0`).

A runnable version is wired into the canonical `examples/hooks.yaml`. Because `after_llm_call` fires for sub-session turns too (each with its own `session_id`), summing the `cost` column is the full spend for the run.

Out of scope (follow-ups)
`cost`/`model_id` reflect the primary model. This is pre-existing behavior in `recordAssistantMessage` and is preserved here (hook cost == recorded cost); attributing cost to the model that actually ran is a separate change.

Testing
`pkg/runtime/after_llm_call_test.go`:

- priced model: `usage` + non-nil `cost`, and `*cost == sess.OwnCost()`
- unpriced model: `usage` present, `cost` nil
- nil-vs-zero JSON contract (absent / `0` / `N`)
- harness usage with no cost: `usage` present, `cost` nil
- `computeMessageCost` unit tests (every nil branch + all token classes)

`docs/configuration/hooks/index.md` updated for the new fields and the sub-session / harness caveats; `examples/hooks.yaml` demonstrates a cost-ledger sidecar consuming the payload.

Signed off per DCO.
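For readers who want the ledger logic from "Example: a cost-ledger sidecar" in Go rather than as a shell hook, the following is a minimal sketch. It assumes the payloads arrive as newline-delimited JSON and reads only the `session_id`, `model_id`, and `cost` keys named in this PR; the `event` struct, the `ledger` function, the CSV column order, and the sample IDs are hypothetical. A `nil` pointer after decoding plays the role of `has("cost")` being false.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// event holds only the payload keys this sketch needs. The key names come
// from the PR text; the rest of the payload shape is an assumption.
type event struct {
	SessionID string   `json:"session_id"`
	ModelID   string   `json:"model_id"`
	Cost      *float64 `json:"cost"`
}

// ledger turns newline-delimited payloads into CSV rows plus a running total.
// Unpriced calls (cost key absent) get an empty cost cell and add nothing.
func ledger(input string) ([]string, float64, error) {
	var rows []string
	var total float64
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev event
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, 0, err
		}
		cell := "" // empty cell marks an unpriced call
		if ev.Cost != nil {
			cell = strconv.FormatFloat(*ev.Cost, 'f', -1, 64)
			total += *ev.Cost
		}
		rows = append(rows, ev.SessionID+","+ev.ModelID+","+cell)
	}
	return rows, total, sc.Err()
}

func main() {
	// Hypothetical payloads: a priced call, a free call in a sub-session,
	// and an unpriced call.
	input := `{"session_id":"s1","model_id":"m-priced","cost":0.25}
{"session_id":"s1-sub","model_id":"m-free","cost":0}
{"session_id":"s2","model_id":"m-unpriced"}`
	rows, total, err := ledger(input)
	if err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Println(r)
	}
	fmt.Println("total:", total)
}
```

Keeping the unpriced cell empty rather than writing `0` preserves the nil-vs-zero distinction in the ledger itself.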