fix: do not charge prompt tokens when stream aborts with no output #4199
sjhddh wants to merge 2 commits into QuantumNous:main from
When a streaming request fails abnormally (client disconnect or upstream timeout) before producing any completion tokens, calculateTextQuotaSummary was synthesizing a usage struct with estimated prompt tokens. This caused summary.TotalTokens to be non-zero, bypassing the zero-charge guard at the bottom of the function, and incorrectly billing users for requests where they received no output.

Fix: when usage is nil AND the stream ended abnormally (IsStream && !StreamStatus.IsNormalEnd()), substitute an empty usage struct instead so TotalTokens remains 0 and Quota is forced to 0. Non-stream requests and normally-completed streams retain the existing estimated-prompt-token fallback behavior.

Fixes QuantumNous#4168
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@service/text_quota.go`:
- Around line 99-112: The refund condition currently treats any abnormal stream
with missing usage as zero-charge; update the guard in the block that sets usage
(around relayInfo.IsStream and relayInfo.StreamStatus.IsNormalEnd()) to also
require that no output was sent by checking relayInfo.SendResponseCount == 0
before zeroing usage. Concretely, change the if condition that sets usage =
&dto.Usage{} to require relayInfo.IsStream &&
!relayInfo.StreamStatus.IsNormalEnd() && relayInfo.SendResponseCount == 0 so
only streams that ended abnormally without sending any chunks get refunded.
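
The difference between the two conditions can be shown with a small sketch. The `streamState` type below is a hypothetical reduction of the `relayInfo` fields named in the comment (with `NormalEnd` standing in for `StreamStatus.IsNormalEnd()`); it is not the project's actual struct.

```go
package main

import "fmt"

// streamState is a hypothetical stand-in for the relayInfo fields
// mentioned in the review comment.
type streamState struct {
	IsStream          bool
	NormalEnd         bool // stands in for StreamStatus.IsNormalEnd()
	SendResponseCount int  // number of chunks already sent to the client
}

// refundsWithoutCountCheck models the condition before the suggested change.
func refundsWithoutCountCheck(s streamState) bool {
	return s.IsStream && !s.NormalEnd
}

// refundsWithCountCheck models the condition after the suggested change:
// only abnormal streams that sent nothing are zero-charged.
func refundsWithCountCheck(s streamState) bool {
	return s.IsStream && !s.NormalEnd && s.SendResponseCount == 0
}

func main() {
	partial := streamState{IsStream: true, NormalEnd: false, SendResponseCount: 3}
	empty := streamState{IsStream: true, NormalEnd: false, SendResponseCount: 0}

	// A stream that failed after sending chunks is refunded only by the old condition.
	fmt.Println(refundsWithoutCountCheck(partial), refundsWithCountCheck(partial))
	// A stream that failed before sending anything is refunded by both.
	fmt.Println(refundsWithoutCountCheck(empty), refundsWithCountCheck(empty))
}
```

The two conditions disagree only for abnormal streams with partial output, which is the case the comment wants to keep billable.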
Add SendResponseCount == 0 guard to the abnormal stream refund condition. Streams that sent partial output before failing will now correctly charge based on estimated tokens instead of getting a zero-charge refund.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Problem
Fixes #4168.
When a streaming request fails before producing any completion tokens (e.g. the client disconnects or the upstream times out), `calculateTextQuotaSummary` in `service/text_quota.go` was synthesizing a usage struct from the estimated prompt tokens.

Because `TotalTokens` ended up non-zero, the zero-charge guard at the bottom of the function was bypassed.

Result: users were billed for the estimated prompt tokens on requests where they received zero output. Issue #4168 reports 95.7M quota ($191 USD) incorrectly charged to 99 users in a single production day.

Root cause
The synthetic usage fallback was introduced in PR #3400 to handle upstreams that return HTTP 200 but omit usage data. It was applied unconditionally, even to streams that aborted abnormally.
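
The bypass described above can be modeled in a few lines. This is a simplified sketch, not the code in `service/text_quota.go`: the `Usage` type and the `quotaBeforeFix` helper are hypothetical stand-ins, and pricing is reduced to one quota unit per token purely for illustration.

```go
package main

import "fmt"

// Usage is a hypothetical stand-in for the project's usage struct.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// quotaBeforeFix models the pre-fix behavior: a missing usage is always
// replaced by one synthesized from the estimated prompt tokens.
func quotaBeforeFix(usage *Usage, estimatedPromptTokens int) int {
	if usage == nil {
		usage = &Usage{
			PromptTokens: estimatedPromptTokens,
			TotalTokens:  estimatedPromptTokens,
		}
	}
	// Assumption: one quota unit per token; real pricing is more involved.
	quota := usage.TotalTokens
	// Zero-charge guard: it only fires when no tokens were counted at all,
	// so a synthesized usage with prompt tokens never reaches it.
	if usage.TotalTokens == 0 {
		quota = 0
	}
	return quota
}

func main() {
	// A request with no upstream usage data is still charged for the
	// 1200 estimated prompt tokens, even if the stream aborted with no output.
	fmt.Println(quotaBeforeFix(nil, 1200))
}
```

Because the model has no notion of how the stream ended, every request with missing usage takes the same path, which matches the "applied unconditionally" root cause.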
Fix
When `usage == nil` and the stream ended abnormally (`relayInfo.IsStream && !relayInfo.StreamStatus.IsNormalEnd()`), substitute an all-zero `dto.Usage{}` instead of the estimated-prompt-token one. This lets `TotalTokens = 0` flow through to the existing zero-charge guard, setting `Quota = 0`.

Non-streaming requests and normally-completed streams retain the previous estimated-prompt-token fallback behavior unchanged.
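
A minimal sketch of the fallback selection described in this section, under stated assumptions: `Usage` and `RelayInfo` are reduced hypothetical types (the `NormalEnd` field stands in for `StreamStatus.IsNormalEnd()`), and `fallbackUsage` is an illustrative helper name, not a function in the repository.

```go
package main

import "fmt"

// Usage is a hypothetical stand-in for dto.Usage.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// RelayInfo is a hypothetical reduction of the relay info used by the fix.
type RelayInfo struct {
	IsStream  bool
	NormalEnd bool // stands in for StreamStatus.IsNormalEnd()
}

// fallbackUsage returns the usage to bill against when the upstream
// reported none.
func fallbackUsage(usage *Usage, info RelayInfo, estimatedPromptTokens int) *Usage {
	if usage != nil {
		return usage // upstream usage is always preferred
	}
	if info.IsStream && !info.NormalEnd {
		// Abnormal stream end: all-zero usage so TotalTokens stays 0.
		return &Usage{}
	}
	// Non-stream or normally-completed stream: keep the estimated fallback.
	return &Usage{
		PromptTokens: estimatedPromptTokens,
		TotalTokens:  estimatedPromptTokens,
	}
}

func main() {
	aborted := fallbackUsage(nil, RelayInfo{IsStream: true, NormalEnd: false}, 1200)
	completed := fallbackUsage(nil, RelayInfo{IsStream: true, NormalEnd: true}, 1200)
	nonStream := fallbackUsage(nil, RelayInfo{IsStream: false}, 1200)
	fmt.Println(aborted.TotalTokens, completed.TotalTokens, nonStream.TotalTokens)
}
```

Only the aborted stream ends up with zero total tokens; the other two cases keep the estimated prompt tokens, as the description requires.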
Testing
- `go build ./service/...` passes
- `go test ./service/...` passes (all existing tests green)
- `StreamEndReasonClientGone` and `StreamEndReasonTimeout` with `usage == nil`: `summary.TotalTokens` is now 0, `summary.Quota` is 0, and `SettleBilling` is called with 0 (refunding any pre-deducted quota).