fix: do not charge prompt tokens when stream aborts with no output #4199

Open
sjhddh wants to merge 2 commits into QuantumNous:main from sjhddh:fix/stream-abort-quota-billing

Conversation

@sjhddh sjhddh commented Apr 12, 2026

Problem

Fixes #4168.

When a streaming request fails before producing any completion tokens (e.g. the client disconnects or the upstream times out), calculateTextQuotaSummary in service/text_quota.go was synthesizing a usage struct:

// before this fix
if usage == nil {
    usage = &dto.Usage{
        PromptTokens:     relayInfo.GetEstimatePromptTokens(),
        CompletionTokens: 0,
        TotalTokens:      relayInfo.GetEstimatePromptTokens(),
    }
}

Because TotalTokens ended up non-zero, the zero-charge guard at the bottom of the function was bypassed:

if summary.TotalTokens == 0 {
    summary.Quota = 0   // never reached for failed streams
}

Result: users were billed for the estimated prompt tokens on requests where they received zero output. Issue #4168 reports 95.7M quota ($191 USD) incorrectly charged to 99 users in a single production day.

Root cause

The synthetic usage fallback was introduced in PR #3400 to handle upstreams that return HTTP 200 but omit usage data. It was applied unconditionally, even to streams that aborted abnormally.

Fix

When usage == nil and the stream ended abnormally (relayInfo.IsStream && !relayInfo.StreamStatus.IsNormalEnd()), substitute an all-zero dto.Usage{} instead of the estimated-prompt-token one. This lets TotalTokens = 0 flow through to the existing zero-charge guard, setting Quota = 0.

Non-streaming requests and normally-completed streams retain the previous estimated-prompt-token fallback behavior unchanged.

if usage == nil {
    if relayInfo.IsStream && !relayInfo.StreamStatus.IsNormalEnd() {
        usage = &dto.Usage{}   // no output → no charge
    } else {
        usage = &dto.Usage{
            PromptTokens:     relayInfo.GetEstimatePromptTokens(),
            ...
        }
    }
}
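To make the two fallback branches easier to check, here is a self-contained sketch of the same decision. It assumes simplified types: `Usage` and `RelayInfo` are hypothetical stand-ins, `NormalEnd` plays the role of `StreamStatus.IsNormalEnd()`, and `resolveUsage` is an invented helper name.

```go
package main

import "fmt"

// Usage is a hypothetical stand-in for dto.Usage.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// RelayInfo is a hypothetical, flattened stand-in for the relay state;
// NormalEnd plays the role of StreamStatus.IsNormalEnd().
type RelayInfo struct {
	IsStream        bool
	NormalEnd       bool
	EstimatedPrompt int
}

// resolveUsage returns the reported usage when present; otherwise it picks
// the zero-valued fallback for abnormally ended streams and the
// prompt-estimate fallback for everything else.
func resolveUsage(usage *Usage, info RelayInfo) *Usage {
	if usage != nil {
		return usage
	}
	if info.IsStream && !info.NormalEnd {
		return &Usage{} // no output, so TotalTokens stays 0
	}
	return &Usage{PromptTokens: info.EstimatedPrompt, TotalTokens: info.EstimatedPrompt}
}

func main() {
	aborted := RelayInfo{IsStream: true, NormalEnd: false, EstimatedPrompt: 1200}
	finished := RelayInfo{IsStream: true, NormalEnd: true, EstimatedPrompt: 1200}
	fmt.Println(resolveUsage(nil, aborted).TotalTokens)  // 0
	fmt.Println(resolveUsage(nil, finished).TotalTokens) // 1200
}
```

Because the aborted case yields `TotalTokens == 0`, an existing zero-charge guard downstream would apply without further changes.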

Testing

  • go build ./service/... passes
  • go test ./service/... passes (all existing tests green)
  • Manually traced the code path: for StreamEndReasonClientGone and StreamEndReasonTimeout with usage == nil, summary.TotalTokens is now 0, summary.Quota is 0, and SettleBilling is called with 0 (refunding any pre-deducted quota).

Summary by CodeRabbit

  • Bug Fixes
    • Fixed billing calculation for abnormally terminated streaming requests. Estimated prompt-token charges are no longer applied when a stream fails to complete normally and no responses were sent.
    • Ensures more accurate token reporting and prevents inadvertent billing for failed or incomplete processing, improving billing reliability and user trust.

When a streaming request fails abnormally (client disconnect or upstream
timeout) before producing any completion tokens, calculateTextQuotaSummary
was synthesizing a usage struct with estimated prompt tokens. This caused
summary.TotalTokens to be non-zero, bypassing the zero-charge guard at the
bottom of the function, and incorrectly billing users for requests where
they received no output.

Fix: when usage is nil AND the stream ended abnormally
(IsStream && !StreamStatus.IsNormalEnd()), substitute an empty usage
struct instead so TotalTokens remains 0 and Quota is forced to 0.
Non-stream requests and normally-completed streams retain the existing
estimated-prompt-token fallback behavior.

Fixes QuantumNous#4168

coderabbitai Bot commented Apr 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 942f7668-b693-422c-9648-f9e6472093da

📥 Commits

Reviewing files that changed from the base of the PR and between 29b8789 and f208026.

📒 Files selected for processing (1)
  • service/text_quota.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • service/text_quota.go

Walkthrough

The calculateTextQuotaSummary logic in service/text_quota.go now treats missing usage differently for streams: if usage is nil and the request was a stream that ended abnormally with no responses sent, it uses zero-valued usage instead of estimating prompt tokens.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Stream Abnormal Termination Handling: service/text_quota.go | Adjust fallback when usage == nil: if relayInfo.IsStream && !relayInfo.StreamStatus.IsNormalEnd() && relayInfo.SendResponseCount == 0 then set usage to zeroed dto.Usage{}; otherwise keep estimated prompt-token fallback previously used. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • seefs001

Poem

🐰 The stream gave up before a single line,
No answers reached the user—none to mine.
I hop in code and set the count to nil,
No phantom charge, no unexpected bill.
Hooray—fair hops return to every line! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main fix: preventing prompt token charges when streaming requests abort with no output. |
| Linked Issues check | ✅ Passed | The code changes directly address all requirements from issue #4168: preventing charges for aborted streams with zero completion tokens by setting usage to zero when stream ends abnormally with no responses sent. |
| Out of Scope Changes check | ✅ Passed | All changes in service/text_quota.go are scoped to fixing the billing issue for abnormal stream termination with zero output; no unrelated modifications detected. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@service/text_quota.go`:
- Around line 99-112: The refund condition currently treats any abnormal stream
with missing usage as zero-charge; update the guard in the block that sets usage
(around relayInfo.IsStream and relayInfo.StreamStatus.IsNormalEnd()) to also
require that no output was sent by checking relayInfo.SendResponseCount == 0
before zeroing usage. Concretely, change the if condition that sets usage =
&dto.Usage{} to require relayInfo.IsStream &&
!relayInfo.StreamStatus.IsNormalEnd() && relayInfo.SendResponseCount == 0 so
only streams that ended abnormally without sending any chunks get refunded.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8bcebe3f-929a-46ed-ac6a-fb9f013d0193

📥 Commits

Reviewing files that changed from the base of the PR and between ed7f839 and 29b8789.

📒 Files selected for processing (1)
  • service/text_quota.go

Comment thread service/text_quota.go
Add SendResponseCount == 0 guard to the abnormal stream refund
condition. Streams that sent partial output before failing will now
correctly charge based on estimated tokens instead of getting a
zero-charge refund.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
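
The combined condition after this commit can be checked case by case. The sketch below is an assumption-laden illustration: `shouldZeroCharge` is an invented helper, and its parameters are flattened copies of `relayInfo.IsStream`, `StreamStatus.IsNormalEnd()`, and `SendResponseCount` rather than the real struct.

```go
package main

import "fmt"

// shouldZeroCharge reports whether a request with missing usage should be
// settled at zero quota. Parameters are hypothetical flattened copies of
// relayInfo.IsStream, StreamStatus.IsNormalEnd() and SendResponseCount.
func shouldZeroCharge(isStream, normalEnd bool, sendResponseCount int) bool {
	return isStream && !normalEnd && sendResponseCount == 0
}

func main() {
	cases := []struct {
		name      string
		isStream  bool
		normalEnd bool
		sent      int
	}{
		{"aborted stream, nothing sent", true, false, 0},
		{"aborted stream, partial output", true, false, 3},
		{"stream finished normally", true, true, 0},
		{"non-streaming request", false, false, 0},
	}
	for _, c := range cases {
		fmt.Printf("%-32s zero-charge=%v\n", c.name, shouldZeroCharge(c.isStream, c.normalEnd, c.sent))
	}
}
```

Only the first case is refunded; a stream that delivered chunks before failing falls back to the estimated prompt tokens, as the commit message describes.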
Development

Successfully merging this pull request may close these issues.

bug: prompt_tokens are still charged when a stream fails (client_gone/timeout) with completion_tokens=0

1 participant