fix: do not charge prompt tokens when stream aborts with no output by sjhddh · Pull Request #4199 · QuantumNous/new-api

sjhddh · 2026-04-12T08:17:55Z

Problem

When a streaming request fails before producing any completion tokens (e.g. the client disconnects or the upstream times out), calculateTextQuotaSummary in service/text_quota.go was synthesizing a usage struct:

// before this fix
if usage == nil {
    usage = &dto.Usage{
        PromptTokens:     relayInfo.GetEstimatePromptTokens(),
        CompletionTokens: 0,
        TotalTokens:      relayInfo.GetEstimatePromptTokens(),
    }
}

Because TotalTokens ended up non-zero, the zero-charge guard at the bottom of the function was bypassed:

if summary.TotalTokens == 0 {
    summary.Quota = 0   // never reached for failed streams
}

Result: users were billed for the estimated prompt tokens on requests where they received zero output. Issue #4168 reports ~95.7M quota (~$191 USD) incorrectly charged to 99 users in a single production day.
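The guard bypass described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `Usage` is a cut-down stand-in for `dto.Usage`, `quotaBeforeFix` is a hypothetical name, and the flat `perToken` price is an assumed simplification of whatever pricing the real `calculateTextQuotaSummary` applies.

```go
package main

import "fmt"

// Usage is a simplified stand-in for dto.Usage (only the fields shown in the PR).
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// quotaBeforeFix models the pre-fix flow. The name and the flat perToken price
// are hypothetical; the real function is calculateTextQuotaSummary.
func quotaBeforeFix(usage *Usage, estimatedPrompt, perToken int) int {
	if usage == nil {
		// Unconditional fallback: applied even when the stream aborted.
		usage = &Usage{
			PromptTokens:     estimatedPrompt,
			CompletionTokens: 0,
			TotalTokens:      estimatedPrompt,
		}
	}
	quota := usage.TotalTokens * perToken
	if usage.TotalTokens == 0 {
		// Zero-charge guard: skipped whenever the fallback produced a non-zero estimate.
		quota = 0
	}
	return quota
}

func main() {
	// Aborted stream: no usage from upstream, estimated 1200 prompt tokens.
	fmt.Println(quotaBeforeFix(nil, 1200, 2)) // prints 2400
}
```

With these assumed numbers the caller is charged 2400 quota units despite receiving no output, which is the behavior the issue reports.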

Root cause

The synthetic usage fallback was introduced in PR #3400 to handle upstreams that return HTTP 200 but omit usage data. It was applied unconditionally, even to streams that aborted abnormally.

Fix

When usage == nil and the stream ended abnormally (relayInfo.IsStream && !relayInfo.StreamStatus.IsNormalEnd()), substitute an all-zero dto.Usage{} instead of the estimated-prompt-token one. This lets TotalTokens = 0 flow through to the existing zero-charge guard, setting Quota = 0.

Non-streaming requests and normally-completed streams retain the previous estimated-prompt-token fallback behavior unchanged.

if usage == nil {
    if relayInfo.IsStream && !relayInfo.StreamStatus.IsNormalEnd() {
        usage = &dto.Usage{}   // no output → no charge
    } else {
        usage = &dto.Usage{
            PromptTokens:     relayInfo.GetEstimatePromptTokens(),
            ...
        }
    }
}
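For readers who want to try the branch logic outside the repository, here is a rough, self-contained sketch of the fallback as this description states it. `StreamState` and `resolveUsage` are hypothetical names introduced for illustration; the booleans stand in for `relayInfo.IsStream` and `relayInfo.StreamStatus.IsNormalEnd()`, and `Usage` is a reduced stand-in for `dto.Usage`.

```go
package main

import "fmt"

// Usage is a reduced stand-in for dto.Usage.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// StreamState is a hypothetical flattening of relayInfo.IsStream and
// relayInfo.StreamStatus.IsNormalEnd() into plain booleans.
type StreamState struct {
	IsStream  bool
	NormalEnd bool
}

// resolveUsage returns the usage to bill when the upstream may have omitted it.
func resolveUsage(usage *Usage, s StreamState, estimatedPrompt int) *Usage {
	if usage != nil {
		return usage // upstream reported usage; nothing to synthesize
	}
	if s.IsStream && !s.NormalEnd {
		return &Usage{} // aborted stream: all zeros, so the zero-charge guard applies
	}
	// Non-stream or normally completed stream: keep the estimated-prompt fallback.
	return &Usage{PromptTokens: estimatedPrompt, TotalTokens: estimatedPrompt}
}

func main() {
	aborted := resolveUsage(nil, StreamState{IsStream: true, NormalEnd: false}, 1200)
	normal := resolveUsage(nil, StreamState{IsStream: true, NormalEnd: true}, 1200)
	nonStream := resolveUsage(nil, StreamState{}, 1200)
	fmt.Println(aborted.TotalTokens, normal.TotalTokens, nonStream.TotalTokens) // prints 0 1200 1200
}
```

Only the aborted stream ends up with `TotalTokens == 0`; the other two paths behave as they did before the change.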

Testing

- go build ./service/... passes
- go test ./service/... passes (all existing tests green)
- Manually traced the code path: for StreamEndReasonClientGone and StreamEndReasonTimeout with usage == nil, summary.TotalTokens is now 0, summary.Quota is 0, and SettleBilling is called with 0 (refunding any pre-deducted quota).
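The refund arithmetic in the manual trace can be illustrated with a small worked example. This is a sketch under stated assumptions only: `settleDelta` is a hypothetical helper, not the project's `SettleBilling`, and it assumes settlement reconciles a pre-deducted amount against the final quota by simple subtraction.

```go
package main

import "fmt"

// settleDelta is a hypothetical helper, not the project's SettleBilling.
// It returns the adjustment needed so the net charge equals actual:
// positive means charge more, negative means refund.
func settleDelta(preConsumed, actual int) int {
	return actual - preConsumed
}

func main() {
	// Aborted stream after the fix: final quota is 0, so the whole
	// pre-deducted amount comes back.
	fmt.Println(settleDelta(2400, 0)) // prints -2400

	// Completed request that cost more than was pre-deducted.
	fmt.Println(settleDelta(2400, 3000)) // prints 600
}
```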

Summary by CodeRabbit

Bug Fixes
- Fixed billing calculation for abnormally terminated streaming requests. Estimated prompt-token charges are no longer applied when a stream fails to complete normally and no responses were sent.
- Ensures more accurate token reporting and prevents inadvertent billing for failed or incomplete processing, improving billing reliability and user trust.

When a streaming request fails abnormally (client disconnect or upstream timeout) before producing any completion tokens, calculateTextQuotaSummary was synthesizing a usage struct with estimated prompt tokens. This caused summary.TotalTokens to be non-zero, bypassing the zero-charge guard at the bottom of the function, and incorrectly billing users for requests where they received no output.

Fix: when usage is nil AND the stream ended abnormally (IsStream && !StreamStatus.IsNormalEnd()), substitute an empty usage struct instead so TotalTokens remains 0 and Quota is forced to 0. Non-stream requests and normally-completed streams retain the existing estimated-prompt-token fallback behavior.

Fixes QuantumNous#4168

coderabbitai · 2026-04-12T08:18:08Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 942f7668-b693-422c-9648-f9e6472093da

📥 Commits

Reviewing files that changed from the base of the PR and between 29b8789 and f208026.

📒 Files selected for processing (1)

service/text_quota.go

🚧 Files skipped from review as they are similar to previous changes (1)

service/text_quota.go

Walkthrough

The calculateTextQuotaSummary logic in service/text_quota.go now treats missing usage differently for streams: if usage is nil and the request was a stream that ended abnormally with no responses sent, it uses zero-valued usage instead of estimating prompt tokens.

Changes

Cohort / File(s)	Summary
Stream Abnormal Termination Handling `service/text_quota.go`	Adjust fallback when `usage == nil`: if `relayInfo.IsStream && !relayInfo.StreamStatus.IsNormalEnd() && relayInfo.SendResponseCount == 0` then set `usage` to zeroed `dto.Usage{}`; otherwise keep estimated prompt-token fallback previously used.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

fix: restore pre-3400 OpenRouter billing semantics #3438: Modifies calculateTextQuotaSummary billing/token fallbacks and provider-specific billing semantics in the same area of service/text_quota.go.
fix: prompt calculation #1606: Changes how prompt token usage is recorded/propagated via relayInfo, related to usage fallback logic.

Suggested reviewers

seefs001

Poem

🐰 The stream gave up before a single line,
No answers reached the user—none to mine.
I hop in code and set the count to nil,
No phantom charge, no unexpected bill.
Hooray—fair hops return to every line! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main fix: preventing prompt token charges when streaming requests abort with no output.
Linked Issues check	✅ Passed	The code changes directly address all requirements from issue `#4168`: preventing charges for aborted streams with zero completion tokens by setting usage to zero when stream ends abnormally with no responses sent.
Out of Scope Changes check	✅ Passed	All changes in service/text_quota.go are scoped to fixing the billing issue for abnormal stream termination with zero output; no unrelated modifications detected.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@service/text_quota.go`:
- Around line 99-112: The refund condition currently treats any abnormal stream
with missing usage as zero-charge; update the guard in the block that sets usage
(around relayInfo.IsStream and relayInfo.StreamStatus.IsNormalEnd()) to also
require that no output was sent by checking relayInfo.SendResponseCount == 0
before zeroing usage. Concretely, change the if condition that sets usage =
&dto.Usage{} to require relayInfo.IsStream &&
!relayInfo.StreamStatus.IsNormalEnd() && relayInfo.SendResponseCount == 0 so
only streams that ended abnormally without sending any chunks get refunded.
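The three-part condition proposed in this comment can be checked case by case. The sketch below is illustrative only: `shouldZeroCharge` is a hypothetical helper that takes plain values in place of `relayInfo.IsStream`, `relayInfo.StreamStatus.IsNormalEnd()`, and `relayInfo.SendResponseCount`.

```go
package main

import "fmt"

// shouldZeroCharge mirrors the condition proposed in the review comment.
// It is a hypothetical helper; the PR inlines this check in the usage fallback.
func shouldZeroCharge(isStream, normalEnd bool, sendResponseCount int) bool {
	return isStream && !normalEnd && sendResponseCount == 0
}

func main() {
	cases := []struct {
		name      string
		isStream  bool
		normalEnd bool
		sent      int
	}{
		{"aborted stream, nothing sent", true, false, 0},
		{"aborted stream, 3 chunks sent", true, false, 3},
		{"stream ended normally", true, true, 0},
		{"non-streaming request", false, false, 0},
	}
	for _, c := range cases {
		fmt.Printf("%-32s zeroCharge=%v\n", c.name,
			shouldZeroCharge(c.isStream, c.normalEnd, c.sent))
	}
}
```

Only the first case is zero-charged. A stream that delivered chunks before failing falls through to the estimated-prompt-token fallback, which is the distinction the extra `SendResponseCount == 0` check is meant to draw.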


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8bcebe3f-929a-46ed-ac6a-fb9f013d0193

📥 Commits

Reviewing files that changed from the base of the PR and between ed7f839 and 29b8789.

📒 Files selected for processing (1)

service/text_quota.go

Add SendResponseCount == 0 guard to the abnormal stream refund condition. Streams that sent partial output before failing will now correctly charge based on estimated tokens instead of getting a zero-charge refund.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sjhddh wants to merge 2 commits into QuantumNous:main from sjhddh:fix/stream-abort-quota-billing
