Prefer RateLimitError on 429 with rate-limit body #775

Open
juanmanuelramallo wants to merge 2 commits into crmne:main from juanmanuelramallo:fix-anthropic-rate-limit-misclassification
Conversation

@juanmanuelramallo

What this does

Some providers include token-budget context in their HTTP 429 rate-limit bodies. Anthropic, for example, returns messages like "...rate limit of 5,000,000 input tokens per minute...". The "input tokens" substring matches the existing /input[_\s-]?token/i entry in CONTEXT_LENGTH_PATTERNS, so the current 429 branch classifies these responses as ContextLengthExceededError. Callers therefore cannot tell a token-quota rate limit (retriable with backoff) apart from a true context-window overflow (non-retriable).
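The collision can be reproduced in isolation. The snippet below is a minimal sketch: the pattern is the one quoted above, but `INPUT_TOKEN_PATTERN` is a stand-in name, and the sample message is an illustrative reconstruction in which only the quoted fragment comes from the description.

```ruby
# Pattern quoted in the description; in RubyLLM it is one entry of
# CONTEXT_LENGTH_PATTERNS. The constant name here is a stand-in.
INPUT_TOKEN_PATTERN = /input[_\s-]?token/i

# Illustrative body (assumption): only the "rate limit of 5,000,000 input
# tokens per minute" fragment is taken from the description above.
rate_limit_message =
  'This request would exceed the rate limit of 5,000,000 input tokens per minute.'

# The context-length pattern fires on a message that is really a rate limit.
matches_context_pattern = rate_limit_message.match?(INPUT_TOKEN_PATTERN)

# The same message also carries the explicit rate-limit phrase.
matches_rate_limit_phrase = rate_limit_message.match?(/rate limit/i)

puts "context-length pattern matches: #{matches_context_pattern}"
puts "rate-limit phrase matches:      #{matches_rate_limit_phrase}"
```

Both checks succeed on the same message, so matching on the token pattern alone cannot separate the two cases.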

On 429, prefer RateLimitError whenever the body carries an explicit rate-limit signal ("rate limit"), and fall back to ContextLengthExceededError only when the message is unambiguously context-length-shaped. A single regex is sufficient: the Anthropic body and other documented rate-limit responses use the literal phrase "rate limit".

This preserves the existing "429 + Request too large for model" → ContextLengthExceededError behavior (that phrase has no rate-limit signal). The 400 branch and all other status-code branches are unchanged.
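The 429 decision order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the diff: `classify_429` and `RATE_LIMIT_PATTERN` are hypothetical names, the pattern list is an abbreviated stand-in for the real CONTEXT_LENGTH_PATTERNS, symbols stand in for the error classes, and the final default to a rate-limit error is assumed.

```ruby
# Abbreviated, hypothetical stand-in for RubyLLM's CONTEXT_LENGTH_PATTERNS.
CONTEXT_LENGTH_PATTERNS = [
  /input[_\s-]?token/i,
  /request too large for model/i
].freeze

# Explicit rate-limit signal: the literal phrase "rate limit".
RATE_LIMIT_PATTERN = /rate limit/i

# Hypothetical helper: returns a symbol naming the error a 429 body maps to.
def classify_429(message)
  # 1. An explicit rate-limit phrase wins, even if token wording is present.
  return :rate_limit_error if message.match?(RATE_LIMIT_PATTERN)

  # 2. Otherwise, a context-length-shaped message keeps its old mapping.
  if CONTEXT_LENGTH_PATTERNS.any? { |pattern| message.match?(pattern) }
    return :context_length_exceeded_error
  end

  # 3. Assumed default: a plain 429 is a rate limit.
  :rate_limit_error
end

quota_result = classify_429('rate limit of 5,000,000 input tokens per minute')
large_result = classify_429('Request too large for model')
plain_result = classify_429('Too many requests')

puts quota_result
puts large_result
puts plain_result
```

Checking the rate-limit phrase first is what keeps the "Request too large for model" case on its existing path, since that message never contains the phrase.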

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Required for new features

  • I opened an issue before writing code and received maintainer approval
  • Linked issue: #___

PRs for new features or enhancements without a prior approved issue will be closed.

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
    • For provider changes: Re-recorded VCR cassettes with bundle exec rake vcr:record[provider_name]
    • All tests pass: bundle exec rspec
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

AI-generated code

  • I used AI tools to help write this code
  • I have reviewed and understand all generated code (required if above is checked)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes

@codecov

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.22%. Comparing base (5bdda1a) to head (aa2a77f).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #775   +/-   ##
=======================================
  Coverage   87.21%   87.22%           
=======================================
  Files         121      121           
  Lines        5703     5707    +4     
  Branches     1442     1443    +1     
=======================================
+ Hits         4974     4978    +4     
  Misses        729      729           

@juanmanuelramallo
Author

@crmne does it make sense to merge this?
