Skip to content

Add rate limiting CRD types and token-bucket package#4577

Open
jerm-dro wants to merge 2 commits intomainfrom
jerm-dro/global-rate-limits
Open

Add rate limiting CRD types and token-bucket package#4577
jerm-dro wants to merge 2 commits intomainfrom
jerm-dro/global-rate-limits

Conversation

@jerm-dro
Copy link
Copy Markdown
Contributor

@jerm-dro jerm-dro commented Apr 6, 2026

Summary

Rate limiting is needed to protect downstream MCP servers from traffic
spikes. RFC THV-0057 specifies a Redis-backed token bucket approach.

This PR lays the foundation for global rate limits on MCPServer:

  • Adds rateLimiting field to MCPServerSpec with global and
    per-tool bucket configs (maxTokens + refillPeriod)
  • Introduces pkg/ratelimit/ with a public Limiter interface and
    internal token bucket backed by an atomic Redis Lua script
  • Uses config.Duration for refillPeriod per RFC review feedback
  • NewLimiter constructor accepts CRD types directly — no intermediate
    public config types

Middleware wiring, JSON-RPC error response helpers, and per-user rate
limits will follow in stacked PRs (#4550).

Part of #4551

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

14 tests cover:

  • Token bucket: capacity exhaustion, refill after time, key auto-expiry, RetryAfter computation
  • Limiter: per-tool isolation (AC5), server + per-tool both required (AC6),
    no-op on nil config, Redis unavailability returns error, userID no-op
  • Atomic multi-bucket: tool rejection does not drain server budget
  • Constructor validation: zero maxTokens, zero duration

Changes

File Change
cmd/thv-operator/api/v1alpha1/mcpserver_types.go Add RateLimitConfig, RateLimitBucket, ToolRateLimitConfig types and RateLimiting field
pkg/ratelimit/limiter.go Public Limiter interface, Decision type, NewLimiter() constructor accepting CRD types directly
pkg/ratelimit/internal/bucket/bucket.go Token bucket with atomic multi-key Redis Lua script

Does this introduce a user-facing change?

New rateLimiting field on MCPServer CRD. Not yet functional — middleware
wiring will follow in a subsequent PR.

Special notes for reviewers

  • The Lua script uses redis.call('TIME') for clock consistency across replicas
    (and to enable miniredis.FastForward() in tests)
  • Multi-key Lua script checks all buckets atomically before consuming from any,
    preventing tool rejections from draining the server budget
  • CRD regeneration produced ~240 lines of generated YAML/deepcopy changes
  • pkg/ratelimit/internal/bucket/ is intentionally internal — only the Limiter
    interface and Decision type are public

Large PR Justification

This exceeds the 400-line guideline but is a single cohesive new package (pkg/ratelimit/) that cannot be meaningfully split further — the limiter and internal bucket form one logical unit.

Generated with Claude Code

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 6, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 97.36842% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.92%. Comparing base (8c70343) to head (bfb7eee).

Files with missing lines Patch % Lines
pkg/ratelimit/limiter.go 95.55% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4577      +/-   ##
==========================================
+ Coverage   68.87%   68.92%   +0.05%     
==========================================
  Files         505      507       +2     
  Lines       52380    52456      +76     
==========================================
+ Hits        36076    36155      +79     
+ Misses      13516    13510       -6     
- Partials     2788     2791       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 6, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@github-actions github-actions bot dismissed their stale review April 6, 2026 21:12

Large PR justification has been provided. Thank you!

@jerm-dro jerm-dro requested a review from Copilot April 6, 2026 21:13
@jerm-dro
Copy link
Copy Markdown
Contributor Author

jerm-dro commented Apr 6, 2026

@claude review using the code-review skill in the repo.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 6, 2026

Claude finished @jerm-dro's task in 4m 13s —— View job


Code Review: Add rate limiting CRD types and token-bucket package

  • Read changed files
  • Review CRD types and operator changes
  • Review rate limit package (limiter, config, errors, bucket)
  • Review tests
  • Post final review

Summary

Well-structured foundation. The package layout, interface design, Redis hash tag for cluster slot affinity, and Lua TIME clock are all solid choices. Tests are thorough and well-organized. Three issues require changes before merge; three are suggestions.


Required changes

1. Token leak: server token consumed before tool-level check (pkg/ratelimit/limiter.go:79–99)

In Allow(), when serverBucket.Consume() returns true (consuming a token from Redis), a subsequent rejection by the tool-level bucket still returns Allowed: false — but the server-level token is permanently gone. Under sustained pressure against a rate-limited tool, every rejected request also drains the server quota, making the effective server limit tighter than configured.

The cleanest fix is a two-phase approach or an atomic multi-key Lua script. A minimal acceptable fix is a code comment explicitly documenting this as a known limitation with a reference to the tracking issue. Fix this →

2. HMSET removed in Redis 7.4+ (pkg/ratelimit/internal/bucket/bucket.go, Lua script — three occurrences)

-- all three HMSET calls should be HSET:
redis.call('HSET', key, 'tokens', maxTokens - 1, 'last_refill', now)

HMSET was deprecated in Redis 4.0 and removed in 7.4. The argument signature is identical for the two-field case used here, so it's a drop-in replacement. miniredis may silently accept HMSET, which is why tests don't catch this. Fix this →

3. Missing MaxTokens > 0 guard in parseBucket (pkg/ratelimit/config.go:72)

The CRD has +kubebuilder:validation:Minimum=1, but parseBucket has no equivalent Go-level check. If CRD admission is bypassed (e.g., direct API server access), MaxTokens=0 results in refillRate = 0.0 in New(), which makes RetryAfter() return +Inf and the Lua script initializes the bucket to -1 tokens. Add a guard matching the existing RefillPeriod check:

if b.MaxTokens <= 0 {
    return nil, fmt.Errorf("maxTokens must be positive, got %d", b.MaxTokens)
}

Fix this →


Suggestions

4. last_refill reset on every rejection (bucket.go Lua script, lines ~54–59)

When a request is rejected (refilled < 1), last_refill is reset to now. Under sustained load hitting an empty bucket, every rejection resets the clock — meaning no refill ever accumulates. The bucket can be permanently starved. Standard token bucket implementations update last_refill only to the time from which the refill computation started, not now. If this is intentional, it needs a comment explaining the reasoning.

5. Document RetryAfter() as a static estimate (bucket.go:101)

The current godoc says "minimum wait time" but the function returns 1/refillRate unconditionally — a static worst-case independent of current bucket state. Clients should not rely on this as an accurate ETA. Update the doc to say "static worst-case estimate based on refill rate; actual wait may be shorter."

6. Key name comment (config.go:49,59)

The "global" segment in thv:rl:{ns:name}:global:tool:search will be opaque to someone inspecting Redis in production — it looks like it conflicts with the server-level thv:rl:{ns:name}:global key. A brief comment at the key construction sites explaining the "global" vs future "perUser" namespace distinction would help.


Minor observations

  • RateLimitedResponse (errors.go:67) hardcodes nil for requestID — this is likely intentional in the reverse-proxy use case where the request ID isn't available at the ModifyResponse stage, but worth a comment.
  • for range 2 {} in tests is fine — the project is on Go 1.26.
  • Copyright year is 2025 on new files added in 2026 — consistent with project style (not a blocker).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces the initial building blocks for Redis-backed token-bucket rate limiting for MCPServer, including new CRD fields, a pkg/ratelimit package, and JSON-RPC/HTTP 429 error helpers.

Changes:

  • Add rateLimiting configuration to the MCPServer CRD (global/perUser/per-tool buckets).
  • Implement a Redis Lua-script-backed token bucket and a Limiter interface to evaluate server + per-tool limits.
  • Add helpers for JSON-RPC rate-limit errors (-32029) and HTTP 429 responses with Retry-After.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
cmd/thv-operator/api/v1alpha1/mcpserver_types.go Adds RateLimit API types and spec.rateLimiting field
cmd/thv-operator/api/v1alpha1/mcpserver_types_test.go Adds JSON roundtrip tests for new rate limit types/fields
cmd/thv-operator/api/v1alpha1/zz_generated.deepcopy.go Deepcopy generation for new CRD types/field
deploy/charts/operator-crds/templates/toolhive.stacklok.dev_mcpservers.yaml CRD schema update for rateLimiting
deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpservers.yaml Packaged CRD schema update for rateLimiting
docs/operator/crd-api.md Generated CRD API docs include new rate limit types/field
pkg/ratelimit/config.go Parses CRD rate limit config into runtime config + key derivation
pkg/ratelimit/config_test.go Unit tests for config parsing/validation
pkg/ratelimit/limiter.go Implements Limiter that checks server + per-tool buckets
pkg/ratelimit/limiter_test.go Unit tests for limiter behavior and Redis failure mode
pkg/ratelimit/errors.go JSON-RPC error body + HTTP 429 helpers for rate limiting
pkg/ratelimit/errors_test.go Unit tests for error helpers
pkg/ratelimit/internal/bucket/bucket.go Redis-backed token bucket with atomic Lua script + key prefix helper
pkg/ratelimit/internal/bucket/bucket_test.go Unit tests for bucket consume/refill/expiry/retry-after

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jerm-dro jerm-dro force-pushed the jerm-dro/global-rate-limits branch from 8559673 to d1c76e7 Compare April 6, 2026 21:24
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 6, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 6, 2026
@jerm-dro jerm-dro force-pushed the jerm-dro/global-rate-limits branch from d1c76e7 to 0fdb684 Compare April 6, 2026 21:28
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 6, 2026
@jerm-dro jerm-dro force-pushed the jerm-dro/global-rate-limits branch from 0fdb684 to 9495ef1 Compare April 6, 2026 21:34
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 6, 2026
@github-actions github-actions bot dismissed their stale review April 7, 2026 00:19

Large PR justification has been provided. Thank you!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@jerm-dro jerm-dro force-pushed the jerm-dro/global-rate-limits branch from 9495ef1 to 3b8ae2e Compare April 7, 2026 00:30
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 7, 2026
@jerm-dro jerm-dro force-pushed the jerm-dro/global-rate-limits branch from 3b8ae2e to 6b4a933 Compare April 7, 2026 00:50
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 7, 2026
@jerm-dro jerm-dro force-pushed the jerm-dro/global-rate-limits branch from 6b4a933 to 5fbf2bf Compare April 7, 2026 01:07
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 7, 2026
jerm-dro and others added 2 commits April 6, 2026 18:14
Add RateLimitConfig, RateLimitBucket, and ToolRateLimitConfig CRD types
to support configurable token-bucket rate limiting on MCPServer. The
rateLimiting field accepts global and per-tool bucket configs with
maxTokens and refillPeriod (metav1.Duration) parameters.

CEL validation ensures:
- At least one of global or tools is configured
- Each tool entry has a global bucket set
- Redis session storage is required when rateLimiting is set

Part of #4551

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce pkg/ratelimit/ with a public Limiter interface and internal
token bucket implementation backed by a Redis Lua script.

Public API surface is minimal:
- Limiter interface with Allow(ctx, toolName, userID)
- Decision result type with Allowed and RetryAfter
- NewLimiter constructor accepting CRD types directly

Internal (pkg/ratelimit/internal/bucket/):
- Atomic multi-key Lua script that checks all buckets before consuming
  from any, preventing rejected per-tool calls from draining the
  server-level budget
- Uses Redis TIME for clock consistency and miniredis testability
- Key format and derivation fully encapsulated in the internal package

Tests cover all acceptance criteria for global rate limits:
- AC3: requests exceeding maxTokens are rejected
- AC4: Retry-After computed from the global bucket refill rate
- AC5: per-tool limit on one tool does not affect other tools
- AC6: request must pass both server-level and per-tool global limits
- Atomic multi-bucket: tool rejection does not drain server budget
- Redis unavailability returns error (fail-open is caller's job)
- No-op limiter when config is nil

Part of #4551

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jerm-dro jerm-dro force-pushed the jerm-dro/global-rate-limits branch from 5fbf2bf to bfb7eee Compare April 7, 2026 01:14
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 7, 2026
Copy link
Copy Markdown
Collaborator

@ChrisJBurns ChrisJBurns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on the atomic multi-bucket Lua script and the clean internal/public package split. Three issues found — two medium, one low.

# Severity Issue
1 Medium ToolRateLimitConfig.Global marked +optional but CEL-validated as required
2 Medium No duplicate tool name protection on tools array
3 Low Negative elapsed time in Lua script not guarded against clock drift


// Global defines a server-wide token bucket for this specific tool.
// +optional
Global *RateLimitBucket `json:"global,omitempty"`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: +optional contradicts CEL-required validation

The +optional marker and omitempty JSON tag tell the API server, kubectl explain, and generated docs that Global is not required, but the CEL rule on line 516 (has(self.global)) immediately rejects objects without it. This creates a confusing UX.

If global is always required (which makes sense since a ToolRateLimitConfig without a bucket is meaningless), remove +optional and omitempty, and use +kubebuilder:validation:Required instead — this eliminates the CEL rule entirely:

Suggested change
Global *RateLimitBucket `json:"global,omitempty"`
Global *RateLimitBucket `json:"global"`

Alternatively, if keeping +optional for forward-compatibility with a future perUser field (#4550), add a comment explaining that reasoning.

// Each entry applies additional rate limits to calls targeting a specific tool name.
// A request must pass both the server-level limit and the per-tool limit.
// +optional
Tools []ToolRateLimitConfig `json:"tools,omitempty"`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: No duplicate tool name validation

If a user specifies two entries with the same name (e.g., name: "search" twice), the second silently overwrites the first in the limiter's map[string]*bucket.TokenBucket. Adding +listType=map and +listMapKey=name enables the API server to reject duplicates at admission and makes strategic merge patch work correctly on individual tool entries:

Suggested change
Tools []ToolRateLimitConfig `json:"tools,omitempty"`
// +listType=map
// +listMapKey=name
// +optional
Tools []ToolRateLimitConfig `json:"tools,omitempty"`

(This requires removing the existing // +optional line above.)

if tokens == nil then
refilled[i] = maxTokens
else
local elapsedSec = (now - lastRefill) / 1000000
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low: Negative elapsed time not guarded

If redis.call('TIME') returns a value less than lastRefill (NTP adjustment, leap second, or clock drift during failover), elapsedSec goes negative, which would subtract tokens from the bucket rather than refilling.

Add a guard:

Suggested change
local elapsedSec = (now - lastRefill) / 1000000
local elapsedSec = math.max(0, (now - lastRefill) / 1000000)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants