feat: add aimd throttled object store by westonpace · Pull Request #6266 · lance-format/lance

westonpace · 2026-03-24T04:40:00Z

This does not hook the throttle up anywhere yet, that will come in a future PR.

Closes #6237
Closes #6238

github-actions · 2026-03-24T04:43:54Z

Code Review

Clean, well-documented PR overall. The AIMD algorithm and token bucket are well-structured. Below are the issues I'd flag before merging.

P0: `acquire_token` thundering herd under contention

In throttle.rs, the token acquisition loop sleeps then retries:

async fn acquire_token(&self) {
    loop {
        let sleep_duration = { /* lock, check, compute */ };
        tokio::time::sleep(sleep_duration).await;
    }
}

If N tasks are all waiting for 1 token, they wake simultaneously, re-acquire the lock, and only one succeeds. The rest compute a very short sleep and retry, degrading to a busy-wait spin loop with repeated lock acquisitions. Consider using tokio::sync::Notify or adding jitter to the sleep duration.

P1: `is_throttle_error` string matching is fragile

The function matches substrings like "429" and "503" in the error's display string. This can false-positive on object paths containing those substrings (e.g., data_429_backup.parquet) or error messages using those numbers in other contexts (e.g., "expected 503 bytes"). If the object_store crate exposes structured HTTP status codes, prefer matching on those. If string matching is unavoidable, use more specific patterns like "status: 429".

P1: Multipart uploads are only throttled on initiation

put_multipart and put_multipart_opts throttle the initial create_multipart call, but the returned MultipartUpload handle is unwrapped — subsequent put_part calls bypass the throttle entirely. For large files with many parts, writes are effectively unthrottled. The MultipartUpload should be wrapped to throttle each part upload.

P1: `unwrap()` on `std::sync::Mutex::lock()` will panic on poison

In aimd.rs, self.state.lock().unwrap() will panic if any thread previously panicked while holding the lock, cascading a single failure into a process crash. Use .lock().unwrap_or_else(|e| e.into_inner()) to recover from poisoning, or switch to parking_lot::Mutex which doesn't poison.

P1: No eager validation in `AimdThrottleWrapper::new()`

AimdThrottleWrapper::wrap() calls .expect("invalid AIMD config"), assuming validation was already done — but new() never validates. An invalid config (e.g., min_rate > max_rate) will panic lazily at wrap() time. Either validate eagerly in new() or propagate the error.

P1: `Debug` impl for `AimdController` acquires the mutex

AimdController::fmt() calls current_rate() which locks the mutex. If Debug is ever invoked while the lock is held (e.g., debug-logging inside a method that holds the lock), it will deadlock since std::sync::Mutex is not reentrant.

Minor: streaming `list`/`delete_stream` bypass throttling

These are passed through without throttling. list() can generate many paginated HTTP requests internally, each subject to cloud provider throttling. Worth at least a doc comment explaining the rationale for the pass-through.

codecov · 2026-03-24T05:12:42Z

Codecov Report

❌ Patch coverage is 79.19598% with 207 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance-io/src/object_store/throttle.rs	68.00%	194 Missing and 6 partials ⚠️
rust/lance-core/src/utils/aimd.rs	98.10%	6 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

wkalt · 2026-03-24T14:34:20Z

+///
+/// The `object_store` crate surfaces these as `Error::Generic` with the HTTP
+/// status or cloud-specific message embedded in the source chain. We match
+/// against known patterns from S3, GCS, and Azure.


I wonder if we could upstream an improvement for this

Yeah, I've simplified this logic a bit, which should work as long as there are retries configured on the object store. However, I like the opendal approach, which is roughly the equivalent of adding is_temporary to object_store::Generic. I'll work on an upstream PR.

wkalt · 2026-03-24T14:38:51Z

+    if let object_store::Error::Generic { source, .. } = err {
+        let msg = source.to_string();
+        // Check for common throttle patterns from cloud stores
+        msg.contains("429")


the bot had a good comment about these. 429 or 503 could occur in the paths of index files or transaction files, so if those paths are also present in the errors then we could make an incorrect inference here.

I wonder if we could find the specific errors for each cloud provider documented. AWS ones seem to be here: https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html.

Getting to the HTTP response from object store is rather tricky and actually probably impossible due to the fact that most errors are private types hiding behind the Error trait so all you can do is grab the error as a string.

I've simplified the logic for now. We look for the word retries, max_retries which is still potentially in a URI but given the whitespace in there I think we're probably ok (I suspect any whitespace in the URI will be %20 or +) We can worry about a better way after we upstream.

Is this something we could contribute up to object_store?

westonpace · 2026-03-24T17:10:54Z

@claude review this PR

github-actions · 2026-03-24T17:11:12Z

Claude Code is working…

I'll analyze this and get back to you.

View job run

wkalt

thanks!

This does not hook the throttle up anywhere yet, that will come in a future PR. Closes lance-format#6237 Closes lance-format#6238

westonpace added 2 commits March 23, 2026 14:53

Add aimd throttle

ca099ae

wip

9e0e30c

github-actions bot added the enhancement New feature or request label Mar 24, 2026

reduce impact of thundering herd

acd1f76

westonpace mentioned this pull request Mar 24, 2026

Investigate better handling of streaming writes / lists in AIMD throttle #6272

Open

Added impls for multipart and list

964e3ba

wkalt reviewed Mar 24, 2026

View reviewed changes

westonpace added 2 commits March 24, 2026 09:58

Simplify error handling

19a6d4c

Simplify error handling

d90f313

westonpace requested a review from wkalt March 24, 2026 17:11

wkalt approved these changes Mar 25, 2026

View reviewed changes

westonpace mentioned this pull request Mar 25, 2026

feat: move rate limiting to the object store #6293

Merged

westonpace merged commit e2231fd into lance-format:main Mar 25, 2026
31 checks passed

wjones127 pushed a commit to wjones127/lance that referenced this pull request Mar 29, 2026

feat: add aimd throttled object store (lance-format#6266)

b37cdb6

This does not hook the throttle up anywhere yet, that will come in a future PR. Closes lance-format#6237 Closes lance-format#6238

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add aimd throttled object store#6266

feat: add aimd throttled object store#6266
westonpace merged 6 commits intolance-format:mainfrom
westonpace:feat/aimd-throttle

westonpace commented Mar 24, 2026

Uh oh!

github-actions bot commented Mar 24, 2026

Uh oh!

codecov bot commented Mar 24, 2026 •

edited

Loading

Uh oh!

wkalt Mar 24, 2026

Uh oh!

westonpace Mar 24, 2026

Uh oh!

wkalt Mar 24, 2026

Uh oh!

westonpace Mar 24, 2026

Uh oh!

rpgreen Mar 24, 2026

Uh oh!

westonpace commented Mar 24, 2026

Uh oh!

github-actions bot commented Mar 24, 2026

Uh oh!

wkalt left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

westonpace commented Mar 24, 2026

Uh oh!

github-actions bot commented Mar 24, 2026

Code Review

P0: acquire_token thundering herd under contention

P1: is_throttle_error string matching is fragile

P1: Multipart uploads are only throttled on initiation

P1: unwrap() on std::sync::Mutex::lock() will panic on poison

P1: No eager validation in AimdThrottleWrapper::new()

P1: Debug impl for AimdController acquires the mutex

Minor: streaming list/delete_stream bypass throttling

Uh oh!

codecov bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

wkalt Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

westonpace Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

wkalt Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

westonpace Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

rpgreen Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

westonpace commented Mar 24, 2026

Uh oh!

github-actions bot commented Mar 24, 2026

Uh oh!

wkalt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

P0: `acquire_token` thundering herd under contention

P1: `is_throttle_error` string matching is fragile

P1: `unwrap()` on `std::sync::Mutex::lock()` will panic on poison

P1: No eager validation in `AimdThrottleWrapper::new()`

P1: `Debug` impl for `AimdController` acquires the mutex

Minor: streaming `list`/`delete_stream` bypass throttling

codecov bot commented Mar 24, 2026 •

edited

Loading