perf: parallel range-GET S3 downloads for large objects #224
joshfriend wants to merge 1 commit into `main`
Conversation
Replace the single `GetObject` stream in `S3.Open` with parallel range-GET requests for objects larger than 32 MiB. Eight workers download chunks concurrently and reassemble them in order via `io.Pipe`, multiplying S3 throughput for cold snapshot downloads (observed 32 MB/s single-stream vs expected ~250+ MB/s with parallel connections).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: afbf08ad55
```go
results := make([]chan chunkResult, numChunks)
for i := range results {
	results[i] = make(chan chunkResult, 1)
```
Cap buffered chunk results to prevent object-sized RAM use
This allocates a dedicated buffered channel for every chunk, so workers can complete far ahead of the writer and park each 32 MiB []byte in memory; with slow downstream reads (or one early chunk lagging), buffering grows toward the full object size instead of numWorkers × chunkSize, which can OOM on multi-GB objects. The current structure needs bounded in-flight chunk buffering tied to worker count rather than chunk count.
```go
if writeErr != nil {
	continue // drain remaining channels so goroutines can exit
}
```
Stop downloading remaining chunks after first stream failure
After writeErr is set, the loop intentionally keeps draining all chunk channels, and because the work queue is prefilled and never canceled, workers continue fetching the rest of the object even when the pipe/write path has already failed (for example, client disconnects). That turns a single failure into full-object S3 traffic and delayed cleanup; the code should cancel outstanding work as soon as a terminal write/read error occurs.
Summary
Replace the single `GetObject` stream in `S3.Open` with parallel range-GET requests for objects larger than 32 MiB, reassembled in order via `io.Pipe`, multiplying S3 throughput for cold snapshot downloads.

Split out from #218 (now closed): the parallel tar extraction and in-process zstd changes were dropped after discovering they cause a 30x regression on git mirror snapshots with multi-GB packfiles (see #217).