feat(git): version-pinned range serving for parallel snapshot downloads #324

Draft
worstell wants to merge 1 commit into main from worstell/pinned-range-serving

@worstell worstell commented Jun 12, 2026

Contributor

Lets a client download a large git snapshot via concurrent ranged GETs that all resolve to the same immutable S3 revision.

A naive client-side parallel download is unsafe: snapshots are mutable and replicas can serve divergent copies, so ranges fetched from different revisions could be stitched together into a corrupt artifact. Instead, the client first sends a probe request (X-Cachew-Snapshot-Pin: probe) to obtain an opaque pin token (the S3 ETag) and the total size, then issues parallel Range GETs carrying that token. Each range is served from the pinned revision via an If-Match GET, bypassing the per-pod disk tier, so chunks stitch correctly regardless of which replica serves them. If the snapshot has been regenerated, the request fails closed with 412, prompting the client to re-probe and restart rather than mix revisions.
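For reviewers, here is a rough sketch of the client loop this protocol implies. It is not the actual client (which is not wired up yet). The `probe` and `fetch_range` callables, the `PinMismatch` exception, and the `workers`/`max_restarts` parameters are all hypothetical stand-ins; a real transport would presumably send the pin token and a `Range` header on each request and map a 412 response to `PinMismatch`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


class PinMismatch(Exception):
    """Hypothetical: raised by the transport when a ranged GET returns 412."""


def plan_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) byte ranges."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def download(
    probe: Callable[[], Tuple[str, int]],
    fetch_range: Callable[[str, int, int], bytes],
    chunk_size: int,
    workers: int = 4,
    max_restarts: int = 3,
) -> bytes:
    """Probe for (pin, size), fetch every range under that pin, and start
    over from a fresh probe if any range reports a pin mismatch."""
    for _ in range(max_restarts + 1):
        pin, total_size = probe()
        ranges = plan_ranges(total_size, chunk_size)
        try:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                # map() yields results in range order, so chunks join correctly.
                chunks = list(
                    pool.map(lambda r: fetch_range(pin, r[0], r[1]), ranges)
                )
        except PinMismatch:
            # Snapshot was regenerated: discard every chunk and re-probe.
            continue
        return b"".join(chunks)
    raise RuntimeError("snapshot kept changing; gave up after restarts")
```

The sketch throws away all chunks on a mismatch instead of keeping the ones that succeeded, which matches the fail-closed behaviour described above: chunks from two revisions are never combined.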

The pin token is prefixed (etag:) so it can later carry an S3 VersionId without a wire-format change.
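As an illustration of why the prefix helps, a parser can split on the first colon and dispatch on the kind. This is only a sketch: the function name and the `KNOWN_KINDS` set are hypothetical, and the prefix a future VersionId token would use is not specified here, so only `etag` is accepted.

```python
from typing import Tuple

# Only the "etag" kind is described in this PR; anything else is rejected.
KNOWN_KINDS = {"etag"}


def parse_pin_token(token: str) -> Tuple[str, str]:
    """Split a pin token such as 'etag:<value>' into (kind, value)."""
    # Split on the first colon only, so colons inside the value survive.
    kind, sep, value = token.partition(":")
    if not sep or not value:
        raise ValueError(f"malformed pin token: {token!r}")
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unsupported pin kind: {kind!r}")
    return kind, value
```

Because clients treat the token as opaque and only the server parses it, adding a second kind later would mean extending the server-side dispatch without changing what clients send.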

Draft: this is a server-side prototype for benchmarking the throughput win before wiring up the client. Fail-closed ETag pinning is sufficient for benchmarking; whether fail-over (VersionId or content-addressed objects) is worth building depends on the observed 412/restart rate.

@worstell worstell force-pushed the worstell/pinned-range-serving branch 3 times, most recently from fb00315 to c629836 Compare June 12, 2026 17:34
…nloads

Adds a probe/range protocol so a client can download a large snapshot
artifact via concurrent ranged GETs that all resolve to the same immutable
S3 revision. A naive client-side parallel download is unsafe because the
snapshot is mutable and multiple replicas can serve divergent copies, so
ranges could stitch across revisions and corrupt the artifact.

The client first probes (X-Cachew-Snapshot-Pin: probe) to obtain an opaque
pin token (the S3 ETag) plus total size, then issues parallel Range GETs
carrying that token. Each range is served straight from the pinned S3
revision via an If-Match GET, bypassing the per-pod disk tier, so chunks
stitch correctly regardless of which replica handles each request. A
regenerated snapshot fails closed with 412 so the client re-probes and
restarts rather than mixing revisions.

The pin token is prefixed (etag:) so it can later carry an S3 VersionId
without a wire-format change.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019eae3d-a2fd-70ca-80fe-a7536ec6748c
@worstell worstell force-pushed the worstell/pinned-range-serving branch from c629836 to f2699f4 Compare June 12, 2026 17:35