feat(git): version-pinned range serving for parallel snapshot downloads#324
Draft
worstell wants to merge 1 commit into
fb00315 to
c629836
…nloads

Adds a probe/range protocol so a client can download a large snapshot artifact via concurrent ranged GETs that all resolve to the same immutable S3 revision. A naive client-side parallel download is unsafe because the snapshot is mutable and multiple replicas can serve divergent copies, so ranges could stitch across revisions and corrupt the artifact.

The client first probes (X-Cachew-Snapshot-Pin: probe) to obtain an opaque pin token (the S3 ETag) plus total size, then issues parallel Range GETs carrying that token. Each range is served straight from the pinned S3 revision via an If-Match GET, bypassing the per-pod disk tier, so chunks stitch correctly regardless of which replica handles each request. A regenerated snapshot fails closed with 412 so the client re-probes and restarts rather than mixing revisions.

The pin token is prefixed (etag:) so it can later carry an S3 VersionId without a wire-format change.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019eae3d-a2fd-70ca-80fe-a7536ec6748c
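As a rough illustration of the prefixed pin token, the sketch below shows one way a server could turn a token into conditional-GET parameters. It is not the PR's code (which is not shown here): the function names are hypothetical, and the parameter keys `IfMatch` and `Range` are borrowed from the S3 GetObject request shape as an assumption about how the pinned read might be expressed.

```python
ETAG_PREFIX = "etag:"  # prefix named in the commit message


def encode_pin(etag: str) -> str:
    """Wrap an S3 ETag in an opaque, prefixed pin token (hypothetical helper)."""
    return ETAG_PREFIX + etag


def decode_pin(token: str) -> dict:
    """Translate a pin token into conditional-read parameters.

    Only the etag: form is handled. Any other prefix is rejected, which
    leaves room for a future token kind without changing the wire format.
    """
    if token.startswith(ETAG_PREFIX) and len(token) > len(ETAG_PREFIX):
        return {"IfMatch": token[len(ETAG_PREFIX):]}
    raise ValueError(f"unsupported pin token: {token!r}")


def pinned_get_params(bucket: str, key: str, token: str, byte_range: str) -> dict:
    """Build parameters for a ranged read of the pinned revision.

    ASSUMPTION: the key names follow the S3 GetObject request shape. If the
    object's ETag no longer matches, S3 answers 412 Precondition Failed,
    which is the fail-closed behaviour described above.
    """
    params = {"Bucket": bucket, "Key": key, "Range": byte_range}
    params.update(decode_pin(token))
    return params
```

Rejecting unknown prefixes rather than guessing keeps an old server from silently serving an unpinned read for a token kind it does not understand.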
c629836 to
f2699f4
Lets a client download a large git snapshot via concurrent ranged GETs that all resolve to the same immutable S3 revision.

A naive client-side parallel download is unsafe: snapshots are mutable and replicas can serve divergent copies, so ranges could stitch across revisions and corrupt the artifact. Here the client first probes (`X-Cachew-Snapshot-Pin: probe`) for an opaque pin token (the S3 ETag) and the total size, then issues parallel `Range` GETs carrying that token. Each range is served from the pinned revision via an `If-Match` GET, bypassing the per-pod disk tier, so chunks stitch correctly regardless of which replica serves them. A regenerated snapshot fails closed with 412, prompting the client to re-probe and restart rather than mix revisions.

The pin token is prefixed (`etag:`) so it can later carry an S3 VersionId without a wire-format change.

Draft: server-side prototype to benchmark the throughput win before wiring the client. Fail-closed ETag pinning is sufficient for benchmarking; whether fail-over (VersionId or content-addressed objects) is worth building depends on the observed 412/restart rate.
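Since the client is not wired up yet, here is a minimal sketch of what the probe-then-range loop could look like on the client side. Everything beyond the `X-Cachew-Snapshot-Pin` header name, the `Range` header, and the 412 restart rule is an assumption: in particular, sending the token back in the same header, the `probe`/`fetch` callables, and the restart limit are hypothetical and stand in for real HTTP calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

PIN_HEADER = "X-Cachew-Snapshot-Pin"  # header name from the description


class PinMismatch(Exception):
    """The server answered 412: the pinned revision was regenerated."""


def plan_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) byte ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def range_headers(pin_token: str, start: int, end: int) -> Dict[str, str]:
    # ASSUMPTION: the token is carried in the same header used for the probe.
    return {PIN_HEADER: pin_token, "Range": f"bytes={start}-{end}"}


def download(
    probe: Callable[[], Tuple[str, int]],                    # -> (pin token, total size)
    fetch: Callable[[Dict[str, str]], Tuple[int, bytes]],    # headers -> (status, body)
    chunk_size: int,
    workers: int = 4,
    max_restarts: int = 3,
) -> bytes:
    """Probe, fetch all ranges in parallel, and restart from scratch on 412."""

    def fetch_one(pin_token: str, byte_range: Tuple[int, int]) -> bytes:
        status, body = fetch(range_headers(pin_token, *byte_range))
        if status == 412:
            raise PinMismatch()
        if status != 206:
            raise RuntimeError(f"unexpected status {status}")
        return body

    for _ in range(max_restarts + 1):
        pin_token, total_size = probe()
        ranges = plan_ranges(total_size, chunk_size)
        try:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                # pool.map yields results in input order, so chunks stitch in order.
                parts = list(pool.map(lambda r: fetch_one(pin_token, r), ranges))
        except PinMismatch:
            continue  # discard every chunk; never mix revisions
        return b"".join(parts)
    raise RuntimeError("snapshot kept changing; giving up")
```

Discarding all chunks on a single 412 mirrors the fail-closed rule: a partial result from the old revision is never combined with ranges from the new one.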