feat(git): content-addressed immutable snapshots#325
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 693104d727
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| // The blob is immutable, so it is safe to cache aggressively downstream. | ||
| w.Header().Set("Cache-Control", "public, max-age=31536000, immutable") | ||
| w.Header().Set("Accept-Ranges", "bytes") |
There was a problem hiding this comment.
in the cold path where the snapshot is not pulled to the local disk cache from s3 yet, this should not advertise range request capability. maybe thats splitting hairs and wont actually matter, i just remember that edge case of the cold start from when i prototyped a much less thorough version of multipart range snapshot restore
There was a problem hiding this comment.
good catch, updated
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 693104d727
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
693104d to
1c9372b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c9372bf0a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
1c9372b to
d411afe
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d411afebcb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Snapshots were written to a single mutable key (upstream+".snapshot"), which made range requests unsafe: a regeneration could swap the bytes mid-download, so parallel/ranged fetches needed bespoke pinning. Write each snapshot once under a commit-addressed key and keep a tiny mutable pointer to the current commit. The stable snapshot URL resolves the pointer and advertises the immutable URL via Content-Location; a new /git/<repo>/snapshot/<commit>.tar.zst route serves the blob with standard HTTP Range support (zero-copy sendfile on the disk tier). In-flight downloads survive regenerations until the old blob ages out via TTL. Co-authored-by: Amp <amp@ampcode.com> Amp-Thread-ID: https://ampcode.com/threads/T-019eae3d-a2fd-70ca-80fe-a7536ec6748c
d411afe to
216a22a
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 216a22a1c5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| extraHeaders.Set("X-Cachew-Snapshot-Commit", headSHA) | ||
|
|
||
| return snapshot.Create(ctx, s.cache, cacheKey, workDir, 0, nil, s.config.ZstdThreads, extraHeaders) | ||
| return snapshot.Create(ctx, s.cache, commitKey, workDir, 0, nil, s.config.ZstdThreads, extraHeaders) |
There was a problem hiding this comment.
Avoid racing immutable snapshot writes across replicas
In a multi-pod deployment, two replicas generating the same new HEAD can both observe Stat as missing and then both call snapshot.Create for the same supposedly immutable key. The cache Create implementations replace the object on close, and these archives are not byte-stable across independent clones because of .git metadata/mtimes, so once the first replica has published the pointer and clients start parallel range requests, the second replica finishing later can swap the bytes under the same URL and reintroduce the mixed-range corruption this content-addressed route is meant to prevent. This needs cache-wide create-if-absent/CAS semantics or deterministic archive bytes before advertising the key as immutable.
Useful? React with 👍 / 👎.
Snapshots were written to one mutable key, so range requests could stitch bytes from two different snapshots — the root cause the pinning prototypes worked around.
Now each snapshot is written once under a commit-addressed key with a tiny mutable pointer to the current commit. The stable
/snapshot.tar.zstURL resolves the pointer and advertises the immutable URL; a new/git/<repo>/snapshot/<commit>.tar.zstroute serves the blob with standard HTTPRange(sendfile on the disk tier), enabling safe parallel downloads with no pin protocol. In-flight downloads survive regenerations until the old blob ages out via TTL.Cold/S3-only serves fall back to a full
200, so clients must check 206-vs-200. Unknown/aged-out commits return 404 to trigger re-resolution via the stable URL.No infra changes: reuses existing caches and TTLs; multiple versions per repo coexist until they age out.