Skip to content

feat(git): content-addressed immutable snapshots#325

Open
worstell wants to merge 1 commit into
mainfrom
worstell/immutable-snapshots
Open

feat(git): content-addressed immutable snapshots#325
worstell wants to merge 1 commit into
mainfrom
worstell/immutable-snapshots

Conversation

@worstell

@worstell worstell commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Snapshots were written to one mutable key, so range requests could stitch bytes from two different snapshots — the root cause the pinning prototypes worked around.

Now each snapshot is written once under a commit-addressed key with a tiny mutable pointer to the current commit. The stable /snapshot.tar.zst URL resolves the pointer and advertises the immutable URL; a new /git/<repo>/snapshot/<commit>.tar.zst route serves the blob with standard HTTP Range (sendfile on the disk tier), enabling safe parallel downloads with no pin protocol. In-flight downloads survive regenerations until the old blob ages out via TTL.

Cold/S3-only serves fall back to a full 200, so clients must check 206-vs-200. Unknown/aged-out commits return 404 to trigger re-resolution via the stable URL.

No infra changes: reuses existing caches and TTLs; multiple versions per repo coexist until they age out.

@worstell worstell marked this pull request as ready for review June 12, 2026 18:32
@worstell worstell requested a review from a team as a code owner June 12, 2026 18:32
@worstell worstell requested review from inez and removed request for a team June 12, 2026 18:32

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 693104d727

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/strategy/git/snapshot.go Outdated
@worstell worstell marked this pull request as draft June 12, 2026 18:58
Comment thread internal/strategy/git/snapshot.go Outdated
}
// The blob is immutable, so it is safe to cache aggressively downstream.
w.Header().Set("Cache-Control", "public, max-age=31536000, immutable")
w.Header().Set("Accept-Ranges", "bytes")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the cold path where the snapshot is not pulled to the local disk cache from s3 yet, this should not advertise range request capability. maybe thats splitting hairs and wont actually matter, i just remember that edge case of the cold start from when i prototyped a much less thorough version of multipart range snapshot restore

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, updated

@worstell worstell marked this pull request as ready for review June 12, 2026 19:47

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 693104d727

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/strategy/git/snapshot.go Outdated
@worstell worstell force-pushed the worstell/immutable-snapshots branch from 693104d to 1c9372b Compare June 12, 2026 19:55

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c9372bf0a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/strategy/git/git.go Outdated
@worstell worstell force-pushed the worstell/immutable-snapshots branch from 1c9372b to d411afe Compare June 12, 2026 20:14

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d411afebcb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/strategy/git/snapshot.go
Comment thread internal/strategy/git/snapshot.go Outdated
Snapshots were written to a single mutable key (upstream+".snapshot"),
which made range requests unsafe: a regeneration could swap the bytes
mid-download, so parallel/ranged fetches needed bespoke pinning.

Write each snapshot once under a commit-addressed key and keep a tiny
mutable pointer to the current commit. The stable snapshot URL resolves
the pointer and advertises the immutable URL via Content-Location; a new
/git/<repo>/snapshot/<commit>.tar.zst route serves the blob with standard
HTTP Range support (zero-copy sendfile on the disk tier). In-flight
downloads survive regenerations until the old blob ages out via TTL.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019eae3d-a2fd-70ca-80fe-a7536ec6748c
@worstell worstell force-pushed the worstell/immutable-snapshots branch from d411afe to 216a22a Compare June 12, 2026 20:24

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 216a22a1c5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

extraHeaders.Set("X-Cachew-Snapshot-Commit", headSHA)

return snapshot.Create(ctx, s.cache, cacheKey, workDir, 0, nil, s.config.ZstdThreads, extraHeaders)
return snapshot.Create(ctx, s.cache, commitKey, workDir, 0, nil, s.config.ZstdThreads, extraHeaders)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid racing immutable snapshot writes across replicas

In a multi-pod deployment, two replicas generating the same new HEAD can both observe Stat as missing and then both call snapshot.Create for the same supposedly immutable key. The cache Create implementations replace the object on close, and these archives are not byte-stable across independent clones because of .git metadata/mtimes, so once the first replica has published the pointer and clients start parallel range requests, the second replica finishing later can swap the bytes under the same URL and reintroduce the mixed-range corruption this content-addressed route is meant to prevent. This needs cache-wide create-if-absent/CAS semantics or deterministic archive bytes before advertising the key as immutable.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants