LocalStore: Implement addMultipleToStore()#480
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds LocalStore::addMultipleToStore() for parallel import of multiple NARs, introduces doAddToStore() to centralize per-path restore/verification/canonicalise/optimise/fsync, refactors addToStore() to call the helper, and makes GC remove stale ChangesBulk add-to-store with parallel extraction
Sequence Diagram(s)sequenceDiagram
participant Client
participant LocalStore
participant ThreadPool
participant Source
participant Registrar
Client->>LocalStore: addMultipleToStore(PathsSource, Activity, RepairFlag, CheckSigsFlag)
LocalStore->>LocalStore: validate signatures (optional), plan work, lock targets
LocalStore->>ThreadPool: submit doAddToStore tasks (sorted by NAR size)
ThreadPool->>Source: extract/restore NAR for path
Source-->>ThreadPool: restored contents + NAR hash/size
ThreadPool->>LocalStore: doAddToStore completes (canonicalise/optimise/fsync)
ThreadPool-->>LocalStore: signal task complete
LocalStore->>Registrar: registerValidPaths(all new paths)
LocalStore->>LocalStore: delete `.unpacked` markers
LocalStore-->>Client: return
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Infer (1.2.0)src/libstore/local-store.ccsrc/libstore/local-store.cc:1:10: fatal error: 'nix/store/local-store.hh' file not found ... [truncated 2200 characters] ... n file "src/clang/cTrans.ml", line 4784, characters 10-1023 Comment |
The `cleanup` mentioned was removed in f0143f5.
This speeds up `nix copy` into a local store. Previously copyPaths() relied on the default addMultipleToStore(), which walks the closure in topological order calling addToStore() per path. This means that we can't start copying a NAR into the store until all its dependencies have been added, in order to preserve the closure variant. This may waste available network/CPU capacity. So now we extract the NARs into the store first before registering the store paths as valid. Specifically: 1. We acquire locks on all store paths in sorted order (to avoid deadlocks). 2. We extract the NARs in order of descending NAR size (to minimize the makespan). 3. We register all new paths as valid in a single transaction. Note: currently this doesn't benefit copying into a local store via the Nix daemon. (See RemoteStore::addMultipleToStore() which sends NARs in topological order.) It also doesn't yet benefit substitution, since it currently substitutes one path at a time. (It should just rely on copyPaths() to fetch entire missing closures.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The NAR-extraction body (delete, restore, verify NAR hash/size, verify CA, autoGC, canonicalise, optimise, fsync) was duplicated between addToStore() and addMultipleToStore(). Extract it into a private doAddToStore() helper that both call; it writes and verifies a path's contents but does not take a lock or register validity, which remains the caller's responsibility. No functional change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bebbb70 to
7df8aab
Compare
Previously, uf addMultipleToStore() is interrupted, all NARs extracted so far were lost, since path validity is only registered in a single transaction at the very end. To avoid this, doAddToStore() now creates a file `<realPath>.unpacked` to mark that <realPath> contains a completely unpacked NAR. If that file exists, doAddToStore() can skip fetching. The markers are removed once the paths are registered as valid. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/libstore/local-store.cc`:
- Around line 1037-1045: After deletePath(realPath) ensure you invoke autoGC()
before unpacking/writing the NAR so GC can free space; specifically, call
autoGC() (the same mechanism used by addToStoreFromDump) before creating
HashSink/TeeSource and before calling restorePath(realPath, wrapperSource,
localSettings.fsyncStorePaths) so the GC check runs prior to the write and
prevents ENOSPC during restorePath.
- Around line 1170-1219: The code currently collects duplicates from pathsToCopy
into maybeToWrite/toWrite, allowing multiple workers for the same store path;
before moving items into maybeToWrite (or before building toWrite) deduplicate
by the canonical StorePath/real path: track a std::set<std::string> (or
std::set<std::filesystem::path>) of the key (e.g. printStorePath(info.path) or
toRealPath(info.path)) and only push a PathWithSource* into maybeToWrite/toWrite
the first time that key is seen; ensure pathsToLock insertion uses the same
canonical key so lock coverage matches queued work (references: pathsToCopy,
PathWithSource, maybeToWrite, pathsToLock, toWrite, toRealPath).
- Around line 1024-1026: The resume-trust check wrongly rejects files based on
group ownership; in the non-Windows block inside existsAndIsOwnedBySelf (or the
function performing the st->st_uid/st->st_gid checks) remove the st_gid
comparison so the condition becomes only check st->st_uid != geteuid(), i.e.
drop the "|| st->st_gid != getegid()" part; leave the surrounding `#ifndef` _WIN32
and the uid check intact so setgid-managed directories (build-users) won't cause
false negatives for .unpacked markers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ffa60980-e2a1-4f7c-bd98-009efaee57ff
📒 Files selected for processing (2)
src/libstore/include/nix/store/local-store.hhsrc/libstore/local-store.cc
When collecting garbage, delete a path's `.unpacked` marker before deleting the path itself. Deleting a directory is not atomic, so without this ordering an interrupted GC could leave the marker next to a partially-deleted path, causing addMultipleToStore() to later reuse the corrupt path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The `<realPath>.unpacked` marker filename was constructed in three places (the addMultipleToStore() reuse check and cleanup, and the GC deletion). Factor it into a single LocalStore::unpackedMarkerFor() helper. No functional change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
On some OSes the group could be inherited from the store directory, so it might be set to the build-users group, which is fine.
In doAddToStore(), call autoGC() right after deleting any pre-existing path and before restorePath(), rather than after the NAR has already been written. This lets the garbage collector free up space before the write, preventing ENOSPC during restorePath(). This matches the ordering already used by addToStoreFromDump(). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| if (!repair && existsAndIsOwnedBySelf(unpackedMarker) && existsAndIsOwnedBySelf(realPath)) { | ||
| notice("reusing previously unpacked path at '%s'", printStorePath(info.path)); | ||
| act.setExpected(actCopyPath, bytesExpected -= info.narSize); | ||
| } else { | ||
| doAddToStore(info, *item->second, repair); | ||
| /* The path has been unpacked, so mark it as such. */ | ||
| writeFile(unpackedMarker, ""); | ||
| } |
There was a problem hiding this comment.
I wonder if instead of using "file exists" as the sole validation check, we should record the fingerprint (and whatever other stuff), to ensure that we got the same bytes, NAR hash, ca (if applicable), etc? And then we decide validity based on "existence of file AND file contents (fingerprint) matches what we'd expect)"?
And then, in this "else" branch (i.e. "marker wasn't valid by having wrong contents or not existing"), we'd unconditionally remove the marker file (if it existed) and keep doing the thing.
There was a problem hiding this comment.
That would be a lot more expensive though. And it shouldn't be needed since nobody except GC can modify the store path.
Motivation
This speeds up
nix copyinto a local store.Previously copyPaths() relied on the default
addMultipleToStore(), which walks the closure in topological order callingaddToStore()per path. This means that we can't start copying a NAR into the store until all its dependencies have been added, in order to preserve the closure variant. This may waste available network/CPU capacity.So now we extract the NARs into the store first before registering the store paths as valid. Specifically:
We acquire locks on all store paths in sorted order (to avoid deadlocks).
We extract the NARs in order of descending NAR size (to minimize the makespan).
We register all new paths as valid in a single transaction.
This speeds up
from ~8.8s to ~5.4s. (Of course this depends a lot on network speed, I/O, CPU etc.)
Note: currently this doesn't benefit copying into a local store via the Nix daemon. (See
RemoteStore::addMultipleToStore()which sends NARs in topological order.) It also doesn't yet benefit substitution, since it currently substitutes one path at a time. (It should just rely oncopyPaths()to fetch entire missing closures.)Context
Summary by CodeRabbit