scripts: add script to clean up stale bazel output bases#166331

Merged
trunk-io[bot] merged 1 commit into cockroachdb:master from dt:output-mgmt
Mar 24, 2026

Conversation

@dt
Contributor

@dt dt commented Mar 20, 2026

Bazel output bases for workspaces that no longer exist (e.g. deleted git worktrees) or that haven't been used recently can waste significant disk space. An idle workspace typically accumulates enough changes that most of its cached build is invalidated on next use, so the output base isn't worth keeping. Pruning orphaned and stale bases reclaims this space with limited downside, since rebuilds still leverage the remote LRU cache maintained by dev.
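As a rough illustration of the "orphaned" case above, the sketch below flags an output base whose recorded workspace directory no longer exists. It assumes that Bazel's `DO_NOT_BUILD_HERE` marker file in the output base holds the workspace path; that is an assumption worth verifying on your Bazel version, and `is_orphaned_base` is a hypothetical name, not necessarily how the script in this PR does it.

```shell
# Hypothetical helper: succeeds (returns 0) if the workspace recorded for
# this output base is gone.
# ASSUMPTION: <output_base>/DO_NOT_BUILD_HERE contains the workspace path.
is_orphaned_base() {
  local base="$1" marker workspace
  marker="${base}/DO_NOT_BUILD_HERE"
  # No marker means unknown provenance, so do not call the base orphaned.
  [ -f "${marker}" ] || return 1
  workspace="$(cat "${marker}")"
  [ -n "${workspace}" ] && [ ! -d "${workspace}" ]
}
```

Treating a missing marker as "not orphaned" errs on the side of keeping data.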

Additionally, when using multiple worktrees with similar content, the same build artifacts are often duplicated across output bases. These duplicates can be replaced with copy-on-write links using fclones. The script supports this on an opt-in basis: create the directory ~/.cache/bazel-tidy/dedupe and ensure fclones is available on PATH.
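A minimal sketch of the opt-in condition described above: dedup is enabled only when the marker directory exists and the tool is on PATH. The function name `dedupe_enabled` and its two optional parameters are hypothetical, added here so the check can be exercised without touching the real home directory.

```shell
# Hypothetical helper: succeeds only if the dedupe tool is on PATH and the
# opt-in marker directory exists. Both arguments are illustrative overrides.
dedupe_enabled() {
  local marker_dir="${1:-${HOME}/.cache/bazel-tidy/dedupe}"
  local tool="${2:-fclones}"
  command -v "${tool}" >/dev/null 2>&1 && [ -d "${marker_dir}" ]
}

# To opt in, create the marker directory once:
#   mkdir -p "${HOME}/.cache/bazel-tidy/dedupe"
```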

Release note: none.
Epic: none.

@dt dt requested review from michae2 and stevendanna March 20, 2026 19:03
@trunk-io
Contributor

trunk-io bot commented Mar 20, 2026

😎 Merged successfully - details.

@cockroach-teamcity
Member

This change is Reviewable

Collaborator

@michae2 michae2 left a comment

Awesome, I'm going to try this out! Thanks for creating it! :lgtm:

@michae2 reviewed all commit messages and made 4 comments.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on dt and stevendanna).


scripts/tidy_bazel_bases.sh line 120 at r1 (raw file):

  # Opt-in: dedup only runs if the user has created ~/.cache/bazel-tidy/dedupe
  # and fclones is installed.
  if ! command -v fclones >/dev/null 2>&1 || [ ! -d "${HOME}/.cache/bazel-tidy/dedupe" ]; then

Based on the comment below, is the special patched fclones needed to make this work? Should we check for that specific version of fclones?


scripts/tidy_bazel_bases.sh line 140 at r1 (raw file):

    return
  fi
  

nit: trailing spaces


scripts/tidy_bazel_bases.sh line 163 at r1 (raw file):

  # bazel binaries and clear the output bases so it stops propagating to output
  # files where it becomes uncopyable, but this is less ideal as it means that
  # provenance information is no longer tracked. dt has a patched fclones. 

nit: trailing space

@michae2
Collaborator

michae2 commented Mar 20, 2026

A couple more notes from Claude, which I don't feel too strongly about, but might be worth considering.


Additional Findings from Correctness Review

TOCTOU race in maybe_remove_base (lines 49-62) — most significant concern

Between checking the .inuse PID file and running rm -rf, another process can run claim_output_base.sh and start using that base. The existing claim_output_base.sh uses set -o noclobber for atomic PID creation, but the tidy script's check-then-delete isn't coordinated with that. A new bazel invocation could mkdir -p the base and create .inuse while rm -rf is still deleting files in the same tree. This could cause confusing build failures. Worth at least documenting why this is acceptable, or re-checking .inuse after the removal.
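One way to narrow this window, sketched below, is to rename the base aside before deleting it. This is a swapped-in technique, not what the PR does: a same-filesystem rename is atomic, so a new claimer that runs `mkdir -p` gets a fresh tree instead of one being deleted underneath it. The function names are hypothetical, and the sketch assumes `.inuse` contains a bare PID. A claim that lands between the check and the rename is still possible.

```shell
# Returns 0 if the base has an .inuse file naming a live process.
# ASSUMPTION: .inuse contains a single PID.
base_in_use() {
  local base="$1" pid
  [ -f "${base}/.inuse" ] || return 1
  pid="$(cat "${base}/.inuse" 2>/dev/null)" || return 1
  [ -n "${pid}" ] && kill -0 "${pid}" 2>/dev/null
}

# Hypothetical variant of the removal step: move the base aside, then delete
# the renamed tree so the slow rm -rf never runs on the claimable path.
remove_base_moved_aside() {
  local base="$1"
  local trash="${base}.tidy-trash.$$"
  base_in_use "${base}" && return 1
  mv "${base}" "${trash}" 2>/dev/null || return 1
  rm -rf "${trash}"
}
```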

PID file locking should match claim_output_base.sh pattern

The existing claim_output_base.sh in the repo uses set -o noclobber for atomic PID file creation. The new script's check-then-write pattern is weaker. Consider using the same noclobber pattern for consistency.
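For reference, the noclobber pattern mentioned here can be sketched as follows. This is an illustration of the general technique, not the contents of claim_output_base.sh; `claim_base` is a hypothetical name.

```shell
# Hypothetical claim step: with noclobber set, the redirection fails if
# .inuse already exists, so the existence check and the file creation
# happen as a single operation.
claim_base() {
  local base="$1"
  mkdir -p "${base}" || return 1
  # The subshell keeps noclobber from leaking into the caller's shell.
  # "$$" still expands to the PID of the invoking shell inside it.
  ( set -o noclobber; echo "$$" > "${base}/.inuse" ) 2>/dev/null
}
```

A second call on the same base fails rather than overwriting the first claimant's PID.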

Dedup runs on actively-used bases

dedupe_base_contents doesn't filter out bases with live .inuse PIDs. While APFS reflinks are COW and fclones uses rename-based replacement, replacing files while bazel is actively reading could be surprising. Worth either filtering active bases or documenting why this is safe.
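Filtering active bases before dedup could look roughly like the sketch below, which prints only the bases under a root that have no live `.inuse` PID. The names `live_pid_in` and `list_idle_bases` are hypothetical, and the sketch assumes `.inuse` contains a bare PID; how the resulting list is handed to fclones is left out.

```shell
# Returns 0 if the given file names a live process.
# ASSUMPTION: the file contains a single PID.
live_pid_in() {
  local file="$1" pid
  [ -f "${file}" ] || return 1
  pid="$(cat "${file}" 2>/dev/null)" || return 1
  [ -n "${pid}" ] && kill -0 "${pid}" 2>/dev/null
}

# Prints, one per line, each base directory under root that is not in use.
list_idle_bases() {
  local root="$1" base
  for base in "${root}"/*/; do
    [ -d "${base}" ] || continue   # also skips the unexpanded glob
    base="${base%/}"
    live_pid_in "${base}/.inuse" || printf '%s\n' "${base}"
  done
}
```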

@dt dt force-pushed the output-mgmt branch 2 times, most recently from fd18f87 to 94966d8 March 21, 2026 21:01
@tbg
Member

tbg commented Mar 23, 2026

  • didn't review in detail
  • thank you for doing this 🙏🏽 🙏🏽
  • I get lots of "rm: cannot remove '/private/var/tmp/_bazel_tbg/1325b385b9b0c87cc4cb4990ffe0f6d4/execroot/com_github_cockroachdb_cockroach/bazel-out/darwin_arm64-fastbuild/bin/pkg/server/status/status_/status.a.cgo/_cgo_imports.go': Permission denied" errors, probably because the files/dirs are read-only by default. Running chmod -R u+w /private/var/tmp/_bazel_tbg/ fixed this, but ideally the script would take care of the perms.
  • [2026-03-23 10:43:02.601] fclones: info: Processed 131017 files and reclaimed up to 3.8 GB space
  • my _bazel_tbg remains at 80G
  • STALE_BASE_DAYS=5 ./scripts/tidy_bazel_bases.sh: exactly the same output

@dt
Contributor Author

dt commented Mar 23, 2026

rm: cannot remove

Ugh. These bazel read-only permissions are so annoying. That chmod -R is pretty expensive (particularly on a machine where every IO/syscall gets extra blocking inspections) just to nuke it, but I don't want to sudo either which would make this tedious to run non-interactively. I'll add the chmod for now but I wonder if we can tweak bazel to chill out with the u-w on dirs.

@dt
Contributor Author

dt commented Mar 23, 2026

rm: cannot remove

Added a chmod +w before the rm. That chmod is a bit expensive/slow (more so on corp macs), so I'll be setting experimental_writable_outputs in my .bazelrc.user, and I added a probe to skip it if we see dirs are already writable (using the go_sdk dir as its indicator since it is most predictable to exist).
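The probe-then-chmod idea might be sketched like this. The helper name and the `probe` parameter are hypothetical: the thread says the go_sdk dir is used as the indicator but does not give its path, so the caller supplies it here.

```shell
# Hypothetical helper: skip the expensive recursive chmod when a
# representative directory (the probe) is already writable.
make_writable_if_needed() {
  local base="$1"
  local probe="${2:-$1}"   # illustrative: caller passes the indicator dir
  if [ -w "${probe}" ]; then
    return 0               # outputs look writable already; nothing to do
  fi
  chmod -R u+w "${base}"
}

# Usage before deleting a base (paths are placeholders):
#   make_writable_if_needed "${base}" "${probe_dir}" && rm -rf "${base}"
```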

@dt
Contributor Author

dt commented Mar 23, 2026

STALE_BASE_DAYS=5

it previously wasn't using assign-if-empty so wasn't customizable at invocation. updated now.
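For readers unfamiliar with the idiom, assign-if-empty in shell looks like the sketch below. The default of 14 is a placeholder, since the script's real default is not shown in this thread.

```shell
# Keep a value supplied by the caller's environment; otherwise fall back to
# a default. The value 14 is hypothetical, not the script's actual default.
: "${STALE_BASE_DAYS:=14}"
echo "pruning bases idle for more than ${STALE_BASE_DAYS} days"
```

With this form, `STALE_BASE_DAYS=5 ./scripts/tidy_bazel_bases.sh` keeps the caller's 5, whereas a plain `STALE_BASE_DAYS=14` assignment would overwrite it.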

@dt
Contributor Author

dt commented Mar 23, 2026

TFTR!

/trunk merge

@dt
Contributor Author

dt commented Mar 24, 2026

/trunk merge

@trunk-io trunk-io bot merged commit 2472e4e into cockroachdb:master Mar 24, 2026
25 checks passed

4 participants