fix: stagger estate push to stop CI thundering-herd by hyperpolymath · Pull Request #147 · hyperpolymath/gitbot-fleet

hyperpolymath · 2026-05-16T14:34:35Z

Root cause

scripts/sync-all-parallel.exs Phase 1 (phase1_parallel) fetches/pulls/pushes
every owned repo through a single Task.async_stream at --concurrency (default 32),
with no pacing. When the whole estate (~355 repos) is synced in one tight window,
each git push immediately triggers that repo's GitHub Actions workflows. The
resulting near-simultaneous burst of thousands of workflow runs saturates the
account-wide hosted-runner concurrency cap, so a large fraction of estate CI
(~34%) sits transiently queued: a classic CI thundering herd.

Nothing about what is pushed is wrong; the problem is purely the dispatch
timing (all pushes effectively at once).

Change — staggered batch pacing

Least-invasive fix that fits the existing BEAM design: keep Task.async_stream
and its intra-batch --concurrency parallelism unchanged, but process repos in
batches with a paced pause between batches, so CI trigger waves are
spread over time instead of arriving all at once.

phase1_parallel now uses chunk_every to split repos into batches and runs each
batch via the unchanged run_batch/4 (same concurrency, same per-repo error handling).
A pause is inserted after every batch except the last: batch_pause_sec
plus a small random jitter (0–5s) to de-correlate repeated runs.
Correctness is preserved: every repo is still processed exactly as before;
a crash or timeout in one repo still maps to an error result and never aborts
the batch or the run; and the sync stays idempotent (re-running pushes nothing new).
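
The batching described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the module name PacingSketch, the function run_paced/3, the injectable sleep_fun option, and the 60s task timeout are all assumptions made for the example, and sync_fun stands in for the real per-repo fetch/pull/push.

```elixir
defmodule PacingSketch do
  # Hypothetical sketch; module, function, and option names are illustrative only.

  def run_paced(repos, sync_fun, opts \\ []) do
    batch_size = Keyword.get(opts, :batch_size, 25)
    pause_sec = Keyword.get(opts, :batch_pause_sec, 45)
    concurrency = Keyword.get(opts, :concurrency, 32)
    # Injectable sleeper so the pacing can be observed without really waiting.
    sleep_fun = Keyword.get(opts, :sleep_fun, &Process.sleep/1)

    batches = Enum.chunk_every(repos, batch_size)
    last_index = length(batches) - 1

    batches
    |> Enum.with_index()
    |> Enum.flat_map(fn {batch, index} ->
      results = run_batch(batch, sync_fun, concurrency)

      # Pause after every batch except the last: base pause plus 0-5s of jitter.
      if index < last_index do
        jitter_ms = :rand.uniform(5001) - 1
        sleep_fun.(pause_sec * 1000 + jitter_ms)
      end

      results
    end)
  end

  defp run_batch(batch, sync_fun, concurrency) do
    # Rescue inside the task so a raised exception becomes an error result
    # instead of taking down the linked caller.
    safe_fun = fn repo ->
      try do
        {:ok, sync_fun.(repo)}
      rescue
        error -> {:error, Exception.message(error)}
      end
    end

    outcomes =
      batch
      |> Task.async_stream(safe_fun,
        max_concurrency: concurrency,
        timeout: 60_000,
        on_timeout: :kill_task
      )
      |> Enum.to_list()

    # Results arrive in input order, so they can be paired back with repos.
    batch
    |> Enum.zip(outcomes)
    |> Enum.map(fn
      {repo, {:ok, result}} -> {repo, result}
      {repo, {:exit, reason}} -> {repo, {:error, reason}}
    end)
  end
end
```

In this sketch a timed-out task surfaces as `{repo, {:error, :timeout}}` and a raised exception as `{repo, {:error, message}}`, so one failing repo never stops the remaining batches. Only the pause is new relative to a single unpaced stream; the work inside each batch is the same concurrent stream.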

Defaults & tuning

| Setting | Default | Flag | Env |
| --- | --- | --- | --- |
| Batch size | 25 | `--batch-size N` | `SYNC_BATCH_SIZE` |
| Inter-batch pause | 45s (+0–5s jitter) | `--batch-pause SEC` | `SYNC_BATCH_PAUSE_SEC` |
| Pacing | on | `--no-throttle` (disables pacing) | none |

Pushing 25 repos every ~45s keeps the workflow-trigger rate well under the account
runner cap while still syncing the full estate in bounded time (~355 repos ≈ 15
batches, so 14 pauses ≈ 10–11 min of added pacing). Increase --batch-pause or
decrease --batch-size if the queue still backs up; raise the batch size once the
runner cap is increased. CLI flags override env vars.
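
To make the pacing estimate concrete, the arithmetic can be reproduced with a small helper. PacingMath and added_pacing/4 are hypothetical names used only for this illustration; the inputs are the defaults quoted above.

```elixir
defmodule PacingMath do
  # Hypothetical helper: estimates the extra wall-clock time that pacing adds.
  def added_pacing(repos, batch_size, pause_sec, max_jitter_sec) do
    # Ceiling division: a partial final batch still counts as a batch.
    batches = div(repos + batch_size - 1, batch_size)
    # No pause follows the last batch.
    pauses = max(batches - 1, 0)

    %{
      batches: batches,
      pauses: pauses,
      min_sec: pauses * pause_sec,
      max_sec: pauses * (pause_sec + max_jitter_sec)
    }
  end
end

IO.inspect(PacingMath.added_pacing(355, 25, 45, 5))
```

For 355 repos this gives 15 batches separated by 14 pauses, adding between 630s (10.5 min, zero jitter) and 700s (about 11.7 min, maximum jitter on every pause), which is where the 10–11 minute figure comes from.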

Throttling is skipped automatically under --dry-run (no pushes occur, so
there is nothing to pace) and can be disabled explicitly with --no-throttle
(legacy single-stream behaviour).
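
The precedence rules (CLI flag, then env var, then default) and the two ways pacing is skipped can be sketched as below. This is an assumption about how such resolution might look, not the script's actual option handling: PacingOptions, resolve/2, and the returned map shape are hypothetical. It relies on the standard OptionParser behaviour that a boolean switch `throttle` is negated by `--no-throttle`.

```elixir
defmodule PacingOptions do
  # Hypothetical sketch of option resolution; not the script's real parser.
  @default_batch_size 25
  @default_pause_sec 45

  def resolve(argv, env \\ System.get_env()) do
    {flags, _rest, _invalid} =
      OptionParser.parse(argv,
        strict: [
          batch_size: :integer,
          batch_pause: :integer,
          throttle: :boolean,
          dry_run: :boolean
        ]
      )

    dry_run? = Keyword.get(flags, :dry_run, false)

    %{
      # CLI flag wins, then the env var, then the built-in default.
      batch_size:
        flags[:batch_size] || env_int(env, "SYNC_BATCH_SIZE") || @default_batch_size,
      batch_pause_sec:
        flags[:batch_pause] || env_int(env, "SYNC_BATCH_PAUSE_SEC") || @default_pause_sec,
      # Pacing is on unless --no-throttle or --dry-run was given.
      throttle?: Keyword.get(flags, :throttle, true) and not dry_run?
    }
  end

  # Returns the env var as an integer, or nil if it is unset or not a clean integer.
  defp env_int(env, name) do
    with value when is_binary(value) <- Map.get(env, name),
         {int, ""} <- Integer.parse(value) do
      int
    else
      _ -> nil
    end
  end
end
```

Passing the environment as a map keeps the function easy to exercise, for example `PacingOptions.resolve(["--batch-size", "10"], %{"SYNC_BATCH_SIZE" => "50"})` resolves the batch size to 10 because the flag takes precedence.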

Pairing

This addresses the producer side (spreading the push/trigger burst). It pairs
with the template `concurrency:` PR, which addresses the consumer side
(per-repo workflow auto-cancellation / serialization). Both are needed to fully
eliminate the queued-CI saturation.

🤖 Generated with Claude Code

Phase 1 of sync-all-parallel.exs pushed every owned repo via a single Task.async_stream with no pacing, so a full-estate sync fired thousands of GitHub Actions runs near-simultaneously and saturated the account hosted-runner concurrency cap (~34% of estate CI left transiently queued).

Process repos in batches (default 25, --batch-size / SYNC_BATCH_SIZE) with a paced pause between batches (default 45s + 0-5s jitter, --batch-pause / SYNC_BATCH_PAUSE_SEC). Intra-batch concurrency and per-repo error handling are unchanged; every repo is still processed. Pacing is skipped on --dry-run and --no-throttle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

hyperpolymath merged commit 4ea09a7 into main May 16, 2026
21 of 26 checks passed

hyperpolymath deleted the fix/throttle-estate-push-herd branch May 16, 2026 14:51

hyperpolymath merged 1 commit into main from fix/throttle-estate-push-herd
