Parallelize dir-sync on graceful shutdowns#12996

Draft
zwoop wants to merge 2 commits into apache:master from zwoop:ParallelDirSyncShutdown

@zwoop zwoop commented Mar 18, 2026

This makes normal ATS shutdowns complete faster, especially on boxes with a lot of drives. It is similar to the new parallel sync options used during normal operation, except that the shutdown path always parallelizes as much as needed.

@zwoop zwoop added this to the 11.0.0 milestone Mar 18, 2026
@zwoop zwoop requested a review from Copilot March 18, 2026 20:41
@zwoop zwoop self-assigned this Mar 18, 2026
@zwoop zwoop added the Cache label Mar 18, 2026

zwoop commented Mar 18, 2026

This is not tested yet. I'd like our team to try it first, but I'm also looking for some AI lovin' reviews.

@zwoop zwoop marked this pull request as draft March 18, 2026 20:41
@zwoop zwoop requested a review from bneradt March 18, 2026 20:42

Copilot AI left a comment

Pull request overview

This PR aims to reduce graceful shutdown time for Apache Traffic Server by parallelizing the cache directory flush that happens during shutdown, targeting one worker per physical cache disk.

Changes:

  • Build a disk→stripe index map for shutdown syncing.
  • Spawn one std::thread per disk to call StripeSM::shutdown() for that disk’s stripes.
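
The two changes listed above can be illustrated with a minimal, self-contained sketch. This is not the PR's code: `FakeStripe`, `build_disk_map`, and `sync_all_disks` are hypothetical names invented for illustration, and `FakeStripe::shutdown()` only stands in for `StripeSM::shutdown()`. It assumes each stripe knows which disk it lives on.

```cpp
#include <cstddef>
#include <map>
#include <thread>
#include <vector>

// Hypothetical stand-in for a cache stripe; the real StripeSM is far richer.
struct FakeStripe {
  int  disk_id; // which physical disk this stripe lives on (assumed field)
  bool synced;  // set once the directory has been "flushed"

  // Stand-in for StripeSM::shutdown(); here it only records the flush.
  void shutdown() { synced = true; }
};

// Group stripe indices by the disk they belong to (the disk -> stripe index map).
std::map<int, std::vector<std::size_t>>
build_disk_map(const std::vector<FakeStripe> &stripes)
{
  std::map<int, std::vector<std::size_t>> disk_map;
  for (std::size_t i = 0; i < stripes.size(); ++i) {
    disk_map[stripes[i].disk_id].push_back(i);
  }
  return disk_map;
}

// Flush every stripe using one worker thread per disk.
// Returns the number of worker threads that were spawned.
std::size_t
sync_all_disks(std::vector<FakeStripe> &stripes)
{
  const auto disk_map = build_disk_map(stripes);

  std::vector<std::thread> workers;
  workers.reserve(disk_map.size());

  for (const auto &entry : disk_map) {
    // Copy the pointer to this disk's index list into the closure, so each
    // thread works on its own disk. disk_map outlives the threads because
    // they are all joined before this function returns.
    const std::vector<std::size_t> *indices = &entry.second;
    workers.emplace_back([&stripes, indices] {
      for (std::size_t i : *indices) {
        stripes[i].shutdown();
      }
    });
  }

  for (auto &worker : workers) {
    worker.join();
  }
  return workers.size();
}
```

Stripes on the same disk are still flushed one after another in this sketch; only different disks proceed in parallel, which matches the "one worker per physical cache disk" goal described in the overview.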

Fix lambda reference capture bug where all threads processed the last disk's
stripes instead of their own; fix shared 0xdeadbeef EThread* sentinel by using
a thread_local variable to give each OS thread a unique identity for
MUTEX_TAKE_LOCK.
@zwoop zwoop force-pushed the ParallelDirSyncShutdown branch from ff321da to 3cfb438 Compare March 18, 2026 21:39