
gh-135871: Reload lock internal state while spinning in PyMutex_LockTimed #146064

Open

dpdani wants to merge 1 commit into python:main from dpdani:pymutex-spinning-reload

Conversation


@dpdani dpdani commented Mar 17, 2026

This PR adds atomic loads in the slow path of PyMutex to increase the number of lock acquisitions per second that threads can make on a shared mutex.

The tricky part is to avoid degrading performance when the lock is highly contended, i.e. when many threads try to acquire the mutex at high frequency. Under high contention, the current strategy of never reloading the mutex's state is, perhaps counter-intuitively, the best one: it avoids disturbing the thread that currently holds the mutex.

I've run the lockbench script to assess the performance using the following two scenarios, which were suggested by @colesbury:

  • low contention:
./python.exe Tools/lockbench/lockbench.py --work-inside 5 --work-outside 50 --num-locks 24 --acquisitions 3 --random-locks
  • high contention:
./python.exe Tools/lockbench/lockbench.py --work-inside 5 --work-outside 5

The results are from my M4 MacBook, and Python was compiled with --disable-gil --enable-optimizations --with-lto.

The different lines below represent different reloading strategies:

  • old: main
  • unconditional: remove the if statement at line 107
  • RELOAD_SPIN_COUNT = ...: tweak the value at line 30

The RELOAD_SPIN_COUNT = ... strategies also pseudo-randomize the reload cadence by having each thread add its own thread id to the counter. On my machine this nets a ~15-20% improvement over the same strategy without the pseudo-randomization.

lockbench_compare

And here's the comparison between main and this PR, which picks the RELOAD_SPIN_COUNT = 3 strategy:

lockbench_old_vs_spin_3

The fact that the high contention case shows as an improvement in this chart is misleading: I'd say it's in the noise range.
