kqueue: elide mutex and atomic barriers in single-threaded mode #139
mvandeberg wants to merge 1 commit into cppalliance:develop
Conversation
When concurrency_hint == 1, the scheduler now bypasses all synchronization overhead. conditional_mutex/conditional_unique_lock skip pthread_mutex calls entirely, conditional_atomic decomposes RMW ops (fetch_add/fetch_sub/exchange/CAS) into plain relaxed load+modify+store — eliminating ldaxr/stlxr exclusive pairs on ARM64 and LOCK prefixes on x86. Mutexes spin briefly (40 iterations with yield/pause hints) before falling back to the OS futex. Benchmark factory updated to accept a concurrency hint so single-threaded benchmarks explicitly opt in with factory(1).
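To make the relaxed decomposition concrete, here is a minimal sketch of the idea, assuming a runtime `enabled` flag derived from the concurrency hint (the class shape and names below are illustrative, not the actual corosio implementation):

```cpp
#include <atomic>

// Sketch only: a conditional atomic counter in the spirit described above.
template <class T>
class conditional_atomic
{
public:
    conditional_atomic(T initial, bool enabled) noexcept
        : value_(initial), enabled_(enabled) {}

    T fetch_add(T arg, std::memory_order order = std::memory_order_seq_cst) noexcept
    {
        if (enabled_)
            return value_.fetch_add(arg, order);   // real RMW: LOCK xadd / ldaxr+stlxr
        // concurrency_hint == 1: plain relaxed load + modify + store,
        // no exclusive pair on ARM64, no LOCK prefix on x86.
        T old = value_.load(std::memory_order_relaxed);
        value_.store(old + arg, std::memory_order_relaxed);
        return old;
    }

    T load(std::memory_order order = std::memory_order_seq_cst) const noexcept
    {
        return enabled_ ? value_.load(order)
                        : value_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<T> value_;
    bool const enabled_;
};
```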
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           develop     #139      +/-   ##
===========================================
+ Coverage    81.03%   81.20%   +0.17%
===========================================
  Files            64       64
  Lines          5710     5710
===========================================
+ Hits           4627     4637      +10
+ Misses         1083     1073      -10
```

See 1 file with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
An automated preview of the documentation is available at https://139.corosio.prtest3.cppalliance.org/index.html. If more commits are pushed to the pull request, the docs will rebuild at the same URL. (2026-02-13 00:25:40 UTC)
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/corosio/src/detail/conditional_mutex.hpp (1)
149-153: `conditional_event::wait()` is silently a no-op when locking is disabled — document the busy-spin hazard.

When disabled, `lock.underlying().owns_lock()` is `false`, so `wait()` returns immediately without blocking. Any caller that loops on a condition using `cond_.wait(lock)` (e.g., `wait_for_signal`) would degrade into an infinite busy-spin consuming 100% CPU.

Currently this is safe because `wait_for_signal` is structurally unreachable in single-threaded mode (the reactor sentinel ensures the queue is never empty). However, this is a non-obvious invariant — a future refactor that breaks it would silently introduce a livelock. Consider adding an assertion or a brief comment at the call site documenting why this is safe.

src/corosio/src/detail/kqueue/scheduler.hpp (1)
206-208: Note: the kqueue scheduler now diverges from the epoll scheduler's locking interface.

The epoll scheduler still uses `std::unique_lock<std::mutex>&` (per src/corosio/src/detail/epoll/scheduler.hpp:172). If the single-threaded optimization proves successful, consider applying the same treatment to the epoll backend to keep the implementations aligned.

src/corosio/src/detail/kqueue/scheduler.cpp (2)
691-697: Verify the descriptor mutex is never used concurrently before `register_descriptor` configures it.

The `conditional_mutex` in `descriptor_state` is default-constructed with `enabled = true`. Lines 691–692 then reconfigure it to match the scheduler's settings. Between construction and that point, the mutex is in full locking (enabled) mode regardless of `concurrency_hint`. This is safe because `register_descriptor` is called during socket setup before any I/O events can fire, but it would be slightly more robust to construct `descriptor_state::mutex` in the disabled state or accept the enabled flag at construction time (see the sketch after these comments), avoiding any window of incorrect configuration.
843-867: `wait_for_signal` / `wait_for_signal_for` silently degrade in single-threaded mode.

When locking is disabled, `cond_.wait(lock)` is a no-op (the underlying `std::unique_lock` doesn't own the mutex). `wait_for_signal` becomes an infinite busy-spin, and `wait_for_signal_for` becomes a single non-blocking check — effectively `poll_one()` semantics regardless of the requested timeout.

As currently designed, the reactor sentinel in the completed-ops queue prevents these paths from being reached in single-threaded mode. However, this invariant is non-obvious. A defensive assertion would catch any future regression:
Defensive assertion:

```diff
 void kqueue_scheduler::
 wait_for_signal(conditional_unique_lock& lock) const
 {
+    // Unreachable in single-threaded mode: the reactor sentinel
+    // ensures the queue is never empty when outstanding_work > 0.
+    assert(locking_enabled() && "wait_for_signal reached in single-threaded mode");
     while ((state_ & signaled_bit) == 0)
     {
```
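To illustrate the construction-time option mentioned in the `register_descriptor` nitpick above, a minimal sketch (the type names here are hypothetical, not the actual corosio classes):

```cpp
#include <mutex>

// Hypothetical conditional mutex that takes its enabled flag at construction,
// so there is never a window where it is enabled by default.
class conditional_mutex_sketch
{
public:
    explicit conditional_mutex_sketch(bool enabled) noexcept
        : enabled_(enabled) {}

    void lock()   { if (enabled_) m_.lock(); }
    void unlock() { if (enabled_) m_.unlock(); }

private:
    bool const enabled_;
    std::mutex m_;
};

// Hypothetical descriptor state that forwards the scheduler's locking flag,
// instead of being reconfigured later in register_descriptor.
struct descriptor_state_sketch
{
    explicit descriptor_state_sketch(bool locking_enabled)
        : mutex(locking_enabled) {}

    conditional_mutex_sketch mutex;
};
```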
```cpp
inline void spin_pause() noexcept
{
#if defined(__aarch64__) || defined(_M_ARM64)
    __asm__ volatile("yield");
#elif defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    __builtin_ia32_pause();
#endif
}
```
spin_pause() will fail to compile under MSVC due to GCC/Clang-only intrinsics.
The MSVC-specific macros (_M_ARM64, _M_X64, _M_IX86) are detected, but the code paths use __asm__ volatile and __builtin_ia32_pause() which are GCC/Clang intrinsics. MSVC requires __yield() (ARM64) and _mm_pause() (x86) from <intrin.h>. Since this header is in the shared detail/ directory (not under kqueue/), it should be portable in case other backends adopt it.
Suggested portable fix:

```diff
 inline void spin_pause() noexcept
 {
-#if defined(__aarch64__) || defined(_M_ARM64)
+#if defined(__aarch64__)
     __asm__ volatile("yield");
-#elif defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
+#elif defined(_M_ARM64)
+    __yield();
+#elif defined(__x86_64__) || defined(__i386__)
     __builtin_ia32_pause();
+#elif defined(_M_X64) || defined(_M_IX86)
+    _mm_pause();
 #endif
 }
```

The MSVC paths would also need `#include <intrin.h>` guarded by `#ifdef _MSC_VER`.
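For reference, combining the diff with the `<intrin.h>` note could yield something like the following (a sketch only; the actual header may arrange the guards differently):

```cpp
#if defined(_MSC_VER)
#  include <intrin.h>   // provides __yield() on ARM64 and _mm_pause() on x86
#endif

inline void spin_pause() noexcept
{
#if defined(__aarch64__)
    __asm__ volatile("yield");          // GCC/Clang, ARM64
#elif defined(_M_ARM64)
    __yield();                          // MSVC, ARM64
#elif defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();             // GCC/Clang, x86
#elif defined(_M_X64) || defined(_M_IX86)
    _mm_pause();                        // MSVC, x86
#endif
}
```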
🤖 Prompt for AI Agents
In `@src/corosio/src/detail/conditional_mutex.hpp` around lines 35 - 42, The
spin_pause() implementation uses GCC/Clang-only intrinsics and will not compile
under MSVC; update spin_pause() to add MSVC-specific branches: include
<intrin.h> when _MSC_VER is defined, use __yield() for ARM64 (_M_ARM64) and
_mm_pause() for x86 (_M_X64, _M_IX86) instead of __asm__ volatile and
__builtin_ia32_pause(), and keep the existing GCC/Clang branches for non-MSVC
compilers; ensure all compiler/arch checks are properly `#ifdef-guarded` so the
header remains portable.
GCOVR code coverage report: https://139.corosio.prtest3.cppalliance.org/gcovr/index.html (Build time: 2026-02-13 00:33:13 UTC)
We are going to implement this at compile time. Closing. |
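As a purely speculative sketch of that direction, a build-time flag could select no-op synchronization types so the runtime `enabled` checks disappear entirely (the macro `COROSIO_SINGLE_THREADED` and the aliases below are assumptions, not anything defined by corosio):

```cpp
#include <mutex>

// Hypothetical no-op mutex selected at compile time.
struct null_mutex
{
    void lock() noexcept {}
    bool try_lock() noexcept { return true; }
    void unlock() noexcept {}
};

#if defined(COROSIO_SINGLE_THREADED)    // hypothetical build-time flag
using scheduler_mutex = null_mutex;     // locking compiles away entirely
#else
using scheduler_mutex = std::mutex;
#endif
```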