diff --git a/doc/modules/ROOT/pages/why-not-tmc.adoc b/doc/modules/ROOT/pages/why-not-tmc.adoc
index 30ebfa4b..d9d995df 100644
--- a/doc/modules/ROOT/pages/why-not-tmc.adoc
+++ b/doc/modules/ROOT/pages/why-not-tmc.adoc
@@ -1,4 +1,4 @@
-= Why Not TooManyCooks?
+= Capy and TooManyCooks: A Comparison
 
 You want to write async code in {cpp}. You've heard about coroutines. Two libraries exist: Capy and TooManyCooks (TMC). Both let you write `co_await`. Both run on multiple threads.
 
@@ -18,6 +18,7 @@ One was designed for network I/O. The other was designed for compute tasks. Choo
 * Built for doing things (calculations, parallel work)
 * Multi-threaded work pool that keeps CPUs busy
 * Priority levels so important work runs first (16 of them, to be precise)
+* Mid-coroutine executor switching for flexible work migration
 * No built-in I/O - you add that separately (via Asio integration)
 
 If you're building a network server, one of these is swimming upstream.
 
@@ -30,17 +31,49 @@ When async code finishes waiting, it needs to resume somewhere. Where?
 
 *Capy's answer:* The same place it started. Automatically.
 
-* Information flows forward through your code
-* No global state, no thread-local magic
+* Context flows forward through `await_suspend(h, ex, token)` parameters
 * Your coroutine started on executor X? It resumes on executor X.
+* Child tasks can run on different executors via `run(other_ex)(child_task())`
 
-*TMC's answer:* Wherever a worker thread picks it up.
+*TMC's answer:* Where you tell it, with flexibility to change mid-execution.
 
-* Thread-local variables track the current executor
-* Works fine... until you cross boundaries
-* Integrating external I/O requires careful coordination
+* Context flows via `tmc::detail::awaitable_traits` - a traits-based injection mechanism
+* Thread-local variables track the current executor for quick access
+* Coroutines can hop executors mid-body via `resume_on()` and `enter()`/`exit()`
+* Works fine within TMC's ecosystem; integrating external I/O requires the coordination headers (`ex_asio.hpp`, `aw_asio.hpp`)
 
-TMC's Asio integration headers (`ex_asio.hpp`, `aw_asio.hpp`) exist because this coordination is non-trivial.
+Both libraries propagate executor context. They differ in mechanism and mobility.
+
+== Executor Mobility
+
+TMC allows a coroutine to switch executors mid-body:
+
+[source,cpp]
+----
+tmc::task<void> example() {
+    // Running on executor A
+    co_await tmc::resume_on(executor_b);
+    // Now running on executor B - same coroutine!
+
+    // Or scoped:
+    auto scope = co_await tmc::enter(io_exec);
+    // Temporarily on io_exec
+    co_await scope.exit();
+    // Back to original
+}
+----
+
+This is powerful for compute workloads where work can migrate between thread pools.
+
+*Capy's design choice:* Intentionally prevent mid-coroutine executor switching. A coroutine stays on its bound executor for its entire lifetime. Child tasks can run on different executors via `run(other_ex)(child_task())`, but the parent never moves.
+
+*Why Capy prevents this:* I/O objects often have invariants tied to their executor:
+
+* A socket may only be accessed from threads associated with a specific `io_context`
+* File handles on Windows IOCP must complete on the same context on which they were initiated
+* Timer state is executor-specific
+
+Allowing a coroutine holding I/O objects to hop executors mid-body would break these invariants. TMC doesn't face this constraint because it's a compute scheduler - work items don't carry I/O state with executor affinity.
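+
+For contrast, a sketch of the equivalent Capy-side pattern. It assumes a `capy::task<void>` task type and reuses the `run(other_ex)(child_task())` composition described above; `compute_ex` and `heavy_child()` are illustrative names, not part of Capy's API:
+
+[source,cpp]
+----
+capy::task<void> parent()
+{
+    // Runs on whatever executor this coroutine was started with.
+    // The parent never moves; only the child runs elsewhere.
+    co_await run(compute_ex)(heavy_child());
+    // Resumes here, back on the parent's original executor, so any
+    // I/O objects the parent owns keep their executor affinity.
+}
+----
+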
 == Stopping Things
 
@@ -55,8 +88,11 @@ What happens when you need to cancel an operation?
 *TMC:* You manage cancellation yourself.
 
 * Stop tokens exist in {cpp}20 but TMC doesn't propagate them automatically
+* This is intentional: TMC is designed to work with various external libraries
 * Pending work completes, or you wait for it
 
+The TMC author acknowledged that automatic cancellation propagation is an "excellent killer feature" for an integrated I/O stack like Capy.
+
 == Keeping Things Orderly
 
 Both libraries support multi-threaded execution. Sometimes you need guarantees: "these operations must not overlap."
 
@@ -77,6 +113,8 @@ Both libraries support multi-threaded execution. Sometimes you need guarantees:
 TMC's documentation describes this as "optimized for higher throughput with many serialized tasks." This is a design choice. Whether it matches your mental model is a separate question.
 
+NOTE: Neither library prevents the caller from initiating multiple concurrent I/O operations on the same object - that's always the caller's responsibility. Both provide mutual exclusion for coroutine/handler execution only, not I/O operation queuing.
+
 == Working with Data
 
 Network code moves bytes around. A lot of bytes. Efficiently.
 
@@ -93,6 +131,32 @@ Network code moves bytes around. A lot of bytes. Efficiently.
 * Nothing. TMC is not an I/O library.
 * You use Asio's buffers through the integration layer.
 
+== Memory Allocation Control
+
+HALO (Heap Allocation eLision Optimization) lets compilers eliminate coroutine frame allocations when the frame's lifetime doesn't escape the caller. But I/O operations always escape - the awaitable must live until the kernel/reactor completes the operation.
+
+*Capy provides:*
+
+* Custom allocator propagation via `run_async(ex, allocator)` and `run(allocator)`
+* Per-connection arena allocation
+* Memory isolation between connections
+* Instant reclamation on connection close
+
+[source,cpp]
+----
+std::pmr::monotonic_buffer_resource arena;
+run_async(ex, &arena)(handle_connection(socket));
+// On disconnect: entire arena reclaimed instantly
+----
+
+*TMC provides:*
+
+* Global `::operator new` (with cache-line padding)
+* Recommends tcmalloc for improved performance
+* No per-operation allocator control
+
+For I/O workloads where HALO cannot apply, allocator control is essential, not optional.
+
 == Getting Technical: The IoAwaitable Protocol
 
 When you write `co_await something`, what happens?
 
@@ -123,141 +187,179 @@ The awaitable receives:
 * `ex` - The executor (where to resume)
 * `token` - A stop token (for cancellation)
 
-This is _forward propagation_. Context flows down the call chain, explicitly.
-
 *TMC's approach:*
 
-Standard signature. Context comes from thread-local storage:
+Standard signature, plus traits-based context injection:
 
-* `this_thread::executor` holds the current executor
-* `this_thread::prio` holds the current priority
-* Works within TMC's ecosystem
-* Crossing to external systems requires the integration headers
-
-== Type Erasure
+[source,cpp]
+----
+// TMC propagates context via awaitable_traits
+awaitable_traits::set_continuation(awaitable, continuation);
+awaitable_traits::set_continuation_executor(awaitable, executor);
+----
 
-*Capy:*
+TMC also tracks `this_thread::executor` and `this_task.prio` in thread-local variables for quick access.
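+
+To make Capy's half of the protocol concrete, a minimal sketch of a conforming awaitable - based only on the `await_suspend(h, ex, token)` signature this section describes; `start_op()` and `result_type` are illustrative placeholders:
+
+[source,cpp]
+----
+struct some_io_awaitable
+{
+    bool await_ready() const noexcept { return false; }
+
+    void await_suspend(coro h, executor_ref ex, std::stop_token token)
+    {
+        // Begin the async operation, recording where to resume (ex)
+        // and how to observe cancellation (token).
+        start_op(h, ex, token);
+    }
+
+    result_type await_resume(); // deliver the completed result
+};
+----
+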
-* `any_stream`, `any_read_stream`, `any_write_stream`
-* Write a function taking `any_stream&` - it compiles once
-* One virtual call per I/O operation
-* Clean ABI boundaries
+Both approaches achieve context propagation. Neither is compatible with arbitrary third-party awaitables without explicit support.
 
-*TMC:*
+== Protocol Strictness
 
-* Traits-based: `executor_traits` specializations
-* Type-erased executor: `ex_any` (function pointers, not virtuals)
-* No stream abstractions (not an I/O library)
+What happens when you `co_await` an awaitable that doesn't implement the extended protocol?
 
-== Which Library Is More Fundamental?
+*Capy:* Compile-time error.
 
-A natural question: could one library be built on top of the other? The answer reveals which design is more fundamental.
+[source,cpp]
+----
+// From task.hpp transform_awaitable()
+else
+{
+    static_assert(sizeof(A) == 0, "requires IoAwaitable");
+}
+----
 
-=== The Standard {cpp}20 Awaitable Signature
+*TMC:* Wrap in a trampoline that captures current context.
 
 [source,cpp]
 ----
-void await_suspend(std::coroutine_handle<> h);
+// From task.hpp await_transform()
+return tmc::detail::safe_wrap(std::forward<Awaitable>(awaitable));
 ----
 
-The awaitable receives only the coroutine handle. Nothing else. No information about where to resume, no cancellation mechanism.
+*Trade-offs:*
+
+[cols="1,1,1"]
+|===
+| Aspect | Capy | TMC
+
+| Unknown awaitables
+| Compilation failure
+| `safe_wrap()` trampoline
+
+| Context propagation
+| Required by protocol
+| Lost for wrapped awaitables
+
+| Integration flexibility
+| Requires protocol adoption
+| More permissive interop
+|===
+
+Capy makes the conscious decision that *silent degradation is worse than compilation failure*. If an awaitable doesn't carry context forward, the code doesn't compile. This prevents subtle bugs where cancellation or executor affinity silently stops working.
+
+TMC's approach is more flexible for incremental adoption but risks silent context loss when mixing TMC with non-TMC awaitables.
+
+== Integration Approaches
+
+[cols="1,1,1"]
+|===
+| Aspect | TMC | Capy
+
+| External adapter
+| Traits specialization (non-intrusive)
+| Member function (intrusive)
+
+| Unknown awaitables
+| `safe_wrap()` trampoline
+| `static_assert` failure
+
+| Context mechanism
+| Traits + TLS capture
+| Parameter passing
+|===
+
+Both require explicit support from awaitables. TMC's traits are external specializations, making it theoretically easier to build adapters for third-party libraries without modifying them. Capy's member function signature requires the awaitable itself to implement the protocol.
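+
+As a sketch of that difference - both snippets show only the shape of each mechanism, with illustrative type names, and are not code from either library:
+
+[source,cpp]
+----
+// TMC-style: adapt a third-party awaitable from the outside,
+// without modifying it, by specializing the traits template.
+template<>
+struct awaitable_traits<third_party_awaitable>
+{
+    static void set_continuation(third_party_awaitable&, coro);
+    static void set_continuation_executor(third_party_awaitable&, ex_any*);
+};
+
+// Capy-style: the awaitable itself implements the extended
+// member-function signature.
+struct capy_awaitable
+{
+    void await_suspend(coro h, executor_ref ex, std::stop_token token);
+};
+----
+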
-[source,cpp]
+Practically, both require cooperation from awaitable authors for full functionality.
+
+== I/O Performance: Native vs Integration
+
+TMC integrates with Asio via `aw_asio.hpp`/`ex_asio.hpp`. Corosio provides native I/O objects built on Capy's protocol.
-From ``:
+*TMC + Asio call chain for `socket.async_read_some(buf, tmc::aw_asio)`:*
-[source,cpp]
-----
-template<class A>
-concept IoAwaitable =
-    requires(A a, coro h, executor_ref ex, std::stop_token token)
-    {
-        a.await_suspend(h, ex, token);
-    };
-----
+. `async_result::initiate()` - creates awaitable, stores initiation + args in `std::tuple`
+. `operator co_await()` returns `aw_asio_impl`
+. `await_suspend()` calls `async_initiate()` -> `initiate_await(callback)` - virtual call
+. `std::apply` unpacks tuple, invokes Asio initiation
+. Asio type-erases handler into internal storage
+. On completion: callback stores result, calls `resume_continuation()`
+. `resume_continuation()` checks executor/priority, posts if different
-The conforming signature:
+*Corosio native call chain for `socket.read_some(buf)`:*
-[source,cpp]
-----
-auto await_suspend(coro h, executor_ref ex, std::stop_token token);
-----
+. Returns `read_some_awaitable` (stack object)
+. `await_suspend(h, ex, token)` calls `impl_.read_some()` - virtual call to platform impl
+. Platform impl issues direct syscall (`recv`/`WSARecv`)
+. Registers with reactor
+. On completion: `ex.dispatch(h)` - inline resume when on io_context executor
-The awaitable receives:
-
-* `h` - The coroutine handle (same as standard)
-* `ex` - An `executor_ref` specifying where to resume
-* `token` - A `std::stop_token` for cooperative cancellation
-
-This is _forward propagation_. Context flows explicitly through the call chain.
-
-=== TMC's Approach
-
-TMC uses the standard signature. Context comes from thread-local state:
-
-[source,cpp]
-----
-// From TMC's thread_locals.hpp
-inline bool exec_prio_is(ex_any const* const Executor, size_t const Priority) noexcept {
-    return Executor == executor && Priority == this_task.prio;
-}
-----
-
-TMC tracks `this_thread::executor` and `this_task.prio` in thread-local variables. When integrating with external I/O (Asio), the integration headers must carefully manage these thread-locals:
-
-[quote]
-____
-"Sets `this_thread::executor` so TMC knows about this executor"
-
-— TMC documentation on `ex_asio`
-____
-
-=== The Asymmetry
-
-Capy's signature carries strictly _more information_ than the standard signature.
+*Overhead comparison:*
 
 [cols="1,1,1"]
 |===
-| Information | Standard {cpp}20 | Capy IoAwaitable
+| Aspect | TMC + Asio | Corosio Native
 
-| Coroutine handle
-| Yes
-| Yes
+| Virtual calls
+| 1 (`initiate_await`)
+| 1 (platform impl)
+
+| Type erasure
+| Asio handler + `ex_any`
+| `executor_ref` only
 
-| Executor
+| Tuple packing
+| Yes (init args)
 | No
-| Yes (`executor_ref`)
 
-| Stop token
+| Handler storage
+| Asio internal (likely heap)
+| Operation slot in socket
+
+| Completion dispatch
+| Checks executor/priority, posts if different
+| `dispatch()` call, inline resume on io_context
+
+| Lambda wrapper
+| Yes (`ex_asio::post`)
 | No
-| Yes (`std::stop_token`)
 |===
 
-=== Can TMC's abstractions be built on Capy's protocol?
+The critical path difference is completion. TMC+Asio goes through `resume_continuation()` which checks executor/priority and often posts via `asio::post()`. Corosio's `dispatch()` can resume the coroutine inline when already on the io_context executor, avoiding the post overhead.
+
+== Type Erasure
+
+*Capy:*
 
-Yes. You would:
+* `any_stream`, `any_read_stream`, `any_write_stream`
+* Write a function taking `any_stream&` - it compiles once
+* One virtual call per I/O operation
+* Clean ABI boundaries
 
-. Receive `executor_ref` and `stop_token` from Capy's `await_suspend`
-. Store them in thread-local variables (as TMC does now)
-. Implement work-stealing executors that satisfy Capy's executor concept
-. Ignore the stop token if you prefer manual cancellation
+*TMC:*
-You can always _discard_ information you don't need.
+* Traits-based: `executor_traits` specializations
-=== Can Capy's protocol be built on TMC's?
+* Type-erased executor: `ex_any` (function pointers, not virtuals)
+* No stream abstractions (not an I/O library)
-No. TMC's `await_suspend` does not receive executor or stop token. To obtain them, you would need to:
+
+== Different Positions in the Tree of Need
-* Query thread-local state (violating Capy's explicit-flow design)
-* Or query the caller's promise type (tight coupling Capy avoids)
+
+TMC and Capy occupy different architectural positions. Rather than competing, they serve different needs:
-You cannot _conjure_ information that was never passed.
+
+*TMC sits above I/O:*
-=== Conclusion
+
+* Compute scheduler designed for CPU-bound parallel work
+* Integrates with existing I/O solutions (Asio)
+* Flexible executor mobility for work migration
+* Permissive interop via `safe_wrap()` for gradual adoption
-Capy's IoAwaitable protocol is a _superset_ of the standard protocol. TMC's work-stealing scheduler, priority levels, and `ex_braid` are executor _implementations_ - they could implement Capy's executor concept. But Capy's forward-propagation semantics cannot be retrofitted onto a protocol that doesn't carry the context.
+
+*Capy sits below compute:*
+
+* I/O foundation designed for network/file operations
+* Strict protocol enforcement prevents silent failures
+* Executor stability protects I/O object invariants
+* Allocator control where HALO cannot apply
-Capy is the more fundamental library.
+
+Neither is "more fundamental." If you're building a network server, Capy's constraints exist to protect you. If you're parallelizing CPU work, TMC's flexibility is valuable.
 
 == Corosio: Proof It Works
 
@@ -275,19 +377,24 @@ All built on Capy's IoAwaitable protocol. Coroutines only. No callbacks.
 
 *Choose TMC if:*
 
 * CPU-bound parallel algorithms
-* Compute workloads needing TMC's specific priority model (1-16 levels)
-* Work-stealing benefits your access patterns
+* Compute workloads needing work-stealing or priority scheduling (1-16 levels)
+* Work that benefits from mid-coroutine executor migration
 * You're already using Asio and want a scheduler on top
+* Gradual adoption with mixed awaitable sources
 
 *Choose Capy if:*
 
 * Network servers or clients
 * Protocol implementations
 * I/O-bound workloads
-* You want cancellation that propagates
+* You want cancellation that propagates automatically
 * You want buffers and streams as first-class concepts
-* You prefer explicit context flow over thread-local state
-* You want to implement your own executor (Capy uses concepts, not concrete types)
+* You need per-connection allocator control
+* You prefer strict compile-time protocol enforcement
+
+*Or use both:*
+
+TMC for compute scheduling, Capy/Corosio for I/O. They can coexist at different layers of your application.
 
 == Summary
 
@@ -303,18 +410,30 @@
 | Multi-threaded (`thread_pool`)
 | Multi-threaded (work-stealing)
 
+| Executor mobility
+| Fixed per coroutine
+| Mid-body switching (`resume_on`)
+
 | Serialization
 | `strand` (ordering preserved across suspend)
 | `ex_braid` (lock released on suspend)
 
 | Context propagation
-| Forward (IoAwaitable protocol)
-| Thread-local state
+| `await_suspend` parameters
+| `awaitable_traits` + TLS
+
+| Unknown awaitables
+| `static_assert` failure
+| `safe_wrap()` trampoline
 
 | Cancellation
 | Automatic propagation
 | Manual
 
+| Allocator control
+| Per-task (`std::pmr`)
+| Global (`::operator new`)
+
 | Buffer sequences
 | Yes
 | No (use Asio)
@@ -332,8 +451,8 @@
 | Via Asio integration headers
 
 | Priority scheduling
-| Implement your own (24 levels, if you wish)
-| Yes (1-16 levels)
+| Implement your own
+| Built-in (1-16 levels)
 
 | Work-stealing
 | No
@@ -343,3 +462,13 @@
 | Concept-based (user-extensible)
 | Traits-based (`executor_traits`)
 |===
+
+== Revision History
+
+[cols="1,3"]
+|===
+| Date | Changes
+
+| 2026-02-04
+| Revised to correct inaccuracies regarding TMC's context propagation mechanism. The author of TooManyCooks provided feedback clarifying that TMC implements executor affinity via `tmc::detail::awaitable_traits`, not just thread-local state. Reframed comparison to acknowledge both libraries as complementary solutions for different architectural positions rather than competitors.
+|===
diff --git a/include/boost/capy/detail/type_id.hpp b/include/boost/capy/detail/type_id.hpp
index dcf2c110..437bd4e6 100644
--- a/include/boost/capy/detail/type_id.hpp
+++ b/include/boost/capy/detail/type_id.hpp
@@ -209,7 +209,7 @@ using type_index = std::type_index;
 
 /// Returns type_info for type T
 template<class T>
-inline type_info const&
+inline constexpr type_info const&
 type_id() noexcept
 {
     return typeid(T);
diff --git a/include/boost/capy/ex/executor_ref.hpp b/include/boost/capy/ex/executor_ref.hpp
index 6cda2a64..8001f0d1 100644
--- a/include/boost/capy/ex/executor_ref.hpp
+++ b/include/boost/capy/ex/executor_ref.hpp
@@ -11,6 +11,7 @@
 #define BOOST_CAPY_EXECUTOR_REF_HPP
 
 #include
+#include
 #include
 #include
 
@@ -34,6 +35,7 @@ struct executor_vtable
     void (*post)(void const*, std::coroutine_handle<>);
     void (*dispatch)(void const*, std::coroutine_handle<>);
     bool (*equals)(void const*, void const*) noexcept;
+    detail::type_info const* type_id;
 };
 
 /** Vtable instance for a specific executor type. */
 template<class T>
 inline constexpr executor_vtable vtable_for = {
@@ -62,7 +64,9 @@
     // equals
     [](void const* a, void const* b) noexcept -> bool
     {
         return *static_cast<T const*>(a) == *static_cast<T const*>(b);
-    }
+    },
+    // type_id
+    &detail::type_id<T>()
 };
 
 } // detail
@@ -244,6 +248,17 @@ class executor_ref
             return false;
         return vt_->equals(ex_, other.ex_);
     }
+
+    /** Returns the type info of the underlying executor type.
+
+        @return A reference to the type_info for the wrapped executor.
+
+        @pre This instance was constructed with a valid executor.
+    */
+    detail::type_info const& type_id() const noexcept
+    {
+        return *vt_->type_id;
+    }
 };
 
 } // capy
diff --git a/include/boost/capy/when_all.hpp b/include/boost/capy/when_all.hpp
index 6ba4b920..3129091d 100644
--- a/include/boost/capy/when_all.hpp
+++ b/include/boost/capy/when_all.hpp
@@ -113,12 +113,7 @@ struct when_all_state
     {
     }
 
-    ~when_all_state()
-    {
-        for(auto h : runner_handles_)
-            if(h)
-                h.destroy();
-    }
+    // Runners self-destruct in final_suspend.
+    // No destruction needed here.
 
     /** Capture an exception (first one wins). */
@@ -130,20 +125,6 @@
         first_exception_ = ep;
     }
 
-    /** Signal that a task has completed.
-
-        The last child to complete triggers resumption of the parent.
-        Dispatch handles thread affinity: resumes inline if on same
-        thread, otherwise posts to the caller's executor.
-    */
-    coro signal_completion()
-    {
-        auto remaining = remaining_count_.fetch_sub(1, std::memory_order_acq_rel);
-        if(remaining == 1)
-            caller_ex_.dispatch(continuation_);
-        return std::noop_coroutine();
-    }
-
 };
 
 /** Wrapper coroutine that intercepts task completion.
@@ -181,10 +162,25 @@ struct when_all_runner
         return false;
     }
 
-    coro await_suspend(coro) noexcept
+    void await_suspend(coro h) noexcept
     {
-        // Signal completion; last task resumes parent
-        return p_->state_->signal_completion();
+        // Extract everything needed for signaling before
+        // self-destruction. Inline dispatch may destroy
+        // when_all_state, so we can't access members after.
+        auto* state = p_->state_;
+        auto* counter = &state->remaining_count_;
+        auto caller_ex = state->caller_ex_;
+        auto cont = state->continuation_;
+
+        // Self-destruct first - state no longer destroys runners
+        h.destroy();
+
+        // Signal completion. If last, dispatch parent.
+        // Uses only local copies - safe even if state
+        // is destroyed during inline dispatch.
+        auto remaining = counter->fetch_sub(1, std::memory_order_acq_rel);
+        if(remaining == 1)
+            caller_ex.dispatch(cont);
     }
 
     void await_resume() const noexcept
diff --git a/test/unit/ex/executor_ref.cpp b/test/unit/ex/executor_ref.cpp
index d56311ce..37bdd547 100644
--- a/test/unit/ex/executor_ref.cpp
+++ b/test/unit/ex/executor_ref.cpp
@@ -12,6 +12,7 @@
 
 #include
 #include
+#include
 
 #include "test_suite.hpp"
 
@@ -235,6 +236,26 @@ struct executor_ref_test
         BOOST_TEST_EQ(counter.load(), N);
     }
 
+    void
+    testTypeId()
+    {
+        thread_pool pool1(1);
+        thread_pool pool2(1);
+        auto executor1 = pool1.get_executor();
+        auto executor2 = pool2.get_executor();
+
+        executor_ref ex1(executor1);
+        executor_ref ex2(executor2);
+
+        // Same executor type returns equal type_info
+        BOOST_TEST(ex1.type_id() == ex2.type_id());
+
+        // Different executor type returns different type_info
+        test::inline_executor ie;
+        executor_ref ex3(ie);
+        BOOST_TEST(ex1.type_id() != ex3.type_id());
+    }
+
     void
     run()
     {
@@ -244,6 +265,7 @@ struct executor_ref_test
         testDispatch();
         testPost();
         testMultiplePost();
+        testTypeId();
     }
 };
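
For illustration, a sketch of how calling code might use the new `executor_ref::type_id()` accessor - assuming `detail::type_info` aliases `std::type_info`, as the `return typeid(T);` body in type_id.hpp suggests:

[source,cpp]
----
thread_pool pool(1);
auto pool_ex = pool.get_executor();
executor_ref ex(pool_ex);

// Recover the concrete executor type behind the type-erased reference.
if(ex.type_id() == typeid(decltype(pool_ex)))
{
    // Safe to treat the underlying executor as that concrete type,
    // e.g. to take a pool-specific fast path.
}
----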