From 7ccf178e97570af6589d353731935b4f701d047e Mon Sep 17 00:00:00 2001 From: Vinnie Falco Date: Sat, 31 Jan 2026 22:16:55 -0800 Subject: [PATCH 1/3] Consolidate documentation reference material --- .cursor/rules/writing-guide.mdc | 145 ++ doc/reference/concurrency-1.md | 2322 +++++++++++++++++++++++++++++++ doc/reference/concurrency-2.md | 805 +++++++++++ doc/reference/coro-tutorial.md | 1425 +++++++++++++++++++ 4 files changed, 4697 insertions(+) create mode 100644 .cursor/rules/writing-guide.mdc create mode 100644 doc/reference/concurrency-1.md create mode 100644 doc/reference/concurrency-2.md create mode 100644 doc/reference/coro-tutorial.md diff --git a/.cursor/rules/writing-guide.mdc b/.cursor/rules/writing-guide.mdc new file mode 100644 index 00000000..bbc09a6a --- /dev/null +++ b/.cursor/rules/writing-guide.mdc @@ -0,0 +1,145 @@ +--- +description: Documentation style guide for Capy docs +globs: + - "doc/**" + - "**/*.adoc" +--- + +# Style + +Technical documentation should be: + +- **Comprehensive and written for all experience levels** +- **Technically detailed and correct** +- **Practical, useful, and self-contained** +- **Friendly but formal** + +## Comprehensive and Written for All Experience Levels + +Write clearly without assuming background knowledge. Provide explanations and context readers need to understand concepts, not just copy code. + +Avoid words like "simple," "straightforward," "easy," "simply," "obviously," and "just." These make assumptions about the reader's knowledge. A reader who hears something is "easy" may be frustrated when they encounter an issue. + +## Technically Detailed and Correct + +Don't provide blocks of code and ask readers to trust it works. Every command should have a detailed explanation. Every block of code should be followed by prose explaining what it does and why. + +When asking the reader to execute a command or modify code, first explain what it does and why. These details help readers grow their skills. + +## Practical and Self-Contained + +Readers should have something usable when finished. Link to prerequisites they should complete first. Link to other docs for additional information. Only send readers offsite if no existing doc covers it and the information can't be summarized. + +## Friendly but Formal + +No jargon, memes, excessive slang, emoji, or jokes. Aim for a tone that works across language and cultural boundaries. + +Use second person ("You will configure...") to keep focus on the reader. In some cases, use first person plural ("We will examine..."). Avoid first person singular ("I think..."). + +Use motivational language focused on outcomes. Instead of "You will learn how to install Apache," try "In this tutorial, you will install Apache." + +# Structure + +## Introduction + +Usually one to three paragraphs. Answer: + +- What is this about? What does each component do (briefly)? +- Why should the reader learn this? What are the benefits? +- What will the reader do or create? Be specific. +- What will they have accomplished when done? What new skills? + +Keep focus on the reader and what they will accomplish. Instead of "we will learn how to," use "you will configure" or "you will build." + +## Prerequisites + +Spell out exactly what the reader should have or do before starting. Format as a checklist. Link to existing docs covering prerequisite content. + +Be specific. "Familiarity with JavaScript" without a link gives little context. Instead: "Familiarity with JavaScript. To build your skills, check out [resource]." 
+ +## Steps + +Each step describes what the reader needs to do and why. Include commands, code listings, and explanations of both what to do and why. + +Step titles describe what readers will accomplish using gerunds (-ing words): + +> Step 1 — Creating User Accounts + +After the title, add an introductory sentence describing what the reader will do and how it contributes to the overall goal. + +### Commands + +All commands go on their own line in a code block. Precede with a description of what the command does. After the command, explain arguments and why they're used: + +> Execute the following command to display the contents of the directory, including hidden files: +> +> `ls -al /home/sammy` +> +> The `-a` switch shows all files including hidden ones, and `-l` shows a long listing with timestamps and sizes. + +Display command output in a separate block with text explaining what it shows. + +### Code Blocks + +Introduce code with a high-level explanation of what it does. Show the code. Then call out important details: + +> Add the following code, which prints a message to the screen: +> +> ```cpp +> std::cout << "Hello world!\n"; +> ``` +> +> The `std::cout` stream sends text to standard output. + +When changing something specific in existing code, show the relevant parts and highlight what should change. Explain what the change does and why it's necessary. + +### Transitions + +Frame each step with a brief intro sentence and a closing transition describing what the reader accomplished and where they're going next. Vary the language to avoid repetition: + +> You have now configured the server. Before proceeding, you need to verify the settings in the next step. + +## Conclusion + +Summarize what the reader accomplished. Instead of "we learned how to," use "you configured" or "you built." + +Describe what the reader can do next: use cases, features to explore, links to related docs. + +# Formatting + +## Line-level + +**Bold** for: +- Visible GUI text +- Hostnames and usernames +- Term lists +- Emphasis when changing context + +*Italics* only for introducing technical terms. + +`Inline code` for: +- Command names +- Package names +- File names and paths +- Example URLs +- Ports +- Key presses (ALL CAPS, use + for simultaneous: `CTRL+C`) + +## Code Blocks + +Use for: +- Commands to execute +- Files and scripts +- Terminal output + +Use ellipses (`...`) to indicate excerpts and omissions. + +If most of a file can be left with defaults, show just the section that needs changing. + +## Variables + +Highlight items the reader must change: example URLs, version numbers, modified lines. Make clear what needs customization. + +## Notes and Warnings + +Use note and warning callouts for very important information. diff --git a/doc/reference/concurrency-1.md b/doc/reference/concurrency-1.md new file mode 100644 index 00000000..dfaafa6c --- /dev/null +++ b/doc/reference/concurrency-1.md @@ -0,0 +1,2322 @@ +# Advanced C++ Crash Course (Threading and Concurrency) + +Author: methylDragon +Contains an advanced syntax reference for C++ +This time, we'll be going through C++ multithreading and concurrency related stuff! + +------ + +## Pre-Requisites + +**Assumed knowledge (This is a C++ crash course, not a basic coding tutorial)** + +- How **variables, loops, conditionals, etc**. work (Basic coding fundamentals will help a lot!) 
- Linux (**Terminal/Console proficiency**) (We're going to need to compile our stuff)
- Gone through all the preceding parts of the tutorial
- Some familiarity with threading will help



## Table Of Contents <a name="top"></a>

1. [Introduction](#1)
2. [C++ Threading Reference](#2)
   2.1 [Threads](#2.1)
   2.2 [Creating Threads](#2.2)
   2.3 [Thread Specific Functions](#2.3)
   2.4 [Sharing Data](#2.4)
   2.5 [Waiting, Killing, and Detaching](#2.5)
   2.6 [Race Conditions](#2.6)
   2.7 [Atomics](#2.7)
   2.8 [Mutex and Locks](#2.8)
   2.9 [A Better Way: Lock Guards](#2.9)
   2.10 [Lock Guard Types](#2.10)
   2.11 [Exclusive Locks vs Shared Locks](#2.11)
   2.12 [Mutex Types](#2.12)
   2.13 [Event Handling: Condition Variables](#2.13)
3. [C++ Concurrency Reference](#3)
   3.1 [Introduction](#3.1)
   3.2 [When to Use Threads or Tasks](#3.2)
   3.3 [Promises and Futures](#3.3)
   3.4 [A Simple Promise-Future Example](#3.4)
   3.5 [Async](#3.5)
   3.6 [Async Launch Policies](#3.6)
   3.7 [Different Ways to Call Async](#3.7)




## 1. Introduction <a name="1"></a>

![_images/concurrency_vs_parallelism.png](assets/concurrency_vs_parallelism-1562918749730.png)

[Image Source]()

Everyone likes threading, ja? Why not make an already efficient language like C++ even more efficient with multithreading?

We're going to talk about the nice `std::thread` class that abstracts away the low-level POSIX threads (pthreads) library in C. We'll also talk about `std::async` for asynchronous thread generation, as well as a bit on locks and atomic types.



## 2. C++ Threading Reference <a name="2"></a>

### 2.1 Threads <a name="2.1"></a>
[go to top](#top)


![img](assets/threads-as-control-flow.png)

[Image Source]()

You can use the [std::thread]() class to start threads. Each instance of this class wraps and manages a single execution thread.

![_images/concurrency_vs_parallelism.png](assets/concurrency_vs_parallelism-1562918749730.png)

[Image Source]()

Threads will run **concurrently** if they're on the same processor core, but ***in parallel*** if they're on different cores!

Each thread has its own call stack, but **all threads share the heap.**

You can query the number of threads your hardware can actually run at once. If your number of active threads exceeds this number, you won't really get more performance out of the extra threads, so take note!

```c++
#include <thread>

unsigned int c = std::thread::hardware_concurrency();
```



### 2.2 Creating Threads <a name="2.2"></a>
[go to top](#top)


There are several ways to create a thread:

- Using a **function pointer**
- Using a **lambda function**
- Using a **functor**

**Function Pointer**

```c++
#include <thread>

// Define a function and start a thread that runs that function
void rawr(params) {}
std::thread rawr_thread(rawr, params);
```
**Lambda Function**
```c++
// Define a lambda expression and start a thread that runs that lambda expression
auto rar = [](params) {};
std::thread rar_thread(rar, params);

// Or pass the lambda directly!
std::thread rar_thread_2([](params) {}, params);
```
**Functor**
```c++
// Define a functor and start a thread that runs the functor's function call
class raa_object_class {
public:
    void operator()(params) {}
};

std::thread raa_thread(raa_object_class(), params);
```

> Don't create threads on the heap with the new operator! Create them with automatic storage on the stack for efficiency, like in the examples above.
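
To see thread creation end to end, here's a minimal, compilable sketch that ties the pieces above together (the function name `greet` and its argument are made up for illustration). It starts a thread that runs a function with one argument, then waits for it to finish:

```c++
#include <iostream>
#include <thread>

// A stand-in worker function; lambdas and functors work the same way
void greet(int id)
{
    std::cout << "Hello from thread " << id << "\n";
}

int main()
{
    std::thread greet_thread(greet, 1); // Starts running greet(1) right away
    greet_thread.join();                // Wait for it to finish (see section 2.5)
    return 0;
}
```

On Linux, remember to link the threading library when you compile, for example with `g++ -std=c++17 main.cpp -pthread`.
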
### 2.3 Thread Specific Functions <a name="2.3"></a>
[go to top](#top)


Use `std::this_thread` within threads to refer to the current thread!

**Note that yield() is NOT like the Python yield! It's completely different behaviour.**

```c++
#include <thread>
#include <chrono>

// These can be used within a thread

// Get thread ID of thread
std::this_thread::get_id();

// Give priority to other threads, pause execution
std::this_thread::yield();

// Sleep for some amount of time
std::this_thread::sleep_for(std::chrono::seconds(1));

// Sleep until some time
std::chrono::system_clock::time_point time_point = std::chrono::system_clock::now()
                                                   + std::chrono::seconds(10);
std::this_thread::sleep_until(time_point);
```



### 2.4 Sharing Data <a name="2.4"></a>
[go to top](#top)


**Global Variables**

All global and static variables that are initialised at compile time can be accessed by threads, since every thread knows their addresses.

#### **Passing By Reference**

All parameters passed to a function when starting a thread are **passed by value**, even if the function is defined to take them by reference!

You need to **explicitly wrap the arguments in std::ref() to pass by reference.**

Example:

```c++
void ref_function(int &a, int b) {}

int val;
std::thread ref_function_thread(ref_function, std::ref(val), 2);
```

**Because `std::thread` discards the return value of the function it runs, passing by reference is the only way to properly get data out of a thread without using global variables.** Ensure that your thread modifies the data passed in by reference and you should be good to go.

#### **A Note on Static Variables**

Be wary of declaring static variables in functions that run on multiple threads though!

```c++
// Suppose this is your thread function
void method()
{
    static int var = 0;
    var++;
}
```

**Note that this does NOT create a separate instance of the static variable per thread instance.** This is because a static variable is initialised only once, the first time execution passes over its declaration.

If you want 'static' variables that are static within the scope of each particular thread, use `thread_local` variables instead. Then each thread will have its own version of the variable, and it will only be destroyed on thread exit.

```c++
void method()
{
    thread_local int var = 0;
    var++;
}
```



### 2.5 Waiting, Killing, and Detaching <a name="2.5"></a>
[go to top](#top)


#### **Waiting to Complete**

You use the `join()` method to wait for a thread to complete.

Calling `join()` will **block the calling thread** until the thread that is being waited for completes.

```c++
// Start thread example_thread
std::thread example_thread(some_function);

// Block and wait for thread to finish
example_thread.join();

// Ok! We're done and good to go on doing other stuff ...
```

**You cannot join a thread if it is not joinable** (maybe you killed it already, or it was detached.)

```c++
// So you can check if a thread is joinable before calling the join method!
if (example_thread.joinable())
{
    example_thread.join();
}
```

#### **Kill a Thread**

Use `return`, **not** `std::terminate()`! `terminate()` will kill your entire program process, not an individual thread.

```c++
return;
```

#### **Detaching a Thread**

You may `detach` a thread. That is, split it from the `std::thread` object that manages it. Once you do that, you won't be able to manage the thread except through any mutexes or resources shared between the threads.

A detached thread will only exit when the main process is terminated or when its top level function exits.
```c++
example_thread.detach();
```



### 2.6 Race Conditions <a name="2.6"></a>
[go to top](#top)


![SharedMutable](assets/SharedMutable.png)

[Image Source]()

Reading variables from multiple threads is always thread-safe. But the moment any thread starts writing data that other threads touch, you can potentially crash or create unexpected behaviour.

**Example**

```c++
// Source: https://stackoverflow.com/questions/34510/what-is-a-race-condition

if (x == 5) // The "Check"
{
    y = x * 2; // The "Act"

    // If another thread changed x in between "if (x == 5)" and "y = x * 2" above,
    // y will not be equal to 10.
}
```



### 2.7 Atomics <a name="2.7"></a>
[go to top](#top)


So there are several ways to prevent race conditions. An `std::atomic` is just one way.

An atomic type is a type that implements atomic operations: operations that are thread-safe because each one completes as a single, indivisible step that no other thread can interrupt. There can be some overhead, especially when there is a lot of contention around them, but it's hard to say how much overhead exactly, since it's platform and context specific.

Using an atomic type **guarantees that its individual operations are free of data races.**

> **Use atomic types only when you need them, and native types when you don't. If you care about performance, that is.**

You can check the [Atomic Types Reference]() for the full list of how to instantiate them, but here's a couple of examples.

**There's a gigantic list! This table is non-exhaustive:**

|    Type Alias    | Type Instantiation  |
| :--------------: | :-----------------: |
| std::atomic_bool | `std::atomic<bool>` |
| std::atomic_char | `std::atomic<char>` |
| std::atomic_int  | `std::atomic<int>`  |
| std::atomic_long | `std::atomic<long>` |
|       ...        |         ...         |



### 2.8 Mutex and Locks <a name="2.8"></a>
[go to top](#top)


#### **Introduction**

We'll go through this for completeness' sake, but there is a better way to do things (lock guards).

**Mutexes** are mutual exclusion objects that are used for thread synchronisation. They're a way to keep track of whether a particular thread is using a resource, and they cause other threads to block if the resource is currently taken. They're a way to **protect shared resources and to prevent race conditions.**

A mutex is **owned** by the thread that locks it, and only one thread can own it at a time. Hence, **mutual exclusion!**

Mutexes will slow down your threaded program if threads wait on them too much, so use them sparingly! But you still need them to prevent race conditions and to really control the flow of your multi-threaded program.

They are the **interface** through which you can engage locks for your code!

#### **Deadlocks**

Of course, you need to be careful when you're using mutexes and locks. Overuse of locks will slow down your code, or in certain cases cause deadlocks, where threads wait on each other forever and your program completely stalls.

![Image result for deadlock](assets/deadlock.png)

[Image Source]()

> **Methods for handling deadlock**
>
> 1) **Deadlock prevention or avoidance**: Don't let the system get into a deadlocked state in the first place. Prevention is done by negating one of the necessary conditions for deadlock.
>
> 2) **Deadlock detection and recovery**: Let deadlock occur, then use preemption to handle it once it has occurred.
>
> 3) **Ignore the problem altogether**: If deadlock is very rare, then let it happen and reboot the system. This is the approach that both Windows and UNIX take.
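
To make the hazard concrete, here's a minimal sketch of the classic lock-ordering deadlock (the mutex names and empty critical sections are made up for illustration). Each thread takes one mutex and then tries to take the other; if the two threads interleave badly, each ends up waiting forever on the mutex the other owns:

```c++
#include <mutex>

std::mutex mutex_a;
std::mutex mutex_b;

void thread_1_function()
{
    mutex_a.lock();   // Thread 1 takes A first...
    mutex_b.lock();   // ...then tries to take B
    // ... critical section ...
    mutex_b.unlock();
    mutex_a.unlock();
}

void thread_2_function()
{
    mutex_b.lock();   // Thread 2 takes B first...
    mutex_a.lock();   // ...then tries to take A. If thread 1 already owns A: deadlock!
    // ... critical section ...
    mutex_a.unlock();
    mutex_b.unlock();
}
```

The simplest prevention strategy is to always acquire multiple mutexes in the same order in every thread, or to acquire them all in one go with `std::scoped_lock`, which you'll meet in the lock guard section below.
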
+> +> + +#### **Example Usage** + +> Note that this method is **not recommended**. It's actually an [**anti-pattern**]() but just included for completeness' sake. + +```c++ +#include + +// Create your mutex here +std::mutex my_mutex; + +// +thread_function() +{ + my_mutex.lock(); // Acquire lock + // Do some non-thread safe stuff... + my_mutex.unlock(); // Release lock +} +``` + + + +### 2.9 A Better Way: Lock Guards +[go to top](#top) + + +It's actually better to just use a lock guard, which manages the lifecycle of a mutex for you. + +It's kind of like the `with:` operator in Python. + +**Notably, a lock guard releases the lock automatically once the function that it is called in goes out of scope!** + +```c++ +#include + +// Create your mutex here +std::mutex my_mutex; + +thread_function() +{ + std::lock_guard guard(my_mutex); // Acquire lock + // Do some non-thread safe stuff... +} +``` + + + +### 2.10 Lock Guard Types +[go to top](#top) + + +So there are actually several lock guard types. + +You've already seen the standard lock_guard + +#### **std::lock_guard<>** + +[Reference]() + +- Simplest lock guard +- Takes a mutex on construction +- Releases the mutex once it goes out of scope + +```c++ +std::lock_guard guard(my_mutex); +``` + +#### **std::scoped_lock<>** + +[Reference]() + +This was introduced in C++17, and is the standard lock guard to use, over `std::lock_guard<>`, which is included for compatibility. + +- It's just a lock guard +- Except it can take **multiple mutexes** + +```c++ +std::scoped_lock guard(mutex_1, mutex_2); +``` + +#### **std::unique_lock<>** + +[Reference]() + +- Just like the normal lock guard, except... +- It initialises an exclusive lock +- It can be returned from the function without releasing the lock (via move semantics) +- It can be released before it is destroyed +- You can also use **nifty lock methods!** + +```c++ +std::unique_lock guard(my_mutex); + +// Check if guard owns lock (either works) +guard.owns_lock(); +bool(guard); + +// Return function without releasing the lock +return std::move(guard); + +// Release lock before destruction +guard.unlock(); +``` + +If you defer the locks, you can use the **nifty lock methods!** + +```c++ +// Initialise the lock guard, but don't actually lock yet +std::unique_lock guard(mutex_1, std::defer_lock); + +// Now you can do some of the following! +guard.lock(); // Lock now! +guard.try_lock(); // Won't block if it can't acquire +guard.try_lock_for(); // Only for timed_mutexes +guard.try_lock_until(); // Only for timed_mutexes +``` + +#### **std::shared_lock<>** + +[Reference]() + +A shared lock is just like a unique lock, except the lock is a shared lock as opposed to an exclusive one. + +- Just like the normal lock guard, except... +- It initialises a shared lock +- It can be returned from the function without releasing the lock (via move semantics) +- It can be released before it is destroyed +- You can also use **nifty lock methods!** + +```c++ +std::shared_lock my_mutex; +std::shared_lock guard(my_mutex); + +// Check if guard owns lock (either works) +guard.owns_lock(); +bool(guard); + +// Return function without releasing the lock +return std::move(guard); + +// Release lock before destruction +guard.unlock(); +``` + +If you defer the locks, you can use the **nifty lock methods!** + +```c++ +// Initialise the lock guard, but don't actually lock yet +std::shared_lock guard(mutex_1, std::defer_lock); + +// Now you can do some of the following! +guard.lock(); // Lock now! 
guard.try_lock();                    // Won't block if it can't acquire
guard.try_lock_for(some_duration);   // Only for timed_mutexes
guard.try_lock_until(some_time);     // Only for timed_mutexes
```



### 2.11 Exclusive Locks vs Shared Locks <a name="2.11"></a>
[go to top](#top)


**Exclusive locks** (aka write locks) **inhibit all access** from other threads until the lock is released.

**Shared locks** (aka read locks) **inhibit all writes** from other threads until the lock is released. Other threads still have to acquire their own shared locks to be allowed to read, though.

> Exclusive lock mode prevents the associated resource from being shared. This lock mode is obtained to modify data. The first transaction to lock a resource exclusively is the only transaction that can alter the resource until the exclusive lock is released.
>
> Share lock mode allows the associated resource to be shared, depending on the operations involved. Multiple users reading data can share the data, holding share locks to prevent concurrent access by a writer (who needs an exclusive lock). Several transactions can acquire share locks on the same resource.
>
> ---
>
> Think of a lockable object as a *blackboard* (lockable) in a classroom containing a *teacher* (writer) and many *students* (readers).
>
> While a teacher is writing something (exclusive lock) on the board:
>
> 1. Nobody can read it, because it's still being written, and she's blocking your view => ***If an object is exclusively locked, shared locks cannot be obtained*.**
> 2. Other teachers won't come up and start writing either, or the board becomes unreadable, and confuses students => ***If an object is exclusively locked, other exclusive locks cannot be obtained*.**
>
> When the students are reading (shared locks) what is on the board:
>
> 1. They all can read what is on it, together => *Multiple shared locks can co-exist*.
> 2. The teacher waits for them to finish reading before she clears the board to write more => *If one or more shared locks already exist, exclusive locks cannot be obtained*.

Notice this means that **if an object is shared locked, you can acquire more shared locks, but not exclusive locks.**

Basically:

- If there are multiple readers, no writers can acquire the lock, but more readers can.
- If there is one writer, no one else can acquire the lock.




### 2.12 Mutex Types <a name="2.12"></a>
[go to top](#top)


There are [several]() [types]() [of]() [mutex]().

#### **std::mutex**

[Reference]()

- Just your plain lockable mutex

#### **std::timed_mutex**

[Reference]()

- Timed mutex
- You can try to lock for a specified amount of time with `try_lock_for()` and `try_lock_until()`

#### **std::recursive_mutex**

[Reference]()

- Multiple locks can be acquired by the same thread
- You need to call unlock the same number of times you've called lock before the mutex is released

#### **std::recursive_timed_mutex**

[Reference]()

- Same as the recursive mutex, except it also has the timed locking methods that timed mutexes have

#### **std::shared_timed_mutex**

[Reference]()

- Read-Write mutex
- Can take both exclusive and shared locks (just use the appropriate lock guard type!)

```c++
std::shared_timed_mutex writing_mutex;
std::shared_timed_mutex reading_mutex;

std::unique_lock<std::shared_timed_mutex> writer_guard(writing_mutex, std::defer_lock);
std::shared_lock<std::shared_timed_mutex> reader_guard(reading_mutex, std::defer_lock);

// Lock them!
std::lock(writer_guard, reader_guard);
```
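
To show the exclusive/shared pairing on a single read-write mutex, here's a minimal sketch (the mutex name, counter, and function names are made up for illustration). The writer takes an exclusive lock, while readers take shared locks and can read concurrently:

```c++
#include <shared_mutex>

std::shared_timed_mutex rw_mutex;
int shared_value = 0;

void writer()
{
    // Exclusive lock: blocks all readers and other writers until released
    std::unique_lock<std::shared_timed_mutex> guard(rw_mutex);
    shared_value++;
}

int reader()
{
    // Shared lock: other readers can hold their own shared locks at the same time
    std::shared_lock<std::shared_timed_mutex> guard(rw_mutex);
    return shared_value;
}
```

Which guard type you pick is what decides whether you're taking the mutex exclusively or shared.
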
### 2.13 Event Handling: Condition Variables <a name="2.13"></a>
[go to top](#top)


Sometimes you need to do some nice signal/event handling.

It's possible to do it by having threads repeatedly lock and check a global variable, but it's far more efficient to use **[condition variables]()**.

A condition variable allows you to **wait for some condition to be true** before continuing thread execution. While a thread waits, the lock it passed to the waiting function is released; once the wait ends, the lock is reacquired.

> **Example Flow**
>
> 1. Thread **acquires lock**
> 2. Check if condition is false
> 3. If false, call `wait()`, which **releases the lock and blocks the thread until the condition is fulfilled**
> 4. When another thread fulfills the condition, it **must notify** the condition variable before the waiting thread can re-check
> 5. Once the condition check succeeds, the **thread reacquires the lock and continues execution**

Let's try it out!

Condition variables work with unique_locks, so we'll use those.

#### **Basic Example**

```c++
#include <condition_variable>
#include <mutex>

// Init
std::condition_variable condition_var;
std::mutex mutex;
bool condition(false);

// Acquire lock
std::unique_lock<std::mutex> guard(mutex);

// Avoid spurious wakeups and
// ensure wait is only called when the condition has not been fulfilled
while (!condition)
{
    condition_var.wait(guard);
}

// Now in some other thread
{
    // Acquire lock
    std::unique_lock<std::mutex> guard(mutex);

    // We can set the condition to true
    condition = true;

    // And notify one thread blocked on the condition variable that it's ok to wake up
    // (In this case we only have one)
    condition_var.notify_one();

    // If we want to notify all of them instead...
    condition_var.notify_all();

    // If we didn't surround the wait with the while (!condition) loop,
    // notifying the threads would cause wait to return with no condition check.
    // That is dangerous, since random wakeups can occur without notifications!
}
```

**You may also choose to make the condition an atomic boolean instead, so threads that set the condition can skip acquiring the lock.**

Like so: `std::atomic<bool> condition(false);`

#### **Additional Methods**

```c++
// Wait for some time, or until some time point is reached
condition_var.wait_for(guard, some_duration);
condition_var.wait_until(guard, some_time_point);

// There's also a nice function that cleans up after a lock acquiring thread.
// It's equivalent to:
// First: destroying all objects that are meant to be destroyed on thread exit
// Then: mutex.unlock(); condition_var.notify_all();
std::notify_all_at_thread_exit(condition_var, std::move(some_unique_lock));
```

#### **Spurious Wakeups**

A bit tricky. But sometimes condition variables can wake up on their own due to some [threading technomagic]().

It's relatively trivial to guard against them, and the guard doubles as another layer of protection against human error, so it makes sense to deal with them explicitly.

```c++
// You guard against spurious wakeups by surrounding the wait
// with a check for the condition (you're checking the predicate)
while (!condition)
{
    condition_var.wait(guard);
}

// Alternatively, you can do it this way as well,
// which is neater but slightly less intuitive
condition_var.wait(guard, condition_function);

// If we want to just check a bool called condition, we can use a lambda
condition_var.wait(guard, [](){ return condition; });
```
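
Here's how the whole dance looks in one compilable sketch (the `ready` flag, `worker` function, and printed message are made up for illustration). One thread waits on the condition variable, while the main thread fulfills the condition and notifies it:

```c++
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::condition_variable condition_var;
std::mutex mutex;
bool ready = false;

void worker()
{
    std::unique_lock<std::mutex> guard(mutex);
    condition_var.wait(guard, [](){ return ready; }); // Sleeps until notified AND ready is true
    std::cout << "Worker woke up!\n";
}

int main()
{
    std::thread worker_thread(worker);

    {
        std::lock_guard<std::mutex> guard(mutex);
        ready = true; // Fulfill the condition while holding the lock
    }
    condition_var.notify_one(); // Wake the waiting thread

    worker_thread.join();
    return 0;
}
```

Note that the predicate overload of `wait()` re-checks the condition after every wakeup, so spurious wakeups are handled for free.
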
## 3. C++ Concurrency Reference <a name="3"></a>

### 3.1 Introduction <a name="3.1"></a>
[go to top](#top)


We just went through manual thread handling in the previous section.

But if you're lazy, or you don't need the tight control that the thread, mutex, and lock guard classes offer you, you may choose to adopt **task based parallelism** instead of **thread based parallelism**. It's generally quicker to get correct code working with tasks as opposed to threads, especially since the chance of tasks messing up is far lower.

With `std::async`, manual thread handling is **abstracted away**, and you rely on the library to spawn threads for you, depending on available resources. **The main benefit of this form of parallelism is the great ease of getting returned values out of the tasks that you start.**

Before, when using threads, you'd have to pass variables by reference and have threads modify them. But now with tasks, you can just directly return the result of the task!

So instead of starting threads yourself, you only concern yourself with starting **tasks** that return when they're supposed to. If a task hasn't returned yet, the code will block until it does.



### 3.2 When to Use Threads or Tasks <a name="3.2"></a>
[go to top](#top)


Use **threads** if:

- You need tight control over mutexes
- You need to run long-lived, complex tasks

Use **tasks** if:

- You want fairly simple code and don't care for managing threads
- You are running short tasks



### 3.3 Promises and Futures <a name="3.3"></a>
[go to top](#top)


![1562934941151](assets/1562934941151.png)

[Image Source]()

![1562935061068](assets/1562935061068.png)

[Image Source]()

#### **Header**

```c++
#include <future>
```

#### **Futures**

A [std::future]() is a class template that stores a value that will be assigned in the future, and provides a way to access that value (with `get()`). If you access the value before it has been assigned, the call will block until the value resolves.

Futures are the objects that are **returned** by asynchronous operations (from `std::async`, `std::packaged_task`, or `std::promise`).

**Shared Futures**

A [std::shared_future]() works the same way, except it is copyable, which means that multiple threads are allowed to wait for the same shared state.

#### **Promises**

A [std::promise]() provides a facility to store a value that is later acquired asynchronously via the future **that the promise creates**.

Every promise **is associated with a future**! And a promise **sets** the value of that future. Other objects can then access the future for the value that the promise stores.

#### **A dumb analogy**

> **Today is a Gift. That is why it is called Present.**
>
> You're a parent trying to get a gift for your child.
>
> You give your kid a box, and **promise** them that the gift is inside. The gift is the **future** you are promising. But you tell them only to check in the future.
>
> If your kid tries to check early, you panic, take the box away, and **block** them from checking, until you **fulfill your promise and fill the box** with the gift. Then you can give it back, and your kid can continue their day having gotten their gift.

#### **A slightly better analogy**

> **Food Analogy**
>
> Let's say you're an office worker. You make an order for lunch from a store across the street via your phone app.
>
> The store owner receives your order, and by the powers of the social contract, makes a **promise** to fulfill your order.
> He issues you a receipt that is associated with this **promise**, guaranteeing you that you will be able to collect your order in the **future** if he ever fulfills his promise.
>
> You **block** off some time, stop your work at the office, and head down to the store.
>
> But OH NO! The store owner hasn't fulfilled your order yet. And as long as you're waiting to **get()** your order, you can't do any work. Some might even say your **waiting to get your order in the future is blocking your ability to work.**
>
> Once the store owner **sets()** your order down, and lets you **get()** it from his counter though, you're able to **stop getting blocked** and go back to the office to work.

![mindblow](assets/mindblow.gif)



### 3.4 A Simple Promise-Future Example <a name="3.4"></a>
[go to top](#top)


![std::promise and std::future](assets/promise.png)

[Image Source]()

**Note:** If your promise object is destroyed before you set its value, the `get()` method for its associated future will throw an exception.

**Also note:** Each future's `get()` method can only be called once. If you want a future that can be accessed multiple times, use a shared_future instead. Otherwise, **initialise a different promise-future pair.**

```c++
// Create a promise
std::promise<int> promise;

// And get its future
std::future<int> future = promise.get_future();

// You can also get a shared future this way, by the way! (Choose one please)
std::shared_future<int> shared_future = promise.get_future();

// Now suppose we passed the promise to a separate thread.
// And in the main thread we call...
int val = future.get(); // This will block!

// Until, that is, we set the future's value via the promise
promise.set_value(10); // In the separate thread

// So now in the main thread, if we try to access val...
std::cout << val << std::endl;

// Output: 10
```

Or, more completely

```c++
// Source: https://thispointer.com//c11-multithreading-part-8-stdfuture-stdpromise-and-returning-values-from-thread/

#include <iostream>
#include <thread>
#include <future>

void initiazer(std::promise<int> * promObj)
{
    std::cout << "Inside Thread" << std::endl;
    promObj->set_value(35);
}

int main()
{
    std::promise<int> promiseObj;
    std::future<int> futureObj = promiseObj.get_future();
    std::thread th(initiazer, &promiseObj);
    std::cout << futureObj.get() << std::endl;
    th.join();
    return 0;
}
```



### 3.5 Async <a name="3.5"></a>
[go to top](#top)


[std::async]()

Now that we've talked about futures and promises, we can finally get to the real asynchronous coding library.

Async is a function template that allows you to spawn threads to do work, then collect the results from them via the **future** mechanism. In fact, calls to `std::async` return a `std::future` object!

**Do note that async does support parallelism. It's just that the default launch policy manages threads for you and may not run the passed function in a new thread. You'll have to explicitly tell it to run the function in a new thread.**

Also, since the default policy is allowed to defer your function and run it lazily on the calling thread, it's especially important to force functions to run in separate threads when you actually want concurrency. We'll see how to do that later.

The simplest call to async is to just pass in a callback function as an argument, and let the system handle it for you.

```c++
auto future = std::async(some_function, arg_1, arg_2);
```
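
Here's a runnable sketch of the round trip (the `add` function is made up for illustration): start a task, keep the returned future, and collect the result with `get()`:

```c++
#include <future>
#include <iostream>

int add(int a, int b)
{
    return a + b;
}

int main()
{
    std::future<int> sum_future = std::async(add, 2, 3);

    // ... do other work here while the task (possibly) runs ...

    std::cout << sum_future.get() << "\n"; // Blocks until the result is ready, then prints 5
    return 0;
}
```

If `get()` is called before the task finishes, it simply blocks until the result arrives, just like with a future made from a promise.
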
### 3.6 Async Launch Policies <a name="3.6"></a>
[go to top](#top)


You can do better though!

There are three ways to launch an async task:

- `std::launch::async` : Guarantees launch in a separate thread
- `std::launch::deferred`: The function will only be called on `get()`
- `std::launch::async | std::launch::deferred`: Default behaviour. The system decides.

I like to run async tasks with the `std::launch::async` policy so I can have some semblance of control over the threads. Just **add it in as the first argument!**

```c++
auto future = std::async(std::launch::async, some_function, arg_1, arg_2);
```



### 3.7 Different Ways to Call Async <a name="3.7"></a>
[go to top](#top)


```c++
// Pass in a function pointer
auto future = std::async(std::launch::async, some_function, arg_1, arg_2);

// Pass in a function pointer by taking the function's address explicitly
auto future = std::async(std::launch::async, &some_function, arg_1, arg_2);

// Pass in a function object
struct SomeFunctionObject
{
    void operator() (int arg_1){}
};
auto future = std::async(std::launch::async, SomeFunctionObject(), arg_1);

// Pass in a lambda function
auto future = std::async(std::launch::async, [](){});
```




```
 .     .
  .  |\-^-/|  .
 /| } O.=.O { |\
```

---

 [![Yeah! Buy the DRAGON a COFFEE!](../_assets/COFFEE%20BUTTON%20%E3%83%BE(%C2%B0%E2%88%87%C2%B0%5E).png)](https://www.buymeacoffee.com/methylDragon)
+ +![_images/concurrency_vs_parallelism.png](assets/concurrency_vs_parallelism-1562918749730.png) + +[Image Source]() + +Threads will run **concurrently** if they're on the same processor. But ***in parallel*** if they're on different processors! + +Each thread has its own call stack, but **all threads share the heap.** + +You can find the maximum number of active threads that you can start. If your number of active threads exceeds this number you won't really get more performance out of it, so take note! + +```c++ +#include + +unsigned int c = std::thread::hardware_concurrency(); +``` + + + +### 2.2 Creating Threads +[go to top](#top) + + +There are several ways to create a thread: + +- Using a **function pointer** +- Using a **lambda function** +- Using a **functor** + +**Function Pointer** + +```c++ +#include + +// Define a function and start a thread that runs that function +void rawr(params) {} +std::thread rawr_thread(rawr, params); +``` +**Lambda Function** +```c++ +// Define a lambda expression and start a thread that runs that lambda expression +auto rar = [](params) {}; +std::thread rar_thread(rar, params); + +// Or pass the lambda directly! +std::thread rar_thread([](params) {};, params); +``` +**Functor** +```c++ +// Define a functor and start a thread that runs the functor's function call +class raa_object_class { + void operator()(params) {} +} + +std::thread raa_thread(raa_class_object(), params); +``` + +> Don't create threads on the heap with the new operator! Do it automatically on the stack for efficiency like in the examples stated above. + + + +### 2.3 Thread Specific Functions +[go to top](#top) + + +Use `std::this_thread` within threads to refer to the current thread! + +**Note that yield() is NOT like the Python yield! It's completely different behaviour.** + +```c++ +#include +#include + +// These can be used within a thread + +// Get thread ID of thread +std::this_thread::get_id(); + +// Give priority to other threads, pause execution +std::this_thread::yield(); + +// Sleep for some amount of time +std::this_thread::sleep_for(std::chrono::seconds(1)); + +// Sleep until some time +std::chrono::system_clock::time_point time_point = std::chrono::system_clock::now() + + std::chrono::seconds(10); +std::this_thread::sleep_until(time_point); +``` + + + +### 2.4 Sharing Data +[go to top](#top) + + +**Global Variables** + +All global and static variables that are initialised at compile time can be accessed by threads. Since the threads should know the addresses for them. + +#### **Passing By Reference** + +All parameters passed to a function when starting a thread are **passed by value**, even if you defined in the function to pass by reference! + +You need to **explicitly wrap the arguments in std::ref() to pass by reference.** + +Example: + +```c++ +void ref_function(int &a, int b) {} + +int val; +std::thread ref_function_thread(ref_function, std::ref(val), 2); +``` + +**Because the thread functions can't return anything, passing by reference is the only way to properly get data out of a thread without using global variables.** Ensure that your thread modifies the data passed in by reference and you should be good to go. + +#### **A Note on Static Variables** + +Be wary of declaring static variables in a multiple threads though! 
+ +```c++ +// Suppose this is your thread function +void method() +{ + static int var = 0; + var++; +} +``` + +**Note that this does NOT create a separate instance of the static variable per thread instance.** This is because static variables are initialised once when the compiler goes over their declaration. + +If you want to have 'static' variables that are static within the scope of each particular thread, use `thread_local` variables instead. Then each thread will have its own version of the static variable, and the static variable will only be destroyed on thread exit. + +```c++ +void method() +{ + thread_local int var = 0; + var++; +} +``` + + + +### 2.5 Waiting, Killing, and Detaching +[go to top](#top) + + +#### **Waiting to Complete** + +You use the `join()` method to wait for a thread to complete. + +Calling `join()` will **block the main thread** until the thread that is being waited for completes. + +```c++ +// Start thread example_thread +std::thread example_thread(some_function); + +// Block and wait for thread to finish +example_thread.join(); + +// Ok! We're done and good to go on doing other stuff ... +``` + +**You cannot join a thread if it is not joinable** (maybe you killed it already, or it was detached.) + +```c++ +// So you can check if a thread is joinable before calling the join method! +if (exmaple_thread.joinable()) +{ + example_thread.join(); +} +``` + +#### **Kill a Thread** + +Use `return`, **not** `std::terminate()`! `terminate()` will kill your entire program process, not an individual thread. + +```c++ +return; +``` + +#### **Detaching a Thread** + +You may `detach` a thread. That is, split it from the `std::thread()` object that manages it. Once you do that, you won't be able to manage the thread aside from any mutex or shared resources between the different threads. + +Those detached threads will only exit when the main process is terminated or when the top level function exits. + +```c++ +example_thread.detach(); +``` + + + +### 2.6 Race Conditions +[go to top](#top) + + +![SharedMutable](assets/SharedMutable.png) + +[Image Source]() + +It's always thread-safe if you're only reading variables from multiple threads. But the moment you start writing data from multiple threads, you can potentially crash or create unexpected behaviour. + +**Example** + +```c++ +// Source: https://stackoverflow.com/questions/34510/what-is-a-race-condition + +if (x == 5) // The "Check" +{ + y = x * 2; // The "Act" + + // If another thread changed x in between "if (x == 5)" and "y = x * 2" above, + // y will not be equal to 10. +} +``` + + + +### 2.7 Atomics +[go to top](#top) + + +So there are several ways to prevent race conditions. An `std::atomic` is just one way. + +An atomic type is mainly a type that implements atomic operations. That is, operations that are thread safe and run independently of any other processes. There can be some overhead, especially when there is a lot of contention around them, but it's hard to get into details for how much overhead exactly, since it's platform and context specific. + +Using an atomic type **guarantees no race conditions will occur.** + +> **Use atomic types only when you need them, and native types when you don't. If you care about performance, that is.** + +You can check the [Atomic Types Reference]() for the full list of how to instantiate them, but here's a couple of examples. + +**There's a gigantic list! 
This table is non-exhaustive:** + +| Type Alias | Type Instantiation | +| :--------------: | :-----------------: | +| std::atomic_bool | `std::atomic` | +| std::atomic_char | `std::atomic` | +| std::atomic_int | `std::atomic` | +| std::atomic_long | `std::atomic` | +| . | . | +| . | . | +| . | . | + + + +### 2.8 Mutex and Locks +[go to top](#top) + + +#### **Introduction** + +We'll go through this for completeness' sake, but there is a better way to do things (lock guards.) + +**Mutexes** are mutual exclusion objects that are used for thread synchronisation. They're a way to keep track of whether a particular thread is using a resource, and will cause threads to block if the resource is currently being taken. It's a way to **protect shared resources and to prevent race conditions.** + +They are **owned** by the thread that takes it. Hence, **mutual exclusion!** + +This will slow down your threaded program if threads wait too much, so use them sparingly! But you still need to use them to prevent race conditions and to really control the multi-threaded program flow of your program. + +They are the **interface** through which you can engage locks for your code! + +#### **Deadlocks** + +Of course, you need to be careful when you're using mutexes and locks. Overuse of locks will slow down your code, or in certain cases, cause deadlocks, causing your program to completely stall. + +![Image result for deadlock](assets/deadlock.png) + +[Image Source]() + +> **Methods for handling deadlock** +> +> 1) **Deadlock prevention or avoidance**: The idea is to not let the system into deadlock state. +> One can zoom into each category individually, Prevention is done by negating one of above mentioned necessary conditions for deadlock. +> +> 2) **Deadlock detection and recovery**: Let deadlock occur, then do preemption to handle it once occurred. +> +> 3) **Ignore the problem all together**: If deadlock is very rare, then let it happen and reboot the system. This is the approach that both Windows and UNIX take. +> +> + +#### **Example Usage** + +> Note that this method is **not recommended**. It's actually an [**anti-pattern**]() but just included for completeness' sake. + +```c++ +#include + +// Create your mutex here +std::mutex my_mutex; + +// +thread_function() +{ + my_mutex.lock(); // Acquire lock + // Do some non-thread safe stuff... + my_mutex.unlock(); // Release lock +} +``` + + + +### 2.9 A Better Way: Lock Guards +[go to top](#top) + + +It's actually better to just use a lock guard, which manages the lifecycle of a mutex for you. + +It's kind of like the `with:` operator in Python. + +**Notably, a lock guard releases the lock automatically once the function that it is called in goes out of scope!** + +```c++ +#include + +// Create your mutex here +std::mutex my_mutex; + +thread_function() +{ + std::lock_guard guard(my_mutex); // Acquire lock + // Do some non-thread safe stuff... +} +``` + + + +### 2.10 Lock Guard Types +[go to top](#top) + + +So there are actually several lock guard types. + +You've already seen the standard lock_guard + +#### **std::lock_guard<>** + +[Reference]() + +- Simplest lock guard +- Takes a mutex on construction +- Releases the mutex once it goes out of scope + +```c++ +std::lock_guard guard(my_mutex); +``` + +#### **std::scoped_lock<>** + +[Reference]() + +This was introduced in C++17, and is the standard lock guard to use, over `std::lock_guard<>`, which is included for compatibility. 
+ +- It's just a lock guard +- Except it can take **multiple mutexes** + +```c++ +std::scoped_lock guard(mutex_1, mutex_2); +``` + +#### **std::unique_lock<>** + +[Reference]() + +- Just like the normal lock guard, except... +- It initialises an exclusive lock +- It can be returned from the function without releasing the lock (via move semantics) +- It can be released before it is destroyed +- You can also use **nifty lock methods!** + +```c++ +std::unique_lock guard(my_mutex); + +// Check if guard owns lock (either works) +guard.owns_lock(); +bool(guard); + +// Return function without releasing the lock +return std::move(guard); + +// Release lock before destruction +guard.unlock(); +``` + +If you defer the locks, you can use the **nifty lock methods!** + +```c++ +// Initialise the lock guard, but don't actually lock yet +std::unique_lock guard(mutex_1, std::defer_lock); + +// Now you can do some of the following! +guard.lock(); // Lock now! +guard.try_lock(); // Won't block if it can't acquire +guard.try_lock_for(); // Only for timed_mutexes +guard.try_lock_until(); // Only for timed_mutexes +``` + +#### **std::shared_lock<>** + +[Reference]() + +A shared lock is just like a unique lock, except the lock is a shared lock as opposed to an exclusive one. + +- Just like the normal lock guard, except... +- It initialises a shared lock +- It can be returned from the function without releasing the lock (via move semantics) +- It can be released before it is destroyed +- You can also use **nifty lock methods!** + +```c++ +std::shared_lock my_mutex; +std::shared_lock guard(my_mutex); + +// Check if guard owns lock (either works) +guard.owns_lock(); +bool(guard); + +// Return function without releasing the lock +return std::move(guard); + +// Release lock before destruction +guard.unlock(); +``` + +If you defer the locks, you can use the **nifty lock methods!** + +```c++ +// Initialise the lock guard, but don't actually lock yet +std::shared_lock guard(mutex_1, std::defer_lock); + +// Now you can do some of the following! +guard.lock(); // Lock now! +guard.try_lock(); // Won't block if it can't acquire +guard.try_lock_for(); // Only for timed_mutexes +guard.try_lock_until(); // Only for timed_mutexes +``` + + + +### 2.11 Exclusive Locks vs Shared Locks +[go to top](#top) + + +**Exclusive locks** (aka write locks) **inhibit all access** from other threads until the lock is released. + +**Shared locks** (aka read locks) **inhibit all writes** from other threads until the lock is released. Other threads have to request the lock to be granted the permission to read though. + +> Exclusive lock mode prevents the associated resource from being shared. This lock mode is obtained to modify data. The first transaction to lock a resource exclusively is the only transaction that can alter the resource until the exclusive lock is released. +> +> Share lock mode allows the associated resource to be shared, depending on the operations involved. Multiple users reading data can share the data, holding share locks to prevent concurrent access by a writer (who needs an exclusive lock). Several transactions can acquire share locks on the same resource. +> +> --- +> +> Think of a lockable object as a *blackboard* (lockable) in a class room containing a *teacher* (writer) and many *students* (readers). +> +> While a teacher is writing something (exclusive lock) on the board: +> +> 1. 
Nobody can read it, because it's still being written, and she's blocking your view => ***If an object is exclusively locked, shared locks cannot be obtained*.** +> 2. Other teachers won't come up and start writing either, or the board becomes unreadable, and confuses students => ***If an object is exclusively locked, other exclusive locks cannot be obtained*.** +> +> When the students are reading (shared locks) what is on the board: +> +> 1. They all can read what is on it, together => *Multiple shared locks can co-exist*. +> 2. The teacher waits for them to finish reading before she clears the board to write more => *If one or more shared locks already exist, exclusive locks cannot be obtained*. +> +> + +Notice this means that **if an object is shared locked, you can acquire shared locks, but not exclusive locks.** + +Basically: + +- If there are multiple readers, no writers can bind, but readers can bind. +- If there is one writer, no one can bind. + + + + +### 2.12 Mutex Types +[go to top](#top) + + +There are [several]() [types]() [of]() [mutex](). + +#### **std::mutex** + +[Reference]() + +- Just your plain lockable mutex + +#### **std::timed_mutex** + +[Reference]() + +- Timed mutex +- You can lock for a specified amount of time with `try_lock_for()` and `try_lock_until()` + +#### **std::recursive_mutex** + +[Reference]() + +- Multiple locks can be acquired by the same thread +- You need to call unlock the same amount of times you've called lock before the lock is released + +#### **std::recursive_timed_mutex** + +[Reference]() + +- Same as the recursive mutex, except it also has the timed locking methods that timed mutexes have + +#### **std::shared_timed_mutex** + +[Reference]() + +- Read-Write mutex +- Can acquire both exclusive or shared locks (just use the appropriate lock guard type!) + +```c++ +std::unique_lock writer_guard(writing_mutex, std::defer_lock); +std::shared_lock reader_guard(reading_mutex, std::defer_lock); + +// Lock them! +std::lock(writer_guard, reader_guard); +``` + + + +### 2.13 Event Handling: Condition Variables +[go to top](#top) + + +Sometimes you need to do some nice signal/event handling. + +It's possible to do it using a global variable that you constantly lock threads for to check, but it's far more efficient to use **[condition variables]()**. + +A condition variable allows you to **wait for some condition to be true** before continuing thread execution. During this time, any locks that were passed to the waiting function are released until the condition is fulfilled. Following which, the lock is reacquired. + +> **Example Flow** +> +> 1. Thread **acquires lock** +> 2. Check if condition is false +> 3. If false, call `wait()`, which **releases the lock and blocks the thread until the condition is fulfilled** +> 4. If a condition is fulfilled, the condition variable **must be notified** before it can check +> 5. Once the condition check succeeds, **thread reacquires lock and continues execution** + +Let's try it out! + +Condition variables use unique_locks, so we'll use that. 
+ +#### **Basic Example** + +```c++ +#include + +// Init +std::condition_variable condition_var; +std::mutex mutex; +bool condition(false); + +// Acquire lock +std::unique_lock guard(mutex); + +// Avoid spurious wakeups and +// ensure wait is only called when the condition has not been fulfilled +while (!condition) +{ + condition_var.wait(guard); +} + +// Now in some other thread +{ + // Acquire lock + std::unique_lock guard(mutex); + + // We can set the condition to true + condition = true; + + // And notify one blocked thread by the condition variable that it's ok to wake up + // (In this case we only have one) + condition_var.notify_one(); + + // If we want to notify all of them instead... + condition_var.notify_all(); + + // If we didn't surround the threads with the while (!condition) loop, + // Notifying the threads will cause the wait to return. So there's no condition check. + // But this is dangerous since random wakeups can occur without notifications! +} +``` + +**You may also choose to make the condition be an atomic boolean instead so you can save on lock acquisition for any thread that sets the condition.** + +Like so: `std::atomic condition(true);` + +#### **Additional Methods** + +```c++ +// Wait for some time or until some time is reached +condition_var.wait_for(); +condition_var.wait_until(); + +// There's also a nice function to cleanup any condition variables by a lock acquiring thread +// It's an equivalent call to +// First: destroying all objects that are meant to destroy on thread exit +// Then: mutex.unlock(); condition_var.notify_all(); +std::notify_all_at_thread_exit(condition_var, some_unique_lock); +``` + +#### **Spurious Wakeups** + +A bit tricky. But sometimes condition variables can wakeup on their own due to some [threading technomagic](). + +It's relatively trivial to guard against it, and it's another layer of protection against human error, so it makes sense to at least try to deal with them explicitly. + +```c++ +// You guard against spurious wakeups by surrounding the condition variable +// with a check for the condition (you're checking the predicate) +while (!condition) +{ + condition_var.wait(guard); +} + +// Alternatively, you can do it this way as well, +// which is neater but slightly less intuitive +condition_var.wait(guard, condition_function); + +// If we want to just check a bool called condition we need to use lambdas +condition_var.wait(guard, [](){return condition == true;}); +``` + + + +## 3. C++ Concurrency Reference + +### 3.1 Introduction +[go to top](#top) + + +We just went through manual thread handling in the previous section. + +But if you're lazy, or you don't need the tight control the thread, mutex, and lock guard classes offer you, you may choose to adopt **task based parallelism** instead, as opposed to **thread based parallelism**. It's generally considered faster to work with tasks as opposed to threads, especially since the chance of tasks messing up is far lower than that of threads. + +With the `std::async` library, manual thread handling is **abstracted away**, and you rely on the library's system to possibly spawn threads, depending on available resources. **The main benefit of this form of parallelism is the great ease in getting returned values from tasks that you start.** + +Before, when using threads, you'd have to pass variables via reference and have threads modify the variable. But now with tasks, you can just directly return the result of the task! 
So instead of starting threads yourself, you need only be concerned with starting **tasks** that return when they are supposed to. If a task hasn't returned yet, the code will block until it does.



### 3.2 When to Use Threads or Tasks
[go to top](#top)


Use **threads** if:

- You need tight control over mutexes
- You need to run long-lived, complex tasks

Use **tasks** if:

- You want fairly simple code and don't care for managing threads
- You are running short tasks



### 3.3 Promises and Futures
[go to top](#top)


![1562934941151](assets/1562934941151.png)

[Image Source]()

![1562935061068](assets/1562935061068.png)

[Image Source]()

#### **Header**

```c++
#include <future>
```

#### **Futures**

A [std::future]() is a class template that stores a value that will be assigned in the future, and provides a way to access that value (with `get()`). If the value is accessed before it has been assigned, the call will block until the value resolves.

Futures are the objects that are **returned** by asynchronous operations (from `std::async`, `std::packaged_task`, or `std::promise`).

**Shared Futures**

A [std::shared_future]() works the same way, except it is copyable, which means that multiple threads are allowed to wait for the same shared state.

#### **Promises**

A [std::promise]() provides a facility to store a value that is later acquired asynchronously via the future **that the promise creates**.

Every promise **is associated with a future**! And a promise **sets** the value of that future. Other objects can then access the future for the value that the promise stores.

#### **A dumb analogy**

> **Today is a Gift. That is why it is called Present.**
>
> You're a parent trying to get a gift for your child.
>
> You give your kid a box, and **promise** them that the gift is inside. The gift is the **future** you are promising. But you tell them only to check it in the future.
>
> If your kid tries to check early, you panic, take the box away, and **block** them from checking, until you **fulfill your promise and fill the box** with the gift. Then you can give it back, and your kid can continue their day having gotten their gift.

#### **A slightly better analogy**

> **Food Analogy**
>
> Let's say you're an office worker. You make an order for lunch from a store across the street via your phone app.
>
> The store owner receives your order, and by the powers of the social contract, makes a **promise** to fulfill your order. He issues you a receipt that is associated with this **promise**, guaranteeing you that you will be able to collect your order in the **future** once he fulfills his promise.
>
> You **block** off some time, stop your work at the office, and head down to the store.
>
> But OH NO! The store owner hasn't fulfilled your order yet. And as long as you're waiting to **get()** your order, you can't do any work. Some might even say your **waiting to get your order in the future is blocking your ability to work.**
>
> Once the store owner **sets()** your order down, and lets you **get()** it from his counter, you're able to **stop being blocked** and go back to the office to work.
![mindblow](assets/mindblow.gif)



### 3.4 A Simple Promise-Future Example
[go to top](#top)


![std::promise and std::future](assets/promise.png)

[Image Source]()

**Note:** If your promise object is destroyed before you set its value, the `get()` method for its associated future will throw an exception.

**Also note:** Each future's `get()` method can only be called once. If you want a future that can be accessed multiple times, use a shared_future instead. Otherwise, **initialise a different promise-future pair.**

```c++
// Create a promise
std::promise<int> promise;

// And get its future
std::future<int> future = promise.get_future();

// You can also get a shared future this way, by the way! (Choose one please)
std::shared_future<int> shared_future = promise.get_future();

// Now suppose we passed the promise to a separate thread.
// And in the main thread we call...
int val = future.get(); // This will block!

// Until, that is, we set the future's value via the promise
promise.set_value(10); // In the separate thread

// So now in the main thread, if we try to access val...
std::cout << val << std::endl;

// Output: 10
```

Or, more completely:

```c++
// Source: https://thispointer.com//c11-multithreading-part-8-stdfuture-stdpromise-and-returning-values-from-thread/

#include <iostream>
#include <thread>
#include <future>

void initiazer(std::promise<int> * promObj)
{
    std::cout << "Inside Thread" << std::endl;
    promObj->set_value(35);
}

int main()
{
    std::promise<int> promiseObj;
    std::future<int> futureObj = promiseObj.get_future();
    std::thread th(initiazer, &promiseObj);
    std::cout << futureObj.get() << std::endl;
    th.join();
    return 0;
}
```



### 3.5 std::async
[go to top](#top)


[std::async]()

Now that we've talked about futures and promises, we can finally get to the real asynchronous coding library.

`std::async` is a function template that allows you to spawn threads to do work, then collect the results from them via the **future** mechanism. In fact, calls to `std::async` return a `std::future` object!

**Do note that async does support parallelism, just that the default launch policy manages threads for you and may possibly not run the passed functions in a new thread. You'll have to explicitly tell it to run the function in a new thread.**

Also, since the default policy is allowed to defer execution until the result is requested, it's especially important to force the functions to run in separate threads if you want true concurrency. We'll see how to do that later.

The simplest call to async is to just pass in a callback function as an argument, and let the system handle it for you.

```c++
auto future = std::async(some_function, arg_1, arg_2);
```



### 3.6 Async Launch Policies
[go to top](#top)


You can do better though!

There are three ways to launch an async task:

- `std::launch::async` : Guarantees launch in a separate thread
- `std::launch::deferred` : Function will only be called on `get()`
- `std::launch::async | std::launch::deferred` : Default behaviour. Defer to the system.

I like to run async tasks with the `std::launch::async` policy so I can have some semblance of control over the threads.
Just **add it in as the first argument!**

```c++
auto future = std::async(std::launch::async, some_function, arg_1, arg_2);
```



### 3.7 Different Ways to Call Async
[go to top](#top)


```c++
// Pass in a function pointer
auto future = std::async(std::launch::async, some_function, arg_1, arg_2);

// Pass in a function address explicitly
auto future = std::async(std::launch::async, &some_function, arg_1, arg_2);

// Pass in a function object
struct SomeFunctionObject
{
    void operator() (int arg_1){}
};
auto future = std::async(std::launch::async, SomeFunctionObject(), arg_1);

// Lambda function
auto future = std::async(std::launch::async, [](){});
```



```
 . .
 . |\-^-/| .
 /| } O.=.O { |\
```

---

[![Yeah! Buy the DRAGON a COFFEE!](../_assets/COFFEE%20BUTTON%20%E3%83%BE(%C2%B0%E2%88%87%C2%B0%5E).png)](https://www.buymeacoffee.com/methylDragon)

---

# C++ Concurrency in Action: Practical Multithreading

by Anthony Williams

C++ Concurrency in Action (second edition, published 2019 by Manning Publications) is the definitive reference and guide to writing multithreaded code with Standard C++. It is suitable for all levels of C++ programmers, including those who have never previously written any multithreaded code. This book will show you how to write robust multithreaded applications in C++ while avoiding common pitfalls.

> "It's not just the best current treatment of C++11's threading facilities ... it's likely to remain the best for some time to come." (Scott Meyers)

> "This book should be on every C++ programmer's desk. It's clear, concise, and valuable." (Rob Green, Bowling Green State University)

## Overview

Systems with multiple processors or processors with multiple cores are the norm these days; even many phones have multicore processors. To take advantage of these processor cores you need to use concurrency, either in the form of multiple processes or multiple threads.

The C++17 standard provides extensive support for writing multithreaded code to take advantage of these multicore and multiprocessor systems. C++ Concurrency in Action explains how these facilities work, and how to use them to best effect.

This book provides a tutorial covering the use of the library facilities introduced in the last three C++ standards. It covers everything from the basics such as std::thread, std::future and std::condition_variable, to an in-depth description of the new memory model and std::atomic classes for low-level synchronization and the new C++17 parallel algorithms. In later chapters, the book then goes on to cover the design of multithreaded code, including lock-free data structures and thread pools. Finally, there is a chapter on testing and debugging multithreaded applications.

It doesn't stop there though: the appendices include a brief overview of some of the C++ language features either used by the multithreading facilities, or commonly used in conjunction with them, such as variadic templates, lambda functions and rvalue references, as well as a 150-page reference covering every class and function in the C++ Standard Thread Library. The book also covers the additional facilities from the Concurrency TS that aren't yet part of the main C++ standard.
## Additional material in the second edition

In addition to all the material from the first edition, the second edition (published in 2019) includes full coverage of the library changes from C++14 and C++17:

- std::shared_mutex and std::shared_timed_mutex. These provide for multiple-reader/single-writer mutex locks.
- std::scoped_lock from C++17 for locking multiple mutexes together.
- Parallel overloads of many standard library algorithms, including std::sort, std::for_each and std::transform_reduce.

Plus, full coverage of the library extensions from the Concurrency TS:

- std::experimental::latch to allow waiting for a set number of events to occur
- std::experimental::barrier and std::experimental::flex_barrier to synchronize groups of threads
- std::experimental::atomic_shared_ptr to allow atomic accesses to a single shared_ptr instance from multiple threads, as a better alternative to the std::atomic_load and std::atomic_store free functions
- Extended futures that allow continuations, so additional functions can be scheduled for when a future is ready
- std::experimental::when_all and std::experimental::when_any to allow waiting for either all of a set of futures to be ready, or the first of a set of futures to be ready

# Multithreading in C++

Multithreading is a technique where a program is divided into smaller units of execution called threads. Each thread runs independently but shares resources like memory, allowing tasks to be performed simultaneously. This helps improve performance by utilizing multiple CPU cores efficiently. Multithreading support was introduced in C++11 with the introduction of the `<thread>` header file.

## Importance of Multithreading

- Leverages multiple CPU cores to execute tasks in parallel, reducing overall execution time.
- Keeps applications responsive by running background operations without blocking the main thread. For example, in a word document, one thread does auto-formatting along with the main thread.
- Makes it easier to handle large workloads or multiple simultaneous operations, such as in servers or real-time systems.

## Common Operations On Threads

The `<thread>` header in C++ provides a simple and powerful interface for managing threads. Below are some of the most common operations performed on threads:

### Creating a Thread

The std::thread class represents a thread. Constructing an instance of this class will create a thread with the given callable as its task.

```cpp
thread thread_name(callable);
```

where:

- `thread_name` is the object of the thread class.
- `callable` is a callable object like a function pointer or function object.

Example:

```cpp
#include <iostream>
#include <thread>
using namespace std;

// Function to be run by the thread
void func() {
    cout << "Hello from the thread!" << endl;
}

int main() {

    // Create a thread that runs
    // the function func
    thread t(func);

    // Main thread waits for 't' to finish
    t.join();
    cout << "Main thread finished.";
    return 0;
}
```

Output:

```
Hello from the thread!
Main thread finished.
```

Explanation: In the above program we created a thread t that prints "Hello from the thread!". This thread is joined with the main thread so that the main thread waits for its completion; once thread t is finished, the main thread resumes its execution and prints "Main thread finished.".
### Joining a Thread

Before joining a thread, it is preferred to check whether the thread can be joined using the joinable() method. The joinable() method checks whether the thread is in a valid state for those operations.

```cpp
thread_name.joinable()
```

The joinable() method returns true if the thread is joinable, else it returns false.

Joining a thread in C++ blocks the current thread until the thread associated with the std::thread object finishes execution. To join a thread we use the join() function, which is called from the thread that wants to wait on the specified thread.

```cpp
thread_name.join();
```

The join() function throws std::system_error if the thread is not joinable.

Note: Joining two non-main threads is risky as it may lead to race conditions or logic errors.

### Detaching a Thread

A thread can be detached from the calling thread using the detach() member function of the std::thread class. When a thread is detached, it runs independently in the background, and the other thread does not wait for it to finish.

```cpp
thread_name.detach();
```

### Getting a Thread ID

In C++ multithreading, each thread has a unique ID which can be obtained using the get_id() function.

```cpp
thread_name.get_id();
```

The get_id() function returns an object representing the thread's ID.

Example program using the above operations altogether:

```cpp
#include <iostream>
#include <thread>
#include <chrono>
using namespace std;

void task1() {
    cout << "Thread 1 is running. ID: " << this_thread::get_id() << "\n";
}

void task2() {
    cout << "Thread 2 is running. ID: " << this_thread::get_id() << "\n";
}

int main() {
    thread t1(task1);
    thread t2(task2);

    // Get thread IDs
    cout << "t1 ID: " << t1.get_id() << "\n";
    cout << "t2 ID: " << t2.get_id() << "\n";

    // Join t1 if joinable
    if (t1.joinable()) {
        t1.join();
        cout << "t1 joined\n";
    }

    // Detach t2
    if (t2.joinable()) {
        t2.detach();
        cout << "t2 detached\n";
    }

    cout << "Main thread sleeping for 1 second...\n";
    this_thread::sleep_for(chrono::seconds(1));
    cout << "Main thread awake.\n";

    return 0;
}
```

Output:

```
t1 ID: 0x1234
t2 ID: 0x5678
Thread 1 is running. ID: 0x1234
t1 joined
Thread 2 is running. ID: 0x5678
t2 detached
Main thread sleeping for 1 second...
Main thread awake.
```

(The exact IDs and interleaving will vary between runs.)

### Callables in Multithreading

A callable (such as a function, lambda, or function object) is passed to a thread. The callable is executed in parallel by the thread when it starts. For example, `thread t(func);` creates a thread that runs the func function. We can also pass parameters along with the callable, like this: `thread t(func, param1, param2);`

In C++, callables can be divided into four categories:

- Function
- Lambda Expression
- Function Object
- Non-Static or Static Member Function

#### Function Pointer

A function can be a callable object to pass to the thread constructor for initializing a thread.

```cpp
#include <iostream>
#include <thread>
using namespace std;

// Function to be run
// by the thread
void func(int n) {
    cout << n;
}

int main() {

    // Create a thread that runs
    // the function func
    thread t(func, 4);

    // Wait for thread to finish
    t.join();
    return 0;
}
```

Output:

```
4
```

#### Lambda Expression

A thread object can also use a lambda expression as a callable, which can be passed directly to the thread constructor.
```cpp
#include <iostream>
#include <thread>

using namespace std;

int main() {
    int n = 3;

    // Create a thread that runs
    // a lambda expression
    thread t([](int n){
        cout << n;
    }, n);

    // Wait for the thread to complete
    t.join();
    return 0;
}
```

Output:

```
3
```

#### Function Objects

Function objects (functors) can also be used as a thread's callable. To make a functor callable, we need to overload the parentheses operator, operator().

```cpp
#include <iostream>
#include <thread>
using namespace std;

// Define a function object (functor)
class SumFunctor {
public:
    int n;
    SumFunctor(int a) : n(a) {}

    // Overload operator() to
    // make it callable
    void operator()() const {
        cout << n;
    }
};

int main() {

    // Create a thread using
    // the functor object
    thread t(SumFunctor(3));

    // Wait for the thread to
    // complete
    t.join();
    return 0;
}
```

Output:

```
3
```

#### Non-Static and Static Member Functions

We can also create threads using the non-static or static member functions of a class. For a non-static member function, we need to create an object of the class; that's not necessary with static member functions.

```cpp
#include <iostream>
#include <thread>

using namespace std;

class MyClass {
public:
    // Non-static member function
    void f1(int num) {
        cout << num << endl;
    }

    // Static member function that takes one parameter
    static void f2(int num) {
        cout << num;
    }
};

int main() {

    // Non-static member functions
    // require an object
    MyClass obj;

    // Passing object and parameter
    thread t1(&MyClass::f1, &obj, 3);

    t1.join();

    // Static member function can
    // be called without an object
    thread t2(&MyClass::f2, 7);

    // Wait for the thread to finish
    t2.join();

    return 0;
}
```

Output:

```
3
7
```

### Thread Management

In the C++ thread library, various functions and classes are defined to manage threads. Some of them are listed below:

| Classes/Methods | Description |
| --- | --- |
| `join()` | Ensures that the calling thread waits for the specified thread to complete its execution. |
| `detach()` | Allows the thread to run independently of the main thread, meaning the main thread does not need to wait. |
| `mutex` | A mutex is used to protect shared data between threads to prevent data races and ensure synchronization. |
| `lock_guard` | A wrapper for mutexes that automatically locks and unlocks the mutex in a scoped block. |
| `condition_variable` | Used to synchronize threads, allowing one thread to wait for a condition before proceeding. |
| `atomic` | Manages shared variables between threads in a thread-safe manner without using locks. |
| `sleep_for()` | Pauses the execution of the current thread for a specified duration. |
| `sleep_until()` | Pauses the execution of the current thread until a specified time point is reached. |
| `hardware_concurrency()` | Returns the number of hardware threads available for use, allowing you to optimize the use of system resources. |
| `get_id()` | Retrieves the unique ID of the current thread, useful for logging or debugging purposes. |

### Problems with Multithreading

Multithreading improves the performance and utilization of the CPU, but it also introduces various problems:

- Deadlock
- Race Condition
- Starvation

#### Deadlock

A deadlock occurs when two or more threads are blocked forever because they are each waiting for shared resources that the other threads hold. This creates a cycle of waiting, and none of the threads can proceed.
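The cycle is easy to reproduce in a few lines. The following minimal sketch (the mutex names and the sleeps are illustrative, not from the article) has two threads acquire two mutexes in opposite order; the sleeps widen the race window so the program almost always hangs:

```cpp
#include <chrono>
#include <mutex>
#include <thread>

std::mutex m1, m2;

void worker_a() {
    std::lock_guard<std::mutex> l1(m1);  // A holds m1
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    std::lock_guard<std::mutex> l2(m2);  // A waits for m2, held by B
}

void worker_b() {
    std::lock_guard<std::mutex> l1(m2);  // B holds m2
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    std::lock_guard<std::mutex> l2(m1);  // B waits for m1, held by A: deadlock
}

int main() {
    std::thread a(worker_a), b(worker_b);
    a.join();
    b.join();  // with the sleeps above, this program usually never finishes
}
```

Locking both mutexes in a fixed order in every thread, or locking them together with `std::scoped_lock(m1, m2)`, removes the cycle.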
#### Race Condition

A race condition occurs when two or more threads access shared resources at the same time, and at least one of them modifies the resource. Since the threads are competing to read and write the data, the final result depends on the order in which the threads execute, leading to unpredictable or incorrect results.

#### Starvation

Starvation occurs when a thread is continuously unable to access shared resources because other threads keep getting priority, preventing it from executing and making progress.

### Thread Synchronization

In multithreading, synchronization is the way to control the access of multiple threads to shared resources, ensuring that only one thread can access a resource at a time to prevent data corruption or inconsistency. This is typically done using tools like mutexes, locks, and condition variables.

### Context Switch in Multithreading

A context switch is the process where the CPU stops the execution of one thread and begins executing another within the same process. The CPU stores the state of the running thread so that it can be restored later, once the CPU finishes executing the other thread.
\ No newline at end of file
diff --git a/doc/reference/concurrency-2.md b/doc/reference/concurrency-2.md
new file mode 100644
index 00000000..64ddc769
--- /dev/null
+++ b/doc/reference/concurrency-2.md
@@ -0,0 +1,805 @@
# Understanding Concurrency in C++

Imagine you're cooking dinner. You could boil water, wait for it to finish, then chop vegetables, wait, then sauté them—doing one thing at a time. Or you could put water on to boil, chop vegetables while waiting, and check on multiple pots simultaneously. The second approach is concurrency: managing multiple tasks that can overlap in time.

This tutorial will take you from zero knowledge to confident understanding of how concurrent programs work. By the end, you'll see your programs—and your computer—in an entirely new light.

## Part One: Why Concurrency Matters

Modern computers have multiple processor cores. A quad-core laptop can literally do four things at once. But most programs use just one core, leaving the others idle.
That's like having four expert chefs in your kitchen but only letting one cook while the others watch.

Concurrency lets you use all your chefs.

Consider downloading a large file. Without concurrency, your application freezes—the user interface becomes unresponsive because your single thread of execution is busy waiting for network data. With concurrency, one thread handles the download while another keeps the interface responsive. The user can continue working, perhaps cancel the download, or start another—all while data streams in.

The benefits compound in computationally intensive work. Image processing, scientific simulations, video encoding—these tasks can be split into independent pieces. Process them simultaneously and your program finishes in a fraction of the time.

But concurrency isn't free. It introduces complexity. Multiple threads accessing the same data can corrupt it. Threads waiting on each other can freeze forever. These problems—race conditions and deadlocks—are the dragons we'll learn to slay.

First, though, we need to understand what a thread actually is.

## Part Two: Threads—Your Program's Parallel Lives

When you run a program, the operating system creates a *process* for it. This process gets its own memory space, its own resources, and at least one *thread of execution*—the main thread.

Think of a thread as a bookmark in a book of instructions. It marks where you are in the code. The processor reads the instruction at that bookmark, executes it, and moves the bookmark forward. One thread means one bookmark—your program can only be at one place in the code at a time.

But you can create additional threads. Each thread is its own bookmark, tracking its own position. Now your program can be at multiple places simultaneously. Each thread has its own *call stack*—its own record of which functions called which—but all threads share the same *heap memory*.

This sharing is both the power and the peril of threads.

Let's create our first thread.

```cpp
#include <iostream>
#include <thread>

void say_hello()
{
    std::cout << "Hello from a new thread!\n";
}

int main()
{
    std::thread t(say_hello);
    t.join();
    std::cout << "Back in the main thread.\n";
    return 0;
}
```

The `std::thread` constructor takes a function (or any *callable*—we'll see more later) and immediately starts a new thread running that function. Two bookmarks now move through your code simultaneously.

The `join()` call is crucial. It makes the main thread wait until thread `t` finishes. Without it, `main()` might return and terminate the program before `say_hello()` completes. Always join your threads before they go out of scope.

Let's see threads working in parallel.

```cpp
#include <iostream>
#include <thread>

void count_up(const char* name)
{
    for (int i = 1; i <= 5; ++i)
        std::cout << name << ": " << i << "\n";
}

int main()
{
    std::thread alice(count_up, "Alice");
    std::thread bob(count_up, "Bob");

    alice.join();
    bob.join();

    return 0;
}
```

Run this and you might see output like:

```
Alice: 1
Bob: 1
Alice: 2
Bob: 2
Alice: 3
Bob: 3
...
```

Or perhaps:

```
AliceBob: : 1
1
Alice: 2
...
```

The interleaving varies each run. Both threads race to print, and their outputs jumble together. This unpredictability is your first glimpse of concurrent programming's fundamental challenge: when threads share resources (here, `std::cout`), chaos can ensue.
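Incidentally, you can ask the standard library how many of those "chefs" your machine actually has. This short sketch (not part of the tutorial's running examples) prints the hint reported by `std::thread::hardware_concurrency()`:

```cpp
#include <iostream>
#include <thread>

int main()
{
    // Returns the number of concurrent threads supported, or 0 if unknown.
    unsigned n = std::thread::hardware_concurrency();
    std::cout << "This machine supports about " << n << " hardware threads.\n";
    return 0;
}
```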
## Part Three: Creating Threads—The Many Ways

You've seen functions passed to `std::thread`. But threads accept any callable object: lambda expressions, function objects (functors), and member functions.

Lambda expressions are often the clearest choice.

```cpp
#include <iostream>
#include <thread>

int main()
{
    int x = 42;

    std::thread t([x]() {
        std::cout << "The value is: " << x << "\n";
    });

    t.join();
    return 0;
}
```

The lambda captures `x` by value—it copies `x` into the lambda. This is important: by default, `std::thread` copies all arguments passed to it. Even if your function declares a reference parameter, the thread receives a copy.

To pass by reference, use `std::ref()`.

```cpp
#include <iostream>
#include <thread>

void increment(int& value)
{
    ++value;
}

int main()
{
    int counter = 0;

    std::thread t(increment, std::ref(counter));
    t.join();

    std::cout << "Counter is now: " << counter << "\n";
    return 0;
}
```

Without `std::ref()`, the thread would modify a copy, leaving `counter` unchanged. With it, the thread modifies the original.

Functors—objects with an overloaded `operator()`—work too.

```cpp
#include <iostream>
#include <thread>

class Counter
{
    int limit_;
public:
    Counter(int limit) : limit_(limit) {}

    void operator()() const
    {
        for (int i = 0; i < limit_; ++i)
            std::cout << i << " ";
        std::cout << "\n";
    }
};

int main()
{
    std::thread t(Counter(5));
    t.join();
    return 0;
}
```

For member functions, pass a pointer to the function and an instance.

```cpp
#include <iostream>
#include <string>
#include <thread>

class Greeter
{
public:
    void greet(const std::string& name)
    {
        std::cout << "Hello, " << name << "!\n";
    }
};

int main()
{
    Greeter g;
    std::thread t(&Greeter::greet, &g, "World");
    t.join();
    return 0;
}
```

The `&Greeter::greet` syntax names the member function; `&g` provides the instance to call it on.

## Part Four: Thread Lifecycle—Join, Detach, and Destruction

Every thread must be either *joined* or *detached* before its `std::thread` object is destroyed. Failing to do so calls `std::terminate()`, abruptly ending your program.

We've used `join()` extensively. It blocks the calling thread until the target thread finishes. This is how you wait for work to complete.

```cpp
std::thread t(do_work);
// ... do other things ...
t.join(); // wait for do_work to finish
```

Sometimes you want a thread to run independently, continuing even after the `std::thread` object is destroyed. That's what `detach()` does.

```cpp
std::thread t(background_task);
t.detach(); // thread runs independently
// t is now "empty"—no longer associated with a thread
```

A detached thread becomes a *daemon thread*. It runs until it finishes or the program exits. You lose all ability to wait for it or check its status. Use detachment sparingly—usually for truly fire-and-forget background work.

Before joining or detaching, you can check if a thread is *joinable*.

```cpp
std::thread t(some_function);

if (t.joinable())
{
    t.join();
}
```

A thread is joinable if it represents an actual thread of execution. After joining or detaching, or after default construction, a `std::thread` is not joinable.
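C++20 adds `std::jthread`, which applies this lifecycle discipline automatically: its destructor joins the thread, so forgetting a `join()` no longer terminates your program. A minimal sketch, assuming a standard library that ships `std::jthread`:

```cpp
#include <iostream>
#include <thread>

int main()
{
    // jthread joins in its destructor, so no explicit join() is needed.
    std::jthread t([]{
        std::cout << "Hello from a jthread!\n";
    });
    return 0; // t's destructor joins the thread here
}
```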
Consider this innocent-looking code.

```cpp
#include <iostream>
#include <thread>

int counter = 0;

void increment_many_times()
{
    for (int i = 0; i < 100000; ++i)
        ++counter;
}

int main()
{
    std::thread t1(increment_many_times);
    std::thread t2(increment_many_times);

    t1.join();
    t2.join();

    std::cout << "Counter: " << counter << "\n";
    return 0;
}
```

Two threads, each incrementing 100,000 times. You'd expect 200,000. But run this repeatedly and you'll see different results—180,000, 195,327, maybe occasionally 200,000. Something is wrong.

The `++counter` operation looks atomic—indivisible—but it isn't. It actually consists of three steps: read the current value, add one, write the result back. Between any of these steps, the other thread might execute its own steps.

Imagine both threads read `counter` when it's 5. Both add one, getting 6. Both write 6 back. Two increments, but the counter only went up by one. This is a *lost update*, a classic race condition.

The more threads, the more opportunity for races. The faster your processor, the more instructions execute between context switches, potentially hiding the bug—until one critical day in production.

## Part Six: Mutual Exclusion—Mutexes

The solution to data races is *mutual exclusion*: ensuring that only one thread accesses shared data at a time.

A *mutex* (mutual exclusion object) is a lockable resource. Before accessing shared data, a thread *locks* the mutex. If another thread already holds the lock, the requesting thread blocks until the lock is released. This serializes access to the protected data.

```cpp
#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;
std::mutex counter_mutex;

void increment_many_times()
{
    for (int i = 0; i < 100000; ++i)
    {
        counter_mutex.lock();
        ++counter;
        counter_mutex.unlock();
    }
}

int main()
{
    std::thread t1(increment_many_times);
    std::thread t2(increment_many_times);

    t1.join();
    t2.join();

    std::cout << "Counter: " << counter << "\n";
    return 0;
}
```

Now the output is always 200,000. The mutex ensures that between `lock()` and `unlock()`, only one thread executes. The increment is now effectively atomic.

But there's a problem with calling `lock()` and `unlock()` directly. If code between them throws an exception, `unlock()` never executes. The mutex stays locked forever, and any thread waiting for it blocks eternally—a *deadlock*.

## Part Seven: Lock Guards—Safety Through RAII

C++ has a powerful idiom: *RAII* (Resource Acquisition Is Initialization). The idea is simple: acquire resources in a constructor, release them in the destructor. Since destructors run even when exceptions are thrown, cleanup is guaranteed.

Lock guards apply RAII to mutexes.

```cpp
#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;
std::mutex counter_mutex;

void increment_many_times()
{
    for (int i = 0; i < 100000; ++i)
    {
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;
        // lock is automatically released when it goes out of scope
    }
}
```

The `std::lock_guard` locks the mutex on construction and unlocks it on destruction. Even if an exception is thrown, the destructor runs and the mutex is released. This is the correct way to use mutexes.

Since C++17, `std::scoped_lock` is preferred. It works like `lock_guard` but can lock multiple mutexes simultaneously, avoiding a class of deadlock we'll discuss shortly.

```cpp
std::scoped_lock lock(counter_mutex); // C++17
```

For more control, use `std::unique_lock`.
It can be unlocked before destruction, moved to another scope, or created without immediately locking.

```cpp
std::unique_lock<std::mutex> lock(some_mutex, std::defer_lock);
// mutex not yet locked

lock.lock(); // lock when ready
// ... do work ...
lock.unlock(); // unlock early if needed
// ... do other work ...
// destructor unlocks again if still locked
```

`std::unique_lock` is more flexible but slightly more expensive than `std::lock_guard`. Use the simplest tool that does the job.

## Part Eight: The Deadlock Dragon

Mutexes solve data races but introduce a new danger: *deadlock*.

Imagine two threads and two mutexes. Thread A locks mutex 1, then tries to lock mutex 2. Thread B locks mutex 2, then tries to lock mutex 1. Each thread holds one mutex and waits for the other. Neither can proceed. The program freezes.

```cpp
std::mutex mutex1, mutex2;

void thread_a()
{
    std::lock_guard<std::mutex> lock1(mutex1);
    std::lock_guard<std::mutex> lock2(mutex2); // blocks, waiting for B
    // ...
}

void thread_b()
{
    std::lock_guard<std::mutex> lock2(mutex2);
    std::lock_guard<std::mutex> lock1(mutex1); // blocks, waiting for A
    // ...
}
```

If both threads run and each acquires its first mutex before the other acquires the second, deadlock occurs.

The simplest prevention: always lock mutexes in the same order. If every thread locks `mutex1` before `mutex2`, no cycle can form.

When you need to lock multiple mutexes and can't guarantee order, use `std::scoped_lock`.

```cpp
void safe_function()
{
    std::scoped_lock lock(mutex1, mutex2); // locks both atomically
    // ...
}
```

`std::scoped_lock` uses a deadlock-avoidance algorithm internally, acquiring both mutexes without risk of circular waiting.

## Part Nine: Atomics—Lock-Free Simplicity

For simple operations on simple data, mutexes might be overkill. *Atomic types* provide lock-free thread safety for individual values.

An atomic operation completes entirely before any other thread can observe its effects. There's no intermediate state.

```cpp
#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> counter{0};

void increment_many_times()
{
    for (int i = 0; i < 100000; ++i)
        ++counter; // atomic increment
}

int main()
{
    std::thread t1(increment_many_times);
    std::thread t2(increment_many_times);

    t1.join();
    t2.join();

    std::cout << "Counter: " << counter << "\n";
    return 0;
}
```

No mutex, no lock guard, yet the result is always 200,000. The `std::atomic<int>` ensures that increments are indivisible.

Atomics work best for single-variable operations: counters, flags, simple state. They're faster than mutexes when contention is low. But they can't protect complex operations involving multiple variables—for that, you need mutexes.

Common atomic types include `std::atomic<bool>`, `std::atomic<int>`, and atomic pointer types such as `std::atomic<T*>`. Any trivially copyable type can be made atomic.

## Part Ten: Condition Variables—Threads That Wait Intelligently

Sometimes a thread must wait for a specific condition before proceeding. You could loop, repeatedly checking:

```cpp
while (!ready)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
```

This works but wastes CPU cycles and introduces latency. *Condition variables* provide efficient waiting.

A condition variable allows one thread to signal others that something has changed. Waiting threads sleep until notified, consuming no CPU.
```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool ready = false;

void worker()
{
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, []{ return ready; }); // wait until ready is true
    std::cout << "Worker proceeding!\n";
}

void signal_ready()
{
    {
        std::lock_guard<std::mutex> lock(mtx);
        ready = true;
    }
    cv.notify_one(); // wake one waiting thread
}

int main()
{
    std::thread t(worker);

    std::this_thread::sleep_for(std::chrono::seconds(1));
    signal_ready();

    t.join();
    return 0;
}
```

The worker thread calls `cv.wait()`, which atomically releases the mutex and suspends the thread. When `signal_ready()` calls `notify_one()`, the worker wakes up, reacquires the mutex, checks the condition, and proceeds.

The lambda `[]{ return ready; }` is the predicate. `wait()` won't return until this evaluates to true. This guards against *spurious wakeups*—rare events where a thread wakes without notification. Always use a predicate.

Use `notify_one()` to wake a single waiting thread, or `notify_all()` to wake them all.

## Part Eleven: Shared Locks—Readers and Writers

Consider a data structure that's read frequently but written rarely. A regular mutex serializes all access—but why block readers from each other? Multiple threads can safely read simultaneously; only writes require exclusive access.

*Shared mutexes* support this pattern.

```cpp
#include <iostream>
#include <shared_mutex>
#include <thread>
#include <vector>

std::shared_mutex rw_mutex;
std::vector<int> data;

void reader(int id)
{
    std::shared_lock<std::shared_mutex> lock(rw_mutex); // shared access
    std::cout << "Reader " << id << " sees " << data.size() << " elements\n";
}

void writer(int value)
{
    std::unique_lock<std::shared_mutex> lock(rw_mutex); // exclusive access
    data.push_back(value);
    std::cout << "Writer added " << value << "\n";
}
```

`std::shared_lock` acquires a *shared lock*—multiple threads can hold shared locks simultaneously. `std::unique_lock` on a shared mutex acquires an *exclusive lock*—no other locks (shared or exclusive) can be held.

While any reader holds a shared lock, writers must wait. While a writer holds an exclusive lock, everyone waits. This pattern maximizes concurrency for read-heavy workloads.

## Part Twelve: Futures and Promises—Getting Results Back

We've focused on threads as parallel workers. But how do you get results from them?

Passing references works but is clunky. C++ offers a cleaner abstraction: *futures* and *promises*.

A `std::promise` is a write-once container: a thread can set its value. A `std::future` is the corresponding read-once container: another thread can get that value. They form a one-way communication channel.

```cpp
#include <future>
#include <iostream>
#include <thread>

void compute(std::promise<int> result_promise)
{
    int answer = 6 * 7; // expensive computation
    result_promise.set_value(answer);
}

int main()
{
    std::promise<int> promise;
    std::future<int> future = promise.get_future();

    std::thread t(compute, std::move(promise));

    std::cout << "Waiting for result...\n";
    int result = future.get(); // blocks until value is set
    std::cout << "The answer is: " << result << "\n";

    t.join();
    return 0;
}
```

The worker thread calls `set_value()`. The main thread calls `get()`, which blocks until the value is available. Simple, clean, safe.

A future's `get()` can only be called once. For multiple consumers, use `std::shared_future`.
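A brief sketch of that pattern (the thread count and the value 7 are arbitrary): calling `share()` converts a future into a `std::shared_future`, and each thread can then call `get()` on its own copy.

```cpp
#include <future>
#include <iostream>
#include <thread>

int main()
{
    std::promise<int> p;
    std::shared_future<int> sf = p.get_future().share(); // copyable future

    // Each lambda captures its own copy of the shared_future.
    std::thread t1([sf]{ std::cout << "t1 sees " << sf.get() << "\n"; });
    std::thread t2([sf]{ std::cout << "t2 sees " << sf.get() << "\n"; });

    p.set_value(7); // both waiting threads receive the same value

    t1.join();
    t2.join();
    return 0;
}
```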
## Part Thirteen: Async—The Easy Path

Creating threads manually, managing promises, joining at the end—it's mechanical. `std::async` automates it.

```cpp
#include <future>
#include <iostream>

int compute()
{
    return 6 * 7;
}

int main()
{
    std::future<int> future = std::async(compute);

    std::cout << "Computing...\n";
    int result = future.get();
    std::cout << "Result: " << result << "\n";

    return 0;
}
```

`std::async` launches the function (potentially in a new thread), returning a future. No explicit thread creation, no promise management, no join call.

By default, the system decides whether to run the function in a new thread or defer it until you call `get()`. To force a new thread:

```cpp
auto future = std::async(std::launch::async, compute);
```

To defer execution until `get()`:

```cpp
auto future = std::async(std::launch::deferred, compute);
```

For quick parallel tasks, `std::async` is often the cleanest choice.

## Part Fourteen: Thread-Local Storage

Sometimes each thread needs its own copy of a variable—not shared, not copied each call, but persistent within that thread.

Declare it `thread_local`.

```cpp
#include <iostream>
#include <thread>

thread_local int counter = 0;

void increment_and_print(const char* name)
{
    ++counter;
    std::cout << name << " counter: " << counter << "\n";
}

int main()
{
    std::thread t1([]{
        increment_and_print("T1");
        increment_and_print("T1");
    });

    std::thread t2([]{
        increment_and_print("T2");
        increment_and_print("T2");
    });

    t1.join();
    t2.join();

    return 0;
}
```

Each thread sees its own `counter`. T1 prints 1, then 2. T2 independently prints 1, then 2. No synchronization needed because the data isn't shared.

Thread-local storage is perfect for per-thread caches, random number generators, or error state.

## Part Fifteen: Practical Patterns

Let's combine what we've learned into practical patterns.

**Producer-Consumer Queue**

One or more threads produce work items; one or more threads consume them. A queue connects them.

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
class ThreadSafeQueue
{
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;

public:
    void push(T value)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        cv_.notify_one();
    }

    T pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this]{ return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }
};

ThreadSafeQueue<int> work_queue;

void producer()
{
    for (int i = 0; i < 10; ++i)
    {
        work_queue.push(i);
        std::cout << "Produced: " << i << "\n";
    }
}

void consumer()
{
    for (int i = 0; i < 10; ++i)
    {
        int item = work_queue.pop();
        std::cout << "Consumed: " << item << "\n";
    }
}

int main()
{
    std::thread prod(producer);
    std::thread cons(consumer);

    prod.join();
    cons.join();

    return 0;
}
```

The producer pushes items; the consumer waits for items and processes them. The condition variable ensures the consumer sleeps efficiently when the queue is empty.

**Parallel For**

Split a loop across multiple threads.

```cpp
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

void parallel_for(int start, int end, int num_threads,
                  std::function<void(int)> func)
{
    std::vector<std::thread> threads;
    int chunk_size = (end - start) / num_threads;

    for (int t = 0; t < num_threads; ++t)
    {
        int chunk_start = start + t * chunk_size;
        int chunk_end = (t == num_threads - 1) ?
            end : chunk_start + chunk_size;

        threads.emplace_back([=]{
            for (int i = chunk_start; i < chunk_end; ++i)
                func(i);
        });
    }

    for (auto& thread : threads)
        thread.join();
}

int main()
{
    std::mutex print_mutex;

    parallel_for(0, 20, 4, [&](int i){
        std::lock_guard<std::mutex> lock(print_mutex);
        std::cout << "Processing " << i << " on thread "
                  << std::this_thread::get_id() << "\n";
    });

    return 0;
}
```

The work is divided into chunks, each handled by its own thread. For CPU-bound work on large datasets, this can dramatically reduce execution time.

## Part Sixteen: What You've Learned

You began knowing nothing about concurrency. Now you understand:

- **Threads** are independent flows of execution within a process
- **Mutexes** provide mutual exclusion to prevent data races
- **Lock guards** ensure mutexes are properly released, even on exceptions
- **Atomics** offer lock-free safety for simple operations
- **Condition variables** let threads wait efficiently for events
- **Shared locks** allow multiple readers or one writer
- **Futures and promises** communicate results between threads
- **std::async** simplifies launching parallel work

You've seen the dangers—race conditions, deadlocks—and the tools to avoid them.

Concurrency is challenging. Bugs hide until the worst moment. Testing is hard because timing varies. But the rewards are substantial: responsive applications, full hardware utilization, and elegant solutions to naturally parallel problems.

The best advice: start simple. Use `std::async` when possible. Prefer immutable data. When you must share mutable state, protect it carefully. Minimize the time locks are held. Avoid nested locks when you can; when you can't, use `std::scoped_lock`.

And test. Test with many threads. Test on different machines. Test under load.

Welcome to concurrent programming. The parallel path awaits.
diff --git a/doc/reference/coro-tutorial.md b/doc/reference/coro-tutorial.md
new file mode 100644
index 00000000..6b859a3e
--- /dev/null
+++ b/doc/reference/coro-tutorial.md
@@ -0,0 +1,1425 @@
# How To Understand C++20 Coroutines from the Ground Up

### Introduction

For over two decades, C++ programmers have wrestled with a fundamental challenge: how to write code that waits for things to happen without blocking everything else. Network requests need to complete. Files need to be read. User input must arrive. The traditional solutions—threads, callbacks, and state machines—each carry their own burden of complexity. Threads consume system resources and require careful synchronization. Callbacks scatter your logic across multiple functions. State machines bury simple ideas beneath layers of bookkeeping.

C++20 introduces *coroutines*, a language feature that addresses this challenge directly. A coroutine is a function that can suspend its execution midway through, preserve its state, and resume later from exactly where it left off. This capability transforms the way you write asynchronous code, allowing you to express complex sequences of operations as straightforward, linear logic.

In this tutorial, you will explore C++20 coroutines from the most basic concepts to practical implementations. You will begin by understanding the problem coroutines solve, then build your first coroutine step by step. By the end, you will have constructed a working generator type and understand the machinery that makes coroutines possible.
## Prerequisites

Before beginning this tutorial, you should have the following:

- A C++ compiler with C++20 support (GCC 10+, Clang 14+, or MSVC 2019 16.8+)
- Familiarity with basic C++ concepts: functions, classes, templates, and lambdas
- Understanding of how function calls work: the call stack, local variables, and return values
- A text editor or IDE configured for C++ development

The examples in this tutorial use standard C++20 features. If using GCC, compile with:

```bash
g++ -std=c++20 -fcoroutines your_file.cpp
```

If using Clang, compile with:

```bash
clang++ -std=c++20 your_file.cpp
```

If using MSVC, enable C++20 in your project settings or compile with:

```bash
cl /std:c++20 your_file.cpp
```

## Step 1 — Understanding the Problem Coroutines Solve

Before diving into coroutines, you must understand why they exist. Consider a server application that needs to handle an incoming network request. The server must read the request from the network, parse it, possibly read from a database, compute a response, and send that response back. Each of these steps might take time to complete.

In traditional synchronous code, you might write something like this:

```cpp
void handle_request(connection& conn)
{
    std::string request = conn.read(); // blocks until data arrives
    auto parsed = parse_request(request);
    auto data = database.query(parsed.id); // blocks until database responds
    auto response = compute_response(data);
    conn.write(response); // blocks until write completes
}
```

This code reads naturally from top to bottom. The logic flows in a straight line. But there is a problem: while waiting for the network or database, this function blocks the entire thread. If you have thousands of concurrent connections, you would need thousands of threads, each consuming memory and requiring the operating system to schedule them.

The traditional alternative uses callbacks:

```cpp
void handle_request(connection& conn)
{
    conn.async_read([&conn](std::string request) {
        auto parsed = parse_request(request);
        database.async_query(parsed.id, [&conn](auto data) {
            auto response = compute_response(data);
            conn.async_write(response, [&conn]() {
                // request complete
            });
        });
    });
}
```

This code does not block. Each operation starts, registers a callback, and returns immediately. When the operation completes, the callback runs. But look what has happened to the code: three levels of nesting, logic scattered across multiple lambda functions, and local variables that cannot be shared between callbacks without careful lifetime management.

David Mazières, in his exploration of C++ coroutines, described the pain of this approach vividly. In his SMTP server code, a single logical operation named `cmd_rcpt` had to be split across seven separate functions: `cmd_rcpt`, `cmd_rcpt_0`, `cmd_rcpt_2`, `cmd_rcpt_3`, `cmd_rcpt_4`, `cmd_rcpt_5`, and `cmd_rcpt_6`. Each function represented a different return point from an asynchronous operation. The logic of a single command was scattered across the codebase.
Coroutines solve this problem by allowing you to write code that looks synchronous but behaves asynchronously:

```cpp
task handle_request(connection& conn)
{
    std::string request = co_await conn.async_read();
    auto parsed = parse_request(request);
    auto data = co_await database.async_query(parsed.id);
    auto response = compute_response(data);
    co_await conn.async_write(response);
}
```

This code reads just like the original blocking version. The logic flows from top to bottom. Local variables like `request`, `parsed`, and `data` exist naturally in their scope. Yet the function suspends at each `co_await` point, allowing other work to proceed while waiting.

The variable `request` maintains its value even though the function may suspend and resume multiple times. This is the fundamental capability that coroutines provide: the preservation of local state across suspension points.

You have now seen the problem that coroutines solve. The callback approach fragments your logic. Coroutines restore the natural flow of code while maintaining asynchronous behavior.

## Step 2 — Recognizing Coroutines by Their Keywords

A coroutine in C++20 looks almost like a regular function. The difference lies in what appears inside the function body. A function becomes a coroutine when it contains any of three special keywords: `co_await`, `co_yield`, or `co_return`.

The keyword `co_await` suspends the coroutine and waits for some operation to complete. When you write `co_await expr`, the coroutine saves its state, pauses execution, and potentially allows other code to run. When the awaited operation completes, the coroutine resumes from exactly where it left off.

The keyword `co_yield` produces a value and suspends the coroutine. This is useful for generators—functions that produce a sequence of values one at a time. After yielding a value, the coroutine pauses until someone asks for the next value.

The keyword `co_return` completes the coroutine and optionally provides a final result. Unlike a regular `return` statement, `co_return` interacts with the coroutine machinery to properly finalize the coroutine's state.

Here is the simplest possible coroutine:

```cpp
#include <coroutine>

struct SimpleCoroutine {
    struct promise_type {
        SimpleCoroutine get_return_object() { return {}; }
        std::suspend_never initial_suspend() { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
};

SimpleCoroutine my_first_coroutine()
{
    co_return; // This makes it a coroutine
}
```

Do not worry about the `promise_type` structure yet. You will explore it in detail later. For now, observe that the presence of `co_return` transforms what looks like a regular function into a coroutine.

If you try to compile a function with these keywords but without proper infrastructure, the compiler will produce errors. The C++ coroutine mechanism requires certain types and functions to exist. This is why the example includes the `promise_type` nested structure—it provides the minimum scaffolding the compiler needs.

The distinction between regular functions and coroutines matters because they behave fundamentally differently at runtime:

- A regular function allocates its local variables on the stack. When it returns, those variables are gone.
- A coroutine allocates its local variables in a heap-allocated *coroutine frame*. When it suspends, those variables persist. When it resumes, they are still there.
This persistence of state is what allows coroutines to pause and resume while maintaining their local variables.

You have now learned to recognize coroutines by their keywords. The presence of `co_await`, `co_yield`, or `co_return` signals that a function is a coroutine with special runtime behavior.

## Step 3 — Understanding Suspension and Resumption

The heart of coroutines is the ability to suspend execution and resume it later. To understand how this works, you must examine what happens when a coroutine suspends.

When you call a regular function, the system allocates space on the call stack for the function's local variables and parameters. When the function returns, this stack space is reclaimed. The function's state exists only during the call.

When you call a coroutine, something different happens. The system allocates a *coroutine frame* on the heap. This frame holds the coroutine's local variables, parameters, and information about where execution should resume. Because the frame lives on the heap rather than the stack, it persists even when the coroutine is not actively running.

Consider this example:

```cpp
#include <coroutine>
#include <iostream>

struct ReturnObject {
    struct promise_type {
        ReturnObject get_return_object() { return {}; }
        std::suspend_never initial_suspend() { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
};

struct Awaiter {
    std::coroutine_handle<>* handle_out;

    bool await_ready() { return false; }
    void await_suspend(std::coroutine_handle<> h) {
        *handle_out = h;
    }
    void await_resume() {}
};

ReturnObject counter(std::coroutine_handle<>* handle)
{
    Awaiter awaiter{handle};

    for (unsigned i = 0; ; ++i) {
        std::cout << "counter: " << i << std::endl;
        co_await awaiter;
    }
}

int main()
{
    std::coroutine_handle<> h;
    counter(&h);

    for (int i = 0; i < 3; ++i) {
        std::cout << "main: resuming" << std::endl;
        h();
    }

    h.destroy();
}
```

**Output:**

```
counter: 0
main: resuming
counter: 1
main: resuming
counter: 2
main: resuming
counter: 3
```

Study what happens in this example:

1. The `main` function calls `counter`, passing the address of a coroutine handle.
2. The `counter` coroutine begins executing. It prints "counter: 0" and then reaches `co_await awaiter`.
3. The `co_await` expression checks if the awaiter is ready by calling `await_ready()`. It returns `false`, so suspension proceeds.
4. The coroutine saves its state—including the value of `i`—to the coroutine frame.
5. The `await_suspend` method receives a handle to the suspended coroutine and stores it in `main`'s variable `h`.
6. Control returns to `main`, which now holds a handle to the suspended coroutine.
7. The `main` function calls `h()`, which resumes the coroutine.
8. The coroutine continues from where it left off, increments `i`, prints its new value, and suspends again.
9. This cycle repeats until `main` destroys the coroutine.

The variable `i` inside `counter` maintains its value across all these suspension and resumption cycles. It starts at 0, increments to 1, then 2, then 3. Each time the coroutine resumes, `i` is exactly where it was when the coroutine suspended.

A `std::coroutine_handle<>` is a lightweight object, similar to a pointer. It references the coroutine frame on the heap. Calling the handle (using `h()` or `h.resume()`) resumes the coroutine.
The handle does not own the coroutine frame—you must eventually call `h.destroy()` to free the memory. + +The Awaiter type in this example demonstrates the three methods that `co_await` uses: + +- `await_ready()`: Returns `true` if the result is immediately available and no suspension is needed. Returns `false` to proceed with suspension. +- `await_suspend(handle)`: Called when the coroutine suspends. Receives the coroutine handle, allowing external code to later resume the coroutine. +- `await_resume()`: Called when the coroutine resumes. Its return value becomes the value of the `co_await` expression. + +The C++ standard library provides two predefined awaiters: `std::suspend_always` and `std::suspend_never`. As their names suggest, `suspend_always::await_ready()` always returns `false` (always suspend), while `suspend_never::await_ready()` always returns `true` (never suspend). + +You have now seen how suspension and resumption work. The coroutine frame preserves state on the heap, and the coroutine handle provides a way to resume execution. + +## Step 4 — Understanding the Promise Type + +Every coroutine has an associated *promise type*. This type acts as a controller for the coroutine, defining how it behaves at key points in its lifecycle. The promise type is not something you pass to the coroutine—it is a nested type inside the coroutine's return type that the compiler uses automatically. + +The compiler expects to find a type named `promise_type` nested inside your coroutine's return type. If your coroutine returns `Generator`, the compiler looks for `Generator::promise_type`. This promise type must provide certain methods that the compiler calls at specific points during the coroutine's execution. + +Here are the required methods: + +**`get_return_object()`**: Called to create the object that will be returned to the caller of the coroutine. This happens before the coroutine body begins executing. + +**`initial_suspend()`**: Called immediately after `get_return_object()`. Returns an awaiter that determines whether the coroutine should suspend before running any of its body. Return `std::suspend_never{}` to start executing immediately, or `std::suspend_always{}` to suspend before the first statement. + +**`final_suspend()`**: Called when the coroutine completes (either normally or via exception). Returns an awaiter that determines whether to suspend one last time or destroy the coroutine state immediately. This method must be `noexcept`. + +**`return_void()`** or **`return_value(v)`**: Called when the coroutine executes `co_return` or falls off the end of its body. Use `return_void()` if the coroutine does not return a value; use `return_value(v)` if it does. You must provide exactly one of these, matching how your coroutine returns. + +**`unhandled_exception()`**: Called if an exception escapes the coroutine body. Typically you either rethrow the exception, store it for later, or terminate the program. + +The compiler transforms your coroutine body into something resembling this pseudocode: + +```cpp +{ + promise_type promise; + auto return_object = promise.get_return_object(); + + co_await promise.initial_suspend(); + + try { + // your coroutine body goes here + } + catch (...) { + promise.unhandled_exception(); + } + + co_await promise.final_suspend(); +} +// coroutine frame is destroyed when control flows off the end +``` + +This transformation reveals important details. 
The return object is created before `initial_suspend()` runs, so it is available even if the coroutine suspends immediately. The `final_suspend()` determines whether the coroutine frame persists after completion—if it returns `suspend_always`, you must manually destroy the coroutine; if it returns `suspend_never`, the frame is destroyed automatically.

Consider this example that demonstrates promise type behavior:

```cpp
#include <coroutine>
#include <iostream>

struct TracePromise {
    struct promise_type {
        promise_type() {
            std::cout << "promise constructed" << std::endl;
        }
        ~promise_type() {
            std::cout << "promise destroyed" << std::endl;
        }

        TracePromise get_return_object() {
            std::cout << "get_return_object called" << std::endl;
            return {};
        }
        std::suspend_never initial_suspend() {
            std::cout << "initial_suspend called" << std::endl;
            return {};
        }
        std::suspend_always final_suspend() noexcept {
            std::cout << "final_suspend called" << std::endl;
            return {};
        }
        void return_void() {
            std::cout << "return_void called" << std::endl;
        }
        void unhandled_exception() {
            std::cout << "unhandled_exception called" << std::endl;
        }
    };

    std::coroutine_handle<promise_type> handle;
};

TracePromise trace_coroutine()
{
    std::cout << "coroutine body begins" << std::endl;
    co_return;
}

int main()
{
    std::cout << "calling coroutine" << std::endl;
    auto result = trace_coroutine();
    std::cout << "coroutine returned" << std::endl;
}
```

**Output:**

```
calling coroutine
promise constructed
get_return_object called
initial_suspend called
coroutine body begins
return_void called
final_suspend called
coroutine returned
```

Notice that the promise is constructed first, then `get_return_object()` creates the return value, then `initial_suspend()` runs. Since `initial_suspend()` returns `suspend_never`, the coroutine body executes immediately. After `co_return`, `return_void()` is called, followed by `final_suspend()`. Since `final_suspend()` returns `suspend_always`, the coroutine suspends one last time, and the promise is not destroyed until the coroutine handle is explicitly destroyed. Because this example never destroys the handle, the frame leaks and "promise destroyed" never appears in the output.

One important warning: if your coroutine can fall off the end of its body without executing `co_return`, and your promise type lacks a `return_void()` method, the behavior is undefined. This is a dangerous pitfall. Always ensure your promise type has `return_void()` if there is any code path that might reach the end of the coroutine body without an explicit `co_return`.

You have now learned how the promise type controls coroutine behavior. The methods on the promise type let you customize initialization, suspension, value delivery, and cleanup.

## Step 5 — Building a Generator with co_yield

One of the most common uses for coroutines is building *generators*—functions that produce a sequence of values on demand. Instead of computing all values upfront and storing them in a container, a generator computes each value when requested.

The `co_yield` keyword makes this pattern elegant. When a coroutine executes `co_yield value`, it delivers the value to its caller and suspends. The next time the coroutine resumes, it continues from just after the `co_yield`.

Here is how `co_yield` works internally. The expression `co_yield value` is transformed by the compiler into:

```cpp
co_await promise.yield_value(value)
```

The `yield_value` method is a new addition you must make to your promise type.
It receives the yielded value, typically stores it somewhere accessible, and returns an awaiter (usually `std::suspend_always`) to suspend the coroutine.

Here is a complete generator example:

```cpp
#include <coroutine>
#include <exception>
#include <iostream>

struct Generator {
    struct promise_type {
        int current_value;

        Generator get_return_object() {
            return Generator{
                std::coroutine_handle<promise_type>::from_promise(*this)
            };
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(int value) {
            current_value = value;
            return {};
        }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    std::coroutine_handle<promise_type> handle;

    Generator(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~Generator() { if (handle) handle.destroy(); }

    // Disable copying
    Generator(const Generator&) = delete;
    Generator& operator=(const Generator&) = delete;

    // Enable moving
    Generator(Generator&& other) noexcept
        : handle(other.handle) { other.handle = nullptr; }
    Generator& operator=(Generator&& other) noexcept {
        if (this != &other) {
            if (handle) handle.destroy();
            handle = other.handle;
            other.handle = nullptr;
        }
        return *this;
    }

    bool next() {
        if (!handle || handle.done())
            return false;
        handle.resume();
        return !handle.done();
    }

    int value() const {
        return handle.promise().current_value;
    }
};

Generator count_to(int n)
{
    for (int i = 1; i <= n; ++i) {
        co_yield i;
    }
}

int main()
{
    auto gen = count_to(5);

    while (gen.next()) {
        std::cout << gen.value() << std::endl;
    }
}
```

**Output:**

```
1
2
3
4
5
```

Study the key parts of this example:

The `yield_value` method stores the yielded value in `current_value` and returns `suspend_always` to pause the coroutine after each yield.

The `initial_suspend` returns `suspend_always`, which means the coroutine suspends before executing any of its body. This is important—it means the first call to `next()` is what starts the coroutine running.

The `get_return_object` method creates the Generator object and stores a handle to the coroutine. Notice the expression `std::coroutine_handle<promise_type>::from_promise(*this)`. This static method creates a coroutine handle from a reference to the promise object. Since the promise object lives inside the coroutine frame at a known offset, this conversion is possible.

The Generator class manages the coroutine handle's lifetime. The destructor calls `handle.destroy()` to free the coroutine frame. The class disables copying (copying handles would be problematic) but enables moving.

The `next()` method resumes the coroutine and returns `true` if the coroutine produced a value, or `false` if the coroutine has completed. The `value()` method retrieves the most recently yielded value from the promise.

Here is a more interesting generator that produces the Fibonacci sequence:

```cpp
Generator fibonacci()
{
    int a = 0, b = 1;
    while (true) {
        co_yield a;
        int next = a + b;
        a = b;
        b = next;
    }
}

int main()
{
    auto fib = fibonacci();

    for (int i = 0; i < 10 && fib.next(); ++i) {
        std::cout << fib.value() << " ";
    }
    std::cout << std::endl;
}
```

**Output:**

```
0 1 1 2 3 5 8 13 21 34
```

The Fibonacci generator runs an infinite loop internally. It will produce values forever. But because it yields and suspends after each value, the caller controls when (and whether) to ask for more values. The generator only computes values on demand.

This is the power of generators. The variables `a` and `b` persist across yields because they live in the coroutine frame on the heap. Each call to `next()` resumes the coroutine, which computes the next Fibonacci number, yields it, and suspends again.

You have now built a working generator using `co_yield`. The promise type's `yield_value` method receives yielded values, and the Generator class provides an interface for retrieving them.

## Step 6 — Understanding Return Objects and Coroutine Handles

You have seen coroutine handles and return objects in previous examples. Now you will examine them more closely to understand their relationship and how information flows between them.

A *coroutine handle* (`std::coroutine_handle<>`) is a lightweight object that refers to a suspended coroutine. It is similar to a pointer: it does not own the memory it references, and copying it does not copy the coroutine. You can resume the coroutine by calling the handle (using `handle()` or `handle.resume()`), query whether the coroutine has completed with `handle.done()`, and destroy the coroutine frame with `handle.destroy()`.

The coroutine handle is a template. `std::coroutine_handle<>` (equivalent to `std::coroutine_handle<void>`) is the most basic form—it can reference any coroutine but provides no access to the promise object. `std::coroutine_handle<PromiseType>` is a more specific form that knows about a particular promise type. This typed handle can be converted to the void handle, and it provides a `promise()` method that returns a reference to the promise object.

The *return object* is what the caller receives when calling a coroutine. It is the type that appears in the coroutine's declaration. When you write:

```cpp
Generator my_coroutine() {
    co_yield 42;
}
```

The return type is `Generator`, and when you call `my_coroutine()`, you receive a `Generator` object.

The return object is created by calling `promise.get_return_object()` before the coroutine body begins. This happens early in the coroutine's lifecycle, giving the return object a chance to capture the coroutine handle. Here is the sequence:

1. The coroutine frame is allocated on the heap.
2. The promise object is constructed inside the frame.
3. `promise.get_return_object()` is called, creating the return object.
4. `co_await promise.initial_suspend()` executes.
5. The coroutine body begins (if `initial_suspend` did not suspend).
6. The return object is given to the caller.

The key insight is that `get_return_object()` runs before `initial_suspend()`. This means:

- If `initial_suspend()` returns `suspend_always`, the coroutine suspends before any user code runs, but the return object already exists and contains the coroutine handle.
- If `initial_suspend()` returns `suspend_never`, the coroutine runs immediately, and the return object is still created first.

Inside `get_return_object()`, you can obtain the coroutine handle using the static method `std::coroutine_handle<promise_type>::from_promise(*this)`. Since `get_return_object()` is called on the promise object (as `this`), this method returns a handle to the coroutine containing that promise.
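
Before the full example, here is a small sketch of what each form of handle can and cannot do. `MyPromise` is a hypothetical promise type used only for illustration; it is not part of the examples above:

```cpp
#include <coroutine>

struct MyPromise { int value = 0; }; // hypothetical promise type

// A typed handle exposes the promise; the type-erased handle does not.
void inspect(std::coroutine_handle<MyPromise> typed)
{
    int v = typed.promise().value;          // promise access requires the typed form
    std::coroutine_handle<> erased = typed; // implicit conversion: typed -> void
    bool finished = erased.done();          // lifecycle queries work on either form
    (void)v;
    (void)finished;
}
```

The conversion is one-way: once erased, a handle's static type no longer reveals its promise, so only lifecycle operations remain available through it.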

Here is an example that demonstrates the relationship:

```cpp
#include <coroutine>
#include <iostream>

struct Task {
    struct promise_type {
        Task get_return_object() {
            std::cout << "Creating return object" << std::endl;
            return Task{
                std::coroutine_handle<promise_type>::from_promise(*this)
            };
        }
        std::suspend_always initial_suspend() {
            std::cout << "Initial suspend" << std::endl;
            return {};
        }
        std::suspend_always final_suspend() noexcept {
            std::cout << "Final suspend" << std::endl;
            return {};
        }
        void return_void() {}
        void unhandled_exception() {}
    };

    std::coroutine_handle<promise_type> handle;

    Task(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~Task() { if (handle) handle.destroy(); }

    Task(Task&& other) noexcept : handle(other.handle) {
        other.handle = nullptr;
    }

    void resume() { handle.resume(); }
    bool done() const { return handle.done(); }
};

Task example_task()
{
    std::cout << "Task body: part 1" << std::endl;
    co_await std::suspend_always{};
    std::cout << "Task body: part 2" << std::endl;
}

int main()
{
    std::cout << "Before calling coroutine" << std::endl;

    Task task = example_task();

    std::cout << "After calling coroutine, before first resume" << std::endl;
    task.resume();

    std::cout << "After first resume, before second resume" << std::endl;
    task.resume();

    std::cout << "After second resume" << std::endl;
}
```

**Output:**

```
Before calling coroutine
Creating return object
Initial suspend
After calling coroutine, before first resume
Task body: part 1
After first resume, before second resume
Task body: part 2
Final suspend
After second resume
```

Follow the execution flow:

1. Before `example_task()` is called, nothing has happened.
2. Calling `example_task()` creates the coroutine frame, constructs the promise, and calls `get_return_object()`.
3. The return object (Task) is created with a handle to the coroutine.
4. `initial_suspend()` runs and returns `suspend_always`, so the coroutine suspends immediately.
5. Control returns to `main`, which now holds the Task object.
6. The first `resume()` runs "Task body: part 1", then hits `co_await suspend_always{}` and suspends.
7. The second `resume()` runs "Task body: part 2", then falls off the end, triggering `final_suspend()`.
8. Since `final_suspend()` returns `suspend_always`, the coroutine suspends one final time.
9. When Task's destructor runs (at the end of main), it destroys the coroutine handle.

The return object provides an interface to the caller. It hides the details of coroutine handles and promises behind whatever API makes sense for your use case. For a generator, the return object provides methods like `next()` and `value()`. For a task, it might provide `resume()` and `done()`. The return object owns the coroutine handle and is responsible for destroying it.

You have now seen how return objects and coroutine handles work together. The return object is the caller's view of the coroutine, while the handle is the mechanism for resuming and managing the coroutine's lifetime.

## Step 7 — Completing Coroutines with co_return

You have seen coroutines that yield sequences of values and suspend indefinitely. Now you will learn how coroutines complete their execution using `co_return`.

A coroutine completes in one of three ways:

1. It executes `co_return;` (returning void)
2. It executes `co_return expression;` (returning a value)
3. Execution falls off the end of the coroutine body

For cases 1 and 3, the compiler calls `promise.return_void()`.
For case 2, the compiler calls `promise.return_value(expression)`. You must provide exactly one of these methods in your promise type, matching how your coroutine returns.

When a coroutine completes (by any of these means), it then executes `co_await promise.final_suspend()`. The awaiter returned by `final_suspend()` determines what happens next:

- If it suspends (like `suspend_always`), the coroutine frame remains valid. The caller can still access the promise object and must eventually call `handle.destroy()` to free the memory.
- If it does not suspend (like `suspend_never`), the coroutine frame is destroyed automatically. Any handles to the coroutine become dangling pointers.

The choice between these behaviors matters. If your caller needs to access the result stored in the promise after the coroutine completes, use `suspend_always`. If the coroutine's completion signals some external mechanism (like releasing a semaphore) and the result is not needed, you might use `suspend_never` to avoid manual cleanup.

Here is an example of a coroutine that returns a value:

```cpp
#include <coroutine>
#include <iostream>
#include <optional>

struct ComputeResult {
    struct promise_type {
        std::optional<int> result;

        ComputeResult get_return_object() {
            return ComputeResult{
                std::coroutine_handle<promise_type>::from_promise(*this)
            };
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_value(int value) {
            result = value;
        }
        void unhandled_exception() {
            result = std::nullopt;
        }
    };

    std::coroutine_handle<promise_type> handle;

    ComputeResult(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~ComputeResult() { if (handle) handle.destroy(); }

    ComputeResult(ComputeResult&& other) noexcept : handle(other.handle) {
        other.handle = nullptr;
    }

    void run() {
        while (!handle.done()) {
            handle.resume();
        }
    }

    std::optional<int> get_result() const {
        return handle.promise().result;
    }
};

ComputeResult compute_sum(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; ++i) {
        sum += i;
        co_await std::suspend_always{}; // yield control periodically
    }
    co_return sum;
}

int main()
{
    auto computation = compute_sum(5);
    computation.run();

    if (auto result = computation.get_result()) {
        std::cout << "Result: " << *result << std::endl;
    }
}
```

**Output:**

```
Result: 15
```

The `compute_sum` coroutine adds numbers from 1 to n, periodically yielding control with `co_await suspend_always{}`. When the loop completes, it executes `co_return sum`, which calls `promise.return_value(sum)`, storing the result in the promise.

Because `final_suspend()` returns `suspend_always`, the coroutine frame remains valid after completion. The `get_result()` method can access `handle.promise().result` to retrieve the computed value.

You can query whether a coroutine has completed using `handle.done()`. This method returns `true` once the coroutine has executed `co_return` (or fallen off the end) and is suspended at its final suspend point. Do not confuse `handle.done()` with `handle.operator bool()`. The boolean conversion only checks if the handle is non-null; it does not indicate completion.

A critical warning about undefined behavior: if your coroutine can fall off the end of its body and your promise type does not have a `return_void()` method, the behavior is undefined. This is dangerous because the compiler may not warn you.
Always ensure your promise type has `return_void()` if any code path might reach the end of the coroutine without an explicit `co_return`.

Here is the same computation rewritten to fall off the end instead of using explicit `co_return`:

```cpp
struct ComputeResult2 {
    struct promise_type {
        int result = 0;

        ComputeResult2 get_return_object() {
            return ComputeResult2{
                std::coroutine_handle<promise_type>::from_promise(*this)
            };
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {} // Required because we fall off the end
        void unhandled_exception() {}
    };

    std::coroutine_handle<promise_type> handle;
    // ... rest of the class
};

ComputeResult2 compute_sum2(int n)
{
    // hypothetical awaiter that yields a reference to the promise's result
    auto& result = co_await GetPromiseAwaiter{};
    int sum = 0;
    for (int i = 1; i <= n; ++i) {
        sum += i;
        co_await std::suspend_always{};
    }
    result = sum;
    // Falls off the end - calls promise.return_void()
}
```

In this version, we store the result in the promise before falling off the end. The `return_void()` method must exist even though it does nothing, because the coroutine reaches the end of its body.

You have now learned how coroutines complete execution. The `co_return` statement (or falling off the end) triggers the promise's return methods, and `final_suspend` determines whether the coroutine frame persists.

## Step 8 — Building a Generic Generator

You have learned all the pieces needed to build a reusable generator type. In this step, you will assemble them into a template class that works with any value type.

A production-quality generator needs to handle several concerns:

1. Store and retrieve yielded values of any type
2. Manage the coroutine handle's lifetime correctly
3. Propagate exceptions from the coroutine to the caller
4. Provide a clean iteration interface
Here is a complete generic generator:

```cpp
#include <coroutine>
#include <cstddef>
#include <exception>
#include <iterator>
#include <utility>

template <typename T>
class Generator {
public:
    struct promise_type {
        T value;
        std::exception_ptr exception;

        Generator get_return_object() {
            return Generator{Handle::from_promise(*this)};
        }

        std::suspend_always initial_suspend() noexcept {
            return {};
        }

        std::suspend_always final_suspend() noexcept {
            return {};
        }

        std::suspend_always yield_value(T v) {
            value = std::move(v);
            return {};
        }

        void return_void() noexcept {}

        void unhandled_exception() {
            exception = std::current_exception();
        }

        template <typename U>
        std::suspend_never await_transform(U&&) = delete;
    };

    using Handle = std::coroutine_handle<promise_type>;

private:
    Handle handle_;

public:
    explicit Generator(Handle h) : handle_(h) {}

    ~Generator() {
        if (handle_) {
            handle_.destroy();
        }
    }

    Generator(const Generator&) = delete;
    Generator& operator=(const Generator&) = delete;

    Generator(Generator&& other) noexcept
        : handle_(std::exchange(other.handle_, nullptr)) {}

    Generator& operator=(Generator&& other) noexcept {
        if (this != &other) {
            if (handle_) {
                handle_.destroy();
            }
            handle_ = std::exchange(other.handle_, nullptr);
        }
        return *this;
    }

    class iterator {
        Handle handle_;

    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = T;
        using difference_type = std::ptrdiff_t;
        using pointer = T*;
        using reference = T&;

        iterator() : handle_(nullptr) {}
        explicit iterator(Handle h) : handle_(h) {}

        iterator& operator++() {
            handle_.resume();
            if (handle_.done()) {
                auto& promise = handle_.promise();
                handle_ = nullptr;
                if (promise.exception) {
                    std::rethrow_exception(promise.exception);
                }
            }
            return *this;
        }

        iterator operator++(int) {
            iterator temp = *this;
            ++(*this);
            return temp;
        }

        T& operator*() const {
            return handle_.promise().value;
        }

        T* operator->() const {
            return &handle_.promise().value;
        }

        bool operator==(const iterator& other) const {
            return handle_ == other.handle_;
        }

        bool operator!=(const iterator& other) const {
            return !(*this == other);
        }
    };

    iterator begin() {
        if (handle_) {
            handle_.resume();
            if (handle_.done()) {
                auto& promise = handle_.promise();
                if (promise.exception) {
                    std::rethrow_exception(promise.exception);
                }
                return iterator{};
            }
        }
        return iterator{handle_};
    }

    iterator end() {
        return iterator{};
    }
};
```

This generator provides a standard iterator interface, allowing use in range-based for loops:

```cpp
Generator<int> range(int start, int end)
{
    for (int i = start; i < end; ++i) {
        co_yield i;
    }
}

Generator<int> squares(int n)
{
    for (int i = 0; i < n; ++i) {
        co_yield i * i;
    }
}

int main()
{
    std::cout << "Range 1 to 5:" << std::endl;
    for (int x : range(1, 6)) {
        std::cout << x << " ";
    }
    std::cout << std::endl;

    std::cout << "First 5 squares:" << std::endl;
    for (int x : squares(5)) {
        std::cout << x << " ";
    }
    std::cout << std::endl;
}
```

**Output:**

```
Range 1 to 5:
1 2 3 4 5
First 5 squares:
0 1 4 9 16
```

Several design choices in this generator deserve explanation:

**`initial_suspend()` returns `suspend_always`**: The coroutine suspends before running any user code. This means `begin()` must resume the coroutine to get the first value. This design prevents work from being done if the generator is never iterated.

**`final_suspend()` returns `suspend_always`**: The coroutine frame persists after completion. This is necessary because the iterator needs to check `handle_.done()` and potentially access the exception stored in the promise. If `final_suspend()` returned `suspend_never`, the handle would become invalid before these checks could occur.

**Exception handling**: The `unhandled_exception()` method stores the current exception in the promise using `std::current_exception()`. The iterator's `operator++` and `begin()` check for this exception and rethrow it using `std::rethrow_exception()`. This propagates exceptions from the coroutine to the calling code.

**`await_transform` is deleted**: This prevents using `co_await` inside the generator. A generator should only yield values, not await other operations. Deleting `await_transform` makes any use of `co_await` inside a `Generator` coroutine a compile error.

**Move semantics**: The generator is movable but not copyable. Copying a coroutine handle would create aliasing problems—both copies would refer to the same coroutine frame, and destroying one would invalidate the other. Moving transfers ownership cleanly.

Here is an example demonstrating exception propagation:

```cpp
Generator<int> may_throw(bool should_throw)
{
    co_yield 1;
    co_yield 2;
    if (should_throw) {
        throw std::runtime_error("Generator error");
    }
    co_yield 3;
}

int main()
{
    try {
        for (int x : may_throw(true)) {
            std::cout << x << std::endl;
        }
    }
    catch (const std::exception& e) {
        std::cout << "Caught: " << e.what() << std::endl;
    }
}
```

**Output:**

```
1
2
Caught: Generator error
```

The exception thrown inside the generator propagates to the calling code and can be caught normally.

You have now built a production-quality generic generator. It handles value types, manages coroutine lifetime, propagates exceptions, and provides a standard iterator interface.

## Step 9 — Handling Exceptions in Coroutines

Exceptions in coroutines require special attention. Because a coroutine can suspend and resume across different call stacks, the normal exception propagation mechanism does not work directly. The promise type's `unhandled_exception()` method provides the hook for handling exceptions that escape the coroutine body.

When an exception is thrown inside a coroutine and not caught within the coroutine, the following happens:

1. The exception is caught by the implicit try-catch block surrounding the coroutine body.
2. `promise.unhandled_exception()` is called while the exception is still active.
3. After `unhandled_exception()` returns, `co_await promise.final_suspend()` executes.
4. The coroutine completes (either suspended or destroyed, depending on `final_suspend`).

Inside `unhandled_exception()`, you have several options:

**Terminate the program**: Call `std::terminate()`. This is the safest option if you cannot handle exceptions.

```cpp
void unhandled_exception() {
    std::terminate();
}
```

**Store the exception for later**: Use `std::current_exception()` to capture the exception and store it in the promise. The caller can later check for the exception and rethrow it.

```cpp
void unhandled_exception() {
    exception_ = std::current_exception();
}
```

**Rethrow the exception**: Call `throw;` to rethrow the exception.
This propagates the exception to whoever is currently running the coroutine, but be careful—this may not be the original caller if the coroutine has been resumed from a different context.

```cpp
void unhandled_exception() {
    throw;
}
```

**Swallow the exception**: Do nothing. This silences the exception, which is almost always a mistake but might be appropriate in specific circumstances.

```cpp
void unhandled_exception() {
    // Exception is silently ignored
}
```

The stored exception pattern is most useful for generators and tasks where the caller expects to receive results:

```cpp
#include <coroutine>
#include <exception>
#include <iostream>
#include <stdexcept>

struct Task {
    struct promise_type {
        std::exception_ptr exception;

        Task get_return_object() {
            return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {
            exception = std::current_exception();
        }
    };

    std::coroutine_handle<promise_type> handle;

    Task(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~Task() { if (handle) handle.destroy(); }

    void run() {
        handle.resume();
    }

    void check_exception() {
        if (handle.promise().exception) {
            std::rethrow_exception(handle.promise().exception);
        }
    }
};

Task risky_operation()
{
    std::cout << "Starting risky operation" << std::endl;
    throw std::runtime_error("Something went wrong");
    co_return; // Never reached
}

int main()
{
    Task task = risky_operation();

    try {
        task.run();
        task.check_exception();
        std::cout << "Operation completed successfully" << std::endl;
    }
    catch (const std::exception& e) {
        std::cout << "Operation failed: " << e.what() << std::endl;
    }
}
```

**Output:**

```
Starting risky operation
Operation failed: Something went wrong
```

The timing of when to check for exceptions matters. In this example, `check_exception()` is called after `run()` completes. If the coroutine suspended multiple times, you might want to check for exceptions after each resumption.

For generators with iterators, exceptions are typically checked during iteration:

```cpp
iterator& operator++() {
    handle_.resume();
    if (handle_.done()) {
        auto& promise = handle_.promise();
        if (promise.exception) {
            std::rethrow_exception(promise.exception);
        }
    }
    return *this;
}
```

This ensures that exceptions are propagated to the code iterating over the generator.

Be aware of exception safety during coroutine initialization. If an exception is thrown before the first suspension point (and before `initial_suspend` completes), the exception propagates directly to the caller without going through `unhandled_exception()`. If `initial_suspend()` returns `suspend_always`, the coroutine suspends before any user code runs, avoiding this issue.

You have now learned how to handle exceptions in coroutines. The `unhandled_exception()` method provides a hook for capturing or propagating exceptions, and the stored exception pattern allows callers to receive exceptions even when the coroutine has suspended and resumed.

## Step 10 — Practical Patterns and Applications

You have learned the mechanics of C++20 coroutines. Now you will explore practical patterns that demonstrate their power.

### Lazy Sequences

Generators excel at producing lazy sequences—sequences where values are computed only when needed.
This pattern is useful when working with infinite sequences or when computing values is expensive.

```cpp
Generator<int> infinite_counter()
{
    int i = 0;
    while (true) {
        co_yield i++;
    }
}

Generator<int> primes()
{
    auto is_prime = [](int n) {
        if (n < 2) return false;
        if (n == 2) return true;
        if (n % 2 == 0) return false;
        for (int i = 3; i * i <= n; i += 2) {
            if (n % i == 0) return false;
        }
        return true;
    };

    int n = 2;
    while (true) {
        if (is_prime(n)) {
            co_yield n;
        }
        ++n;
    }
}

int main()
{
    int count = 0;
    for (int p : primes()) {
        std::cout << p << " ";
        if (++count >= 10) break;
    }
    std::cout << std::endl;
}
```

**Output:**

```
2 3 5 7 11 13 17 19 23 29
```

The prime generator tests each number for primality but only computes values as they are requested. An infinite number of primes exist, but the program only computes the first ten.

### Transforming Sequences

Generators can transform sequences from other generators, creating a pipeline of operations:

```cpp
Generator<int> take(Generator<int> source, int n)
{
    int count = 0;
    for (int value : source) {
        if (count++ >= n) break;
        co_yield value;
    }
}

Generator<int> filter(Generator<int> source, bool (*predicate)(int))
{
    for (int value : source) {
        if (predicate(value)) {
            co_yield value;
        }
    }
}

Generator<int> transform(Generator<int> source, int (*func)(int))
{
    for (int value : source) {
        co_yield func(value);
    }
}

bool is_even(int n) { return n % 2 == 0; }
int square(int n) { return n * n; }

int main()
{
    // Take the first 10 numbers from the range, keep the even ones,
    // then square them
    auto pipeline = transform(
        filter(
            take(range(1, 100), 10),
            is_even
        ),
        square
    );

    for (int x : pipeline) {
        std::cout << x << " ";
    }
    std::cout << std::endl;
}
```

**Output:**

```
4 16 36 64 100
```

Each generator in the pipeline produces values on demand. The `filter` generator only requests the next value from its source when it needs to produce an output. The `transform` generator only transforms values as they pass through.

### Tree Traversal

Ana Lúcia de Moura and Roberto Ierusalimschy, in their influential paper on coroutines, demonstrated tree traversal as a classic use case. With generators, you can traverse a tree structure while maintaining the simple recursive algorithm:

```cpp
struct TreeNode {
    int value;
    TreeNode* left;
    TreeNode* right;

    TreeNode(int v, TreeNode* l = nullptr, TreeNode* r = nullptr)
        : value(v), left(l), right(r) {}
};

Generator<int> inorder(TreeNode* node)
{
    if (node == nullptr) {
        co_return;
    }

    for (int v : inorder(node->left)) {
        co_yield v;
    }

    co_yield node->value;

    for (int v : inorder(node->right)) {
        co_yield v;
    }
}

int main()
{
    //         4
    //        / \
    //       2   6
    //      / \ / \
    //     1  3 5  7

    TreeNode n1(1), n3(3), n5(5), n7(7);
    TreeNode n2(2, &n1, &n3), n6(6, &n5, &n7);
    TreeNode root(4, &n2, &n6);

    for (int v : inorder(&root)) {
        std::cout << v << " ";
    }
    std::cout << std::endl;
}
```

**Output:**

```
1 2 3 4 5 6 7
```

The recursive structure of the tree traversal matches the recursive structure of the code. Each call to `inorder` creates a new generator that yields values from its subtree. The `co_yield` in the loop forwards those values upward.

### Cooperative Multitasking

Coroutines enable cooperative multitasking without threads.
Multiple tasks can make progress by voluntarily yielding control:

```cpp
#include <coroutine>
#include <exception>
#include <iostream>
#include <string>
#include <vector>

struct Task {
    struct promise_type {
        Task get_return_object() {
            return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    std::coroutine_handle<promise_type> handle;

    Task(std::coroutine_handle<promise_type> h) : handle(h) {}
    ~Task() { if (handle) handle.destroy(); }

    Task(Task&& other) noexcept : handle(other.handle) {
        other.handle = nullptr;
    }

    // Move assignment is required because the scheduler erases tasks
    // from a std::vector, which move-assigns the remaining elements.
    Task& operator=(Task&& other) noexcept {
        if (this != &other) {
            if (handle) handle.destroy();
            handle = other.handle;
            other.handle = nullptr;
        }
        return *this;
    }

    bool done() const { return handle.done(); }
    void resume() { handle.resume(); }
};

struct Scheduler {
    std::vector<Task> tasks;

    void add(Task task) {
        tasks.push_back(std::move(task));
    }

    void run() {
        while (!tasks.empty()) {
            for (size_t i = 0; i < tasks.size(); ) {
                tasks[i].resume();
                if (tasks[i].done()) {
                    tasks.erase(tasks.begin() + i);
                } else {
                    ++i;
                }
            }
        }
    }
};

Task worker(std::string name, int iterations)
{
    for (int i = 0; i < iterations; ++i) {
        std::cout << name << " iteration " << i << std::endl;
        co_await std::suspend_always{};
    }
}

int main()
{
    Scheduler scheduler;
    scheduler.add(worker("Alice", 3));
    scheduler.add(worker("Bob", 2));
    scheduler.run();
}
```

**Output:**

```
Alice iteration 0
Bob iteration 0
Alice iteration 1
Bob iteration 1
Alice iteration 2
```

The scheduler interleaves the execution of Alice and Bob. Each task runs until it hits `co_await suspend_always{}`, then yields control. The scheduler resumes the next task, achieving cooperative multitasking.

This pattern can be extended with I/O operations. Instead of `suspend_always`, tasks would await I/O completions. A real scheduler would integrate with an event loop, resuming tasks when their I/O operations complete.

You have now seen practical applications of C++20 coroutines. Lazy sequences, sequence transformations, tree traversal, and cooperative multitasking all benefit from coroutines' ability to suspend and resume execution while preserving local state.

## Conclusion

In this tutorial, you explored C++20 coroutines from fundamental concepts to practical implementations.

You began by understanding the problem coroutines solve: the fragmentation of logic that occurs when writing asynchronous code with callbacks. Coroutines restore the natural flow of sequential code while maintaining asynchronous behavior.

You learned to recognize coroutines by their keywords: `co_await` for suspension, `co_yield` for producing values, and `co_return` for completion. You discovered that the presence of any of these keywords transforms a function into a coroutine with special runtime behavior.

You examined the mechanics of suspension and resumption, understanding how the coroutine frame preserves local variables on the heap while the coroutine is suspended. The `std::coroutine_handle` provides the mechanism for resuming a suspended coroutine.

You studied the promise type, the controller class that customizes coroutine behavior. Its methods—`get_return_object`, `initial_suspend`, `final_suspend`, `yield_value`, `return_void`, `return_value`, and `unhandled_exception`—define how the coroutine initializes, suspends, produces values, completes, and handles errors.

You built a complete generator type that produces sequences of values on demand.
The generator manages coroutine lifetime, provides an iterator interface, and propagates exceptions from the coroutine to calling code. + +You explored practical patterns: lazy sequences that compute values only when needed, pipelines that transform sequences, tree traversals that maintain recursive structure, and cooperative multitasking that interleaves multiple tasks. + +C++20 coroutines provide a foundation for building sophisticated asynchronous systems. The standard library in C++23 and beyond will provide higher-level abstractions built on this foundation. Understanding the mechanisms described in this tutorial will help you use those abstractions effectively and build your own when needed. + +For further exploration, consider studying: + +- The `std::generator` type introduced in C++23 +- Asynchronous I/O frameworks that use coroutines +- The senders and receivers model being developed for C++26 +- Real-world applications of coroutines in networking, databases, and user interfaces + +Coroutines represent a significant evolution in how C++ programmers can express complex control flow. The ability to write asynchronous code that reads like synchronous code, while maintaining full control over memory and performance, embodies the spirit of C++: abstraction without hidden costs. From 117e2808b1329e6c757c1aeb91bff712fcfbdfdd Mon Sep 17 00:00:00 2001 From: Vinnie Falco Date: Sat, 31 Jan 2026 22:22:46 -0800 Subject: [PATCH 2/3] Add commands and rules for regenerating documentation --- .cursor/commands/doc-rebuild.md | 546 ++++++++++++++++++++++++++++++++ .cursor/rules/writing-guide.mdc | 26 ++ 2 files changed, 572 insertions(+) create mode 100644 .cursor/commands/doc-rebuild.md diff --git a/.cursor/commands/doc-rebuild.md b/.cursor/commands/doc-rebuild.md new file mode 100644 index 00000000..4919f820 --- /dev/null +++ b/.cursor/commands/doc-rebuild.md @@ -0,0 +1,546 @@ +# Rebuild Capy Documentation + +Regenerate the Capy documentation following this structure and style guide. + +## Documentation Structure + +### 1. Introduction Page (`index.adoc`) + +**Opening paragraph** (verbatim, do not change): + +> Capy abstracts away sockets, files, and asynchrony with type-erased streams and buffer sequences—code compiles fast because the implementation is hidden. It provides the framework for concurrent algorithms that transact in buffers of memory: networking, serial ports, console, timers, and any platform I/O. This is only possible because Capy is coroutine-only, enabling optimizations and ergonomics that hybrid approaches must sacrifice. + +**Required sections** (in order): + +1. Title + Opening Paragraph (above) +2. **What This Library Does** (verbatim): + - Lazy coroutine tasks — `task` with forward-propagating stop tokens and automatic cancellation + - Buffer sequences — taken straight from Asio and improved + - Stream concepts — `ReadStream`, `WriteStream`, `ReadSource`, `WriteSink`, `BufferSource`, `BufferSink` + - Type-erased streams — `any_stream`, `any_read_stream`, `any_write_stream` for fast compilation + - Concurrency facilities — executors, strands, thread pools, `when_all`, `when_any` + - Test utilities — mock streams, mock sources/sinks, error injection +3. 
**What This Library Does Not Do** (verbatim): + - Networking — no sockets, acceptors, or DNS; that's what Corosio provides + - Protocols — no HTTP, WebSocket, or TLS; see the Http and Beast2 libraries + - Platform event loops — no io_uring, IOCP, epoll, or kqueue; Capy is the layer above + - Callbacks or futures — coroutine-only means no other continuation styles + - Sender/receiver — Capy uses the IoAwaitable protocol, not `std::execution` +4. **Target Audience** (verbatim): + - Users of Corosio — portable coroutine networking + - Users of Http — sans-I/O HTTP/1.1 clients and servers + - Users of Websocket — sans-I/O WebSocket + - Users of Beast2 — high-level HTTP/WebSocket servers + - Users of Burl — high-level HTTP client + All of these are built on Capy. Understanding its concepts—tasks, buffer sequences, streams, executors—unlocks the full power of the stack. +5. **Design Philosophy** (verbatim): + - **Use case first.** Buffer sequences, stream concepts, executor affinity—these exist because I/O code needs them, not because they're theoretically elegant. + - **Coroutines-only.** No callbacks, futures, or sender/receiver. Hybrid support forces compromises; full commitment unlocks optimizations that adapted models cannot achieve. + - **Address the complaints of C++.** Type erasure at boundaries, minimal dependencies, and hidden implementations keep builds fast and templates manageable. +6. **Requirements** (verbatim): + **Assumed Knowledge:** + - C++20 coroutines, concepts, and ranges + - Basic concurrent programming + **Compiler Support:** + - GCC 12+ + - Clang 17+ + - Apple-Clang (macOS 14+) + - MSVC 14.34+ + - MinGW + **Dependencies:** + - None. Capy is self-contained and does not require Boost. + **Linking:** + - Capy is a compiled library. Link against `capy`. +7. Code Convention Note - Callout with standard includes/namespaces +8. Quick Example - Minimal working code +9. Next Steps - Links to Quick Start, tutorials, reference + +### 2. C++20 Coroutines Tutorial (Second Section) + +**Source**: `doc/reference/coro-tutorial.md` + +**Instructions**: Insert this tutorial as the second major documentation section. Apply only: + +- **Resectioning**: Organize into logical parts matching the existing structure in agent-guide.md +- **Pagination**: Split into multiple pages for reading pace + +**Target page structure** (from agent-guide.md): + +- Part I: Foundations (`cpp20-coroutines/foundations.adoc`) + - Functions and the Call Stack + - What Is a Coroutine? + - Why Coroutines? +- Part II: C++20 Syntax (`cpp20-coroutines/syntax.adoc`) + - The Three Keywords + - Your First Coroutine + - Awaitables and Awaiters +- Part III: Coroutine Machinery (`cpp20-coroutines/machinery.adoc`) + - The Promise Type + - Coroutine Handle + - Putting It Together +- Part IV: Advanced Topics (`cpp20-coroutines/advanced.adoc`) + - Symmetric Transfer + - Coroutine Allocation + - Exception Handling + +**Mapping from tutorial steps**: + +- Steps 1-3 → Part I (Foundations) +- Steps 2, 4-5 → Part II (Syntax) +- Steps 4, 6-7 → Part III (Machinery) +- Steps 8-10 → Part IV (Advanced) + +### 3. Concurrency Tutorial (Third Section) + +**Source**: `doc/reference/concurrency-2.md` + +**Instructions**: Insert this tutorial as the third major documentation section. 
Apply only: + +- **Resectioning**: Organize into logical parts for pacing +- **Pagination**: Split into multiple pages + +**Target page structure**: + +- Part I: Foundations (`concurrency/foundations.adoc`) + - Why Concurrency Matters + - Threads—Your Program's Parallel Lives + - Creating Threads + - Thread Lifecycle +- Part II: Synchronization (`concurrency/synchronization.adoc`) + - Race Conditions + - Mutexes + - Lock Guards—RAII + - Deadlocks +- Part III: Advanced Primitives (`concurrency/advanced.adoc`) + - Atomics + - Condition Variables + - Shared Locks (Readers/Writers) +- Part IV: Communication & Patterns (`concurrency/patterns.adoc`) + - Futures and Promises + - std::async + - Thread-Local Storage + - Practical Patterns (Producer-Consumer, Parallel For) + +**Mapping from tutorial parts**: + +- Parts 1-4 → Foundations +- Parts 5-8 → Synchronization +- Parts 9-11 → Advanced Primitives +- Parts 12-16 → Communication & Patterns + +### 4. Coroutines in Capy (Fourth Section) + +This section transitions from general C++ knowledge to Capy-specific library usage. Generate content based on public API and agent-guide.md. + +**Target page structure** (`coroutines/`): + +- **The task Type** (`coroutines/tasks.adoc`) + - Declaring `task` coroutines + - Returning values with `co_return` + - Awaiting other tasks + - Lazy execution and symmetric transfer +- **Launching Coroutines** (`coroutines/launching.adoc`) + - `run_async` — entry point from non-coroutine code + - Two-call syntax and C++17 evaluation order + - `run` — executor hopping within coroutine code + - Handler overloads for results and exceptions + - The execution model: how tasks get scheduled and resumed +- **Executors and Execution Contexts** (`coroutines/executors.adoc`) + - The `Executor` concept: `dispatch()` and `post()` + - `dispatch()` vs `post()` — inline execution vs always queue + - `executor_ref` — type-erased executor wrapper + - `thread_pool` — multi-threaded execution context + - `execution_context` — base class for custom contexts + - `strand` — serialization without mutexes + - Single-threaded vs multi-threaded patterns +- **The IoAwaitable Protocol** (`coroutines/io-awaitable.adoc`) + - The three-argument `await_suspend` signature + - Forward context propagation vs backward queries + - `IoAwaitable`, `IoAwaitableTask`, `IoLaunchableTask` concepts + - Why affinity matters for I/O +- **Stop Tokens and Cancellation** (`coroutines/cancellation.adoc`) + **Teach from the ground up for complete beginners:** + *Part 1: The Problem* + - Why cancellation matters: user hits "Cancel", timeout expires, connection drops + - The naive approach: boolean flags — why they don't work (races, no standardization) + - The thread interruption problem: forceful termination corrupts state + - The goal: cooperative cancellation — ask nicely, let the work clean up + *Part 2: C++20 Stop Tokens — A General-Purpose Signaling Mechanism* + **Key insight**: `stop_token` is not merely a cancellation primitive—it implements the Observer pattern, a thread-safe one-to-many notification system. The "stop" naming obscures its generality. 
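
  A compact illustration of that generality—a sketch only, using standard C++20 and nothing Capy-specific (the "config loaded" scenario is hypothetical):

  ```cpp
  #include <iostream>
  #include <stop_token>

  int main()
  {
      std::stop_source config_loaded;                 // publisher: owns the shared state
      std::stop_token token = config_loaded.get_token();

      // Observer registrations: each callback fires exactly once when signaled.
      std::stop_callback a(token, [] { std::cout << "component A: config available\n"; });
      std::stop_callback b(token, [] { std::cout << "component B: config available\n"; });

      config_loaded.request_stop();                   // one-shot broadcast: "ready"

      // A subscriber that arrives after the signal is invoked immediately,
      // inside its own constructor.
      std::stop_callback late(token, [] { std::cout << "component C: config available\n"; });
  }
  ```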

  **The three components**:
  - `std::stop_source` — the **Subject/Publisher**: owns shared state, triggers notifications
  - `std::stop_token` — the **Subscriber View**: read-only, copyable, cheap to pass around
  - `std::stop_callback` — the **Observer Registration**: RAII callback that runs when signaled
  **How they work together**:
  - Source creates tokens via `get_token()`
  - Multiple tokens can share the same state (distribute notification capability)
  - Callbacks register interest; destruction unregisters automatically
  - When `request_stop()` is called, all registered callbacks are invoked
  - **Immediate invocation**: if already signaled, callback runs in constructor
  - Thread-safe: registration and invocation are safe from any thread
  **Type-erased polymorphic observers**:
  - Each `stop_callback` stores a different callable type `F`
  - No virtual functions, no heap allocation per callback
  - Equivalent to `std::vector<std::function<void()>>` but with RAII lifetime management
  **The one-shot nature** (BIG WARNING):
  - Can only transition from "not signaled" to "signaled" once
  - No reset mechanism — once `stop_requested()` returns true, it stays true forever
  - `request_stop()` returns `true` only on the first successful call
  - **NOT REUSABLE**: You cannot "un-cancel" a stop_source
  **How to "reset" (workaround)**:
  - Create a new `stop_source` (assigns fresh shared state)
  - Call `get_token()` on the new source
  - Distribute the new token to all components that need it
  - Components must replace their old `stop_token` with the new one
  - This is manual and error-prone — design your system to avoid needing resets
  **Example of the reset pattern**:
  ```cpp
  std::stop_source source;
  // ... distribute source.get_token() to workers ...
  source.request_stop(); // triggered, now permanently signaled

  // To "reset": create entirely new source
  source = std::stop_source{}; // new shared state
  // Must redistribute new tokens to ALL holders of the old token
  // Old tokens still report stop_requested() == true and must be replaced
  ```
  **Design implication**: If you need repeatable signals, stop_token is the wrong tool. Use condition variables, atomic flags with explicit protocol, or wait for a future resettable signal facility.
  **Beyond cancellation** (the naming hides this):
  - Starting things: "ready" signal triggers initialization
  - Configuration loaded: notify components when config is available
  - Resource availability: signal when database connected, cache warmed
  - Any one-shot broadcast notification scenario
  *Part 3: Stop Tokens in Coroutines*
  - The propagation problem: how does a nested coroutine know to stop?
  - Capy's answer: stop tokens flow downward through `co_await`
  - `get_stop_token()` — retrieve the current stop token inside a task
  - Automatic propagation: child tasks inherit parent's stop token
  - No manual threading: the IoAwaitable protocol handles it
  *Part 4: Responding to Cancellation*
  - Checking: `if (token.stop_requested()) co_return;`
  - Cleanup: RAII ensures resources are released on early exit
  - Partial results: returning what you have vs throwing
  - The `operation_aborted` error code convention
  *Part 5: OS Integration*
  - How stop tokens connect to platform I/O (IOCP, io_uring)
  - Cancelling a pending read/write at the OS level
  - Immediate response vs next-operation-fails
  *Part 6: Patterns*
  - Timeout pattern: `stop_source` + timer → cancel after N seconds
  - User cancellation: UI button triggers `stop_source.request_stop()`
  - Graceful shutdown: cancel all pending work, wait for cleanup
  - `when_any` cancellation: first-to-finish cancels siblings
- **Concurrent Composition** (`coroutines/composition.adoc`)
  - `when_all` — run tasks in parallel, wait for all to complete
  - Result tuple and void filtering
  - `when_any` — run tasks in parallel, return when first completes
  - Stop propagation across siblings — cancelling the losers
  - Error handling: which exceptions propagate?
- **Frame Allocators** (`coroutines/allocators.adoc`)
  - The timing constraint: `operator new` before coroutine body
  - Thread-local propagation and "the window"
  - The `FrameAllocator` concept
  - HALO optimization support

**Reference headers**:

- ``
- ``, ``
- ``, ``
- ``, ``
- ``, ``
- ``
- ``, ``

### 5. Buffer Sequences (Fifth Section)

Generate content based on public API and agent-guide.md.

**Core thesis to integrate throughout this section:**

The reflexive C++ answer to "how should I represent a buffer?" is `std::span<std::byte>`. This blocks compositional design. For scatter/gather, developers reach for `span<span<byte>>`—but arrays of buffers don't compose without allocation. To combine `HeaderBuffers` (2 spans) and `BodyBuffers` (3 spans), you must allocate a new array. Every composition allocates. This leads to overload proliferation: separate signatures for single buffer, scatter/gather, string, C API, etc.

The concept-driven alternative: a single templated signature accepting any type modeling `ConstBufferSequence`. This accepts spans, string_views, arrays, vectors, custom types—and **any composition of these without allocation**. The key insight from STL design (Stepanov): algorithms parameterized on concepts, not concrete types, enable composition that concrete types forbid.

Even `std::byte` imposes a semantic opinion. POSIX uses `void*` for semantic neutrality—"raw memory, I move bytes without opining on contents." But `span<void>` doesn't compile. Capy provides `const_buffer` and `mutable_buffer` as semantically neutral buffer types with known layout.

**The middle ground**: Concepts at user-facing APIs (composition, flexibility), concrete spans at type-erasure boundaries (virtual functions). The library handles conversion between layers.
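
To make the thesis concrete, here is a minimal sketch of the concept-driven signature this section argues for. It is illustrative only—the concept definition and function name are simplified stand-ins, not Capy's actual declarations:

```cpp
#include <concepts>
#include <cstddef>
#include <ranges>

// Semantically neutral view of read-only memory (stand-in for the
// library's const_buffer; layout-compatible with iovec/WSABUF).
struct const_buffer {
    void const* data;
    std::size_t size;
};

// Simplified stand-in for ConstBufferSequence: a bidirectional range
// whose elements convert to const_buffer.
template <typename T>
concept ConstBufferSequence =
    std::ranges::bidirectional_range<T> &&
    std::convertible_to<std::ranges::range_value_t<T>, const_buffer>;

// One signature replaces the overload set: it accepts a single buffer,
// an array of buffers, or any zero-allocation composition of sequences.
template <ConstBufferSequence Buffers>
std::size_t write_some(Buffers const& buffers);
```

Because the parameter is a concept rather than a concrete array type, combining a header sequence with a body sequence can produce a lightweight view over both—no array allocation is needed to pass them to `write_some`.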

**Target page structure** (`buffers/`):

- **Why Concepts, Not Spans** (`buffers/overview.adoc`)
  - **The I/O use case**: buffers exist to interface with operating system I/O
  - The reflexive answer: `span<byte>` for buffers, `span<span<byte>>` for scatter/gather
  - **The composition problem**: combining buffer sequences requires allocation with spans
  - Example: HTTP headers + body — must allocate to combine with `span<span<byte>>`
  - **The concept-driven solution**: `ConstBufferSequence` accepts any iterable of memory regions
  - Single signature accepts span, string_view, array, vector, custom types
  - Compile-time composition: `cat(header_buffers, body_buffers)` — zero allocation
  - **STL parallel**: Stepanov's insight — algorithms on iterators (concepts), not containers (types)
  - The span reflex is a regression from thirty years of generic programming
- **Buffer Types** (`buffers/types.adoc`)
  - **Why not `std::byte`?** It imposes semantic opinion; POSIX `void*` is neutral
  - `span<void>` doesn't compile — can't express type-agnostic abstraction
  - `const_buffer` — semantically neutral read-only view of contiguous memory
  - `mutable_buffer` — semantically neutral writable view
  - Construction, accessors, prefix removal
  - `make_buffer` — from pointer+size, arrays, vectors, strings
  - **Layout compatibility**: same memory layout as `iovec`/`WSABUF` — no conversion overhead
- **Buffer Sequences** (`buffers/sequences.adoc`)
  - What is a buffer sequence? Bidirectional range with buffer-convertible value type
  - Single buffers are degenerate sequences (one-element range)
  - `ConstBufferSequence` and `MutableBufferSequence` concepts
  - **Heterogeneous composition**: mix string_view, span, custom types — all work
  - Iterating: `begin()`, `end()`, uniform access
  - `consuming_buffers` for incremental consumption
  - **Zero-allocation composition**: combining sequences creates views, not copies
- **System I/O Integration** (`buffers/system-io.adoc`)
  - **The virtual boundary**: spans ARE correct at type-erasure points
  - User-facing API: concepts for composition flexibility
  - Internal virtual boundary: `span<span<byte>>` for type erasure
  - Library converts between layers — users get concepts, OS gets iovecs
  - Translating buffer sequences to `iovec` arrays (POSIX)
  - Translating to `WSABUF` arrays (Windows)
  - Stack-based conversion for small sequences (common case, zero heap)
  - Heap fallback for large sequences
  - The `registered_buffer` optimization (io_uring, IOCP)
- **Buffer Algorithms** (`buffers/algorithms.adoc`)
  - Measuring: `buffer_size`, `buffer_empty`, `buffer_length`
  - Copying: `buffer_copy` with optional `at_most` parameter
  - Real I/O loop patterns from `read()` and `write()`
  - **Practical benefits of concept-based design**:
    - Zero-copy I/O (data never moves unnecessarily)
    - Scatter/gather operations (multiple buffers in one syscall)
    - Custom allocators and memory-mapped buffers
    - Integration with any user-defined buffer type
- **Dynamic Buffers** (`buffers/dynamic.adoc`)
  - The producer/consumer model
  - The `DynamicBuffer` concept: `prepare(n)`, `commit(n)`, `data()`, `consume(n)`
  - Capacity management: `size()`, `max_size()`, `capacity()`
  - `DynamicBufferParam` for safe coroutine parameter passing
  - Implementations: `flat_dynamic_buffer`, `circular_dynamic_buffer`, `vector_dynamic_buffer`, `string_dynamic_buffer`

**Reference headers**:

- ``
- ``, ``
- ``
- ``, ``
- ``
- ``, ``

### 6. 
Stream Concepts (Sixth Section) + +Generate content based on public API and agent-guide.md. **Key structure**: For each concept, introduce it, then immediately show its type-erasing wrapper, then demonstrate physical isolation benefits. + +**Target page structure** (`streams/`): + +- **Overview** (`streams/overview.adoc`) + - Six concepts for data flow through programs + - Streams vs Sources/Sinks vs Buffer concepts + - The type erasure value proposition: compile once, link anywhere +- **Streams (Partial I/O)** (`streams/streams.adoc`) + - `ReadStream` — `read_some(buffers)` returns `(error_code, size_t)` + - `any_read_stream` — type-erased wrapper, reference semantics + - `WriteStream` — `write_some(buffers)` returns `(error_code, size_t)` + - `any_write_stream` — type-erased wrapper + - `any_stream` — bidirectional wrapper (both read and write) + - **Example**: Echo server with `any_stream&` parameter — works with sockets, TLS, mocks +- **Sources and Sinks (Complete I/O with EOF)** (`streams/sources-sinks.adoc`) + - `ReadSource` — `read(buffers)` fills entirely or returns EOF/error + - `any_read_source` — type-erased wrapper + - `WriteSink` — `write(buffers)`, `write(buffers, eof)`, `write_eof()` + - `any_write_sink` — type-erased wrapper + - **Example**: HTTP body handler with `any_write_sink&` — caller doesn't know chunked vs content-length +- **Buffer Sources and Sinks (Callee-Owns-Buffers)** (`streams/buffer-concepts.adoc`) + - `BufferSource` — `pull(arr, max_count)` returns buffer descriptors + - `any_buffer_source` — type-erased wrapper + - `BufferSink` — `prepare(arr, max_count)`, `commit(n)`, `commit_eof()` + - `any_buffer_sink` — type-erased wrapper + - Zero-copy: source/sink owns buffers, no intermediate copies + - **Example**: Compression pipeline — source provides compressed data, sink receives decompressed +- **Transfer Algorithms** (`streams/algorithms.adoc`) + - `read(stream, buffers)` — loops `read_some` until full or error + - `read(source, dynamic_buffer)` — loops until EOF + - `write(stream, buffers)` — loops `write_some` until all written + - `push_to(BufferSource, WriteSink/WriteStream)` — caller-owns-buffers transfer + - `pull_from(ReadSource/ReadStream, BufferSink)` — callee-owns-buffers transfer +- **Physical Isolation** (`streams/isolation.adoc`) + - The compilation firewall pattern + - Header declares `task<> process(any_stream&)` — no template, no transport dependency + - Implementation in `.cpp` — only this file recompiles when logic changes + - Callers wrap concrete streams: `any_stream s{my_tcp_socket}; process(s);` + - Build time benefits: faster incremental builds, smaller binaries + - **Example**: Library API that accepts `any_read_source&` for body data — works with files, memory, network + +**Wrapper characteristics** (document in each relevant page): + +- Reference semantics: wrap existing objects without ownership +- Preallocate coroutine frame at construction for zero steady-state allocation +- Move-only (non-copyable); cached frame reused across operations +- Wrapped object must outlive wrapper + +**Reference headers**: + +- ``, `` +- ``, `` +- ``, `` +- ``, ``, `` +- ``, `` +- ``, `` +- ``, `` +- ``, `` + +### 7. Example Programs (Seventh Section) + +A catalog of complete, working example programs. **One listing per page.** Each example demonstrates a focused use case with full source code, build instructions, and explanation. + +Generate examples that showcase Capy features. Examples should progress from simple to complex. 
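The physical-isolation page can anchor on a sketch along these lines (file and function names are illustrative):

```cpp
// echo.hpp — no templates, no transport headers
task<> echo(any_stream& stream);

// echo.cpp — the only translation unit that recompiles when logic changes
task<> echo(any_stream& stream)
{
    char data[1024];
    auto [ec, n] = co_await stream.read_some(mutable_buffer(data, sizeof(data)));
    if (!ec.failed())
        co_await stream.write_some(const_buffer(data, n));
}

// caller.cpp — wrap any concrete stream at the call site
any_stream s{my_tcp_socket};
co_await echo(s);
```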
+ +**Target page structure** (`examples/`): + +- **Hello Task** (`examples/hello-task.adoc`) + - Simplest possible `task<>` coroutine + - `run_async` to launch from main + - `thread_pool` as execution context + - Shows: basic task creation, launching, completion +- **Producer-Consumer** (`examples/producer-consumer.adoc`) + - Two tasks communicating via `async_event` + - Demonstrates coroutine synchronization + - Shows: `async_event`, `when_all`, multiple concurrent tasks +- **Buffer Composition** (`examples/buffer-composition.adoc`) + - Composing HTTP-style message: headers + body + - Zero-allocation buffer sequence composition + - Shows: `const_buffer`, buffer sequences, `cat()`, scatter/gather +- **Mock Stream Testing** (`examples/mock-stream-testing.adoc`) + - Unit testing a protocol parser with mock streams + - Error injection with `fuse` + - Shows: `test::read_stream`, `test::write_stream`, `fuse`, `run_blocking` +- **Type-Erased Echo** (`examples/type-erased-echo.adoc`) + - Echo logic in a `.cpp` file accepting `any_stream&` + - Demonstrates physical isolation / compilation firewall + - Shows: `any_stream`, type erasure, build isolation +- **Timeout with Cancellation** (`examples/timeout-cancellation.adoc`) + - Operation with timeout using stop tokens + - Demonstrates cooperative cancellation + - Shows: `std::stop_source`, `std::stop_token`, cancellation propagation +- **Parallel Fetch** (`examples/parallel-fetch.adoc`) + - Multiple operations in parallel with `when_all` + - First-wins pattern with `when_any` + - Shows: `when_all`, `when_any`, concurrent composition +- **Custom Dynamic Buffer** (`examples/custom-dynamic-buffer.adoc`) + - Implementing `DynamicBuffer` for a custom allocation strategy + - Shows: concept modeling, `prepare`/`commit`/`consume` pattern +- **Echo Server with Corosio** (`examples/echo-server-corosio.adoc`) + - Complete echo server using Corosio sockets + - Demonstrates Capy + Corosio integration + - Shows: `tcp::acceptor`, `tcp::socket`, `any_stream`, real networking + - Requires: Corosio library +- **Stream Pipeline** (`examples/stream-pipeline.adoc`) + - Data transformation chain: source → transform → sink + - Demonstrates `BufferSource` and `BufferSink` composition + - Shows: `push_to`, `pull_from`, processing chains + +**Example format** (each page): + +1. **Title and one-sentence description** +2. **What you'll learn** (bullet points) +3. **Prerequisites** (other examples or sections to read first) +4. **Full source code** (complete, compilable) +5. **Build instructions** (CMake snippet) +6. **Walkthrough** (explain key sections) +7. **Exercises** (optional variations to try) + +--- + +## Style Guide + +**Apply these rules throughout all documentation.** + +### Tone + +- **Second person**: "You will configure," "You create a task," not "I think" or "We will learn" +- **Avoid assumptions**: Never use "simple," "easy," "obviously," "just," "straightforward" — these frustrate readers who struggle +- **Friendly but formal**: No jargon, memes, slang, or emoji +- **Focus on outcomes, not process**: Instead of "we will learn how to install," write "you will install" + +### Structure (Each Section/Page) + +- **Introduction**: What is it? Why learn it? What will you do? What will you accomplish? 
+- **Prerequisites**: Explicit checklist with links to prior sections +- **Steps**: Procedural, numbered, each with intro sentence and closing transition +- **Conclusion**: Summarize accomplishments, suggest next steps + +### Section Arc + +- Opening (what reader will accomplish) → Simple case with code → Build complexity → Bridge to next +- Reader should have something working at the end — practical, not theoretical + +### Transitions + +- Each step ends with what was accomplished and where they're going next +- Provides context and motivation to continue +- Example: "You have now created your first task. Next, you will learn how to launch it on an executor." + +### Code Blocks + +- **Before**: High-level explanation of what the code does and why +- **Code**: Show the complete, compilable snippet +- **After**: Explain important details, gotchas, variations +- Every command gets a description before and explanation after + +### Exposition/Code Balance + +- Early sections: more prose, full paragraphs, short code snippets (API flavor) +- Middle sections: longer examples as concepts build +- End sections: full programs if appropriate — skip boilerplate (long include lists) + +### Content Rules + +- **What/Why/How order**: Always explain in that sequence +- **Gotchas over happy-path**: Document thread safety, lifetimes, platform quirks prominently +- **Grammar**: "That" (essential, no comma) vs "which" (nonessential, comma) +- **No unexplained forward refs**: Introduce concepts before referencing them +- **Prose over member lists**: Don't enumerate all class members; write prose about important ones with example code +- **Comprehensiveness without assumptions**: Explicitly include everything the reader needs + +### Banned Phrases + +- "Simply" / "Just" / "Easy" / "Obviously" / "Straightforward" +- "As you can see" / "It's clear that" +- "We" when referring to the reader (use "you") +- "I" (first person singular) + +--- + +## Agent Guide Reference + +Point to `doc/agent-guide.md` for: + +- Capy-specific section outlines (C++20 Coroutines, I/O Awaitables, Buffers, etc.) +- Extra instructions (use `thread_pool` in examples) +- Requirements notes + +--- + +## Rebuild Process + +### Phase 1: Outline + +- Read `doc/` structure and public API headers +- Create section/page structure with one-line descriptions per page +- Identify gaps: public symbols without documentation, stale references + +### Phase 2: Linear Generation + +- Work section by section, page by page, in order +- Complete each page fully before moving to the next +- Apply style guide rules throughout +- Reference agent-guide.md directives for Capy-specific structure + +### Phase 3: Validation + +- Verify all public symbols are documented +- Check cross-references resolve correctly +- Remove stale references to renamed/deleted API +- Confirm examples compile (or note dependencies) diff --git a/.cursor/rules/writing-guide.mdc b/.cursor/rules/writing-guide.mdc index bbc09a6a..2eea8ed1 100644 --- a/.cursor/rules/writing-guide.mdc +++ b/.cursor/rules/writing-guide.mdc @@ -38,6 +38,32 @@ Use second person ("You will configure...") to keep focus on the reader. In some Use motivational language focused on outcomes. Instead of "You will learn how to install Apache," try "In this tutorial, you will install Apache." 
+## Technical Depth for Core Topics + +Certain foundational topics require deeper, more methodical treatment: + +- **Task/coroutine documentation** +- **Buffer sequences** +- **Streams** + +For these sections: + +- Use more technical and methodical exposition +- Provide convincing explanations with thorough reasoning +- Include extended background and context +- Explain the "why" behind design decisions + +These topics build reader understanding from first principles, not just usage. Readers need to understand the reasoning to apply concepts correctly in their own code. + +# Build Workflow + +When documentation is built: + +- Obsolete pages are automatically removed +- New pages are linked into the table of contents + +No manual cleanup of old files is needed. + # Structure ## Introduction From 443f2139820194eb6cdf690111930b3ff28a5a14 Mon Sep 17 00:00:00 2001 From: Vinnie Falco Date: Sat, 31 Jan 2026 22:41:58 -0800 Subject: [PATCH 3/3] Regenerate documentation --- doc/modules/ROOT/nav.adoc | 56 +- .../ROOT/pages/buffers/algorithms.adoc | 334 +++++---- doc/modules/ROOT/pages/buffers/dynamic.adoc | 417 +++++------ doc/modules/ROOT/pages/buffers/overview.adoc | 242 +++---- doc/modules/ROOT/pages/buffers/sequences.adoc | 323 +++------ doc/modules/ROOT/pages/buffers/system-io.adoc | 205 ++++++ doc/modules/ROOT/pages/buffers/types.adoc | 200 ++++++ .../ROOT/pages/concurrency/advanced.adoc | 241 +++++++ .../ROOT/pages/concurrency/foundations.adoc | 235 +++++++ .../ROOT/pages/concurrency/patterns.adoc | 293 ++++++++ .../pages/concurrency/synchronization.adoc | 202 ++++++ .../ROOT/pages/coroutines/allocators.adoc | 182 +++++ .../ROOT/pages/coroutines/cancellation.adoc | 479 ++++++++----- .../ROOT/pages/coroutines/composition.adoc | 255 +++++++ .../ROOT/pages/coroutines/executors.adoc | 233 +++++++ .../ROOT/pages/coroutines/io-awaitable.adoc | 187 +++++ .../ROOT/pages/coroutines/launching.adoc | 217 +++--- doc/modules/ROOT/pages/coroutines/tasks.adoc | 251 ++++--- .../ROOT/pages/cpp20-coroutines/advanced.adoc | 641 +++++++++-------- .../pages/cpp20-coroutines/foundations.adoc | 274 +++----- .../pages/cpp20-coroutines/machinery.adoc | 653 +++++++----------- .../ROOT/pages/cpp20-coroutines/syntax.adoc | 429 +++--------- .../pages/examples/buffer-composition.adoc | 171 +++++ .../pages/examples/custom-dynamic-buffer.adoc | 298 ++++++++ .../pages/examples/echo-server-corosio.adoc | 242 +++++++ .../ROOT/pages/examples/hello-task.adoc | 103 +++ .../pages/examples/mock-stream-testing.adoc | 204 ++++++ .../ROOT/pages/examples/parallel-fetch.adoc | 248 +++++++ .../pages/examples/producer-consumer.adoc | 145 ++++ .../ROOT/pages/examples/stream-pipeline.adoc | 316 +++++++++ .../pages/examples/timeout-cancellation.adoc | 224 ++++++ .../ROOT/pages/examples/type-erased-echo.adoc | 185 +++++ doc/modules/ROOT/pages/index.adoc | 159 ++--- .../ROOT/pages/streams/algorithms.adoc | 253 +++++++ .../ROOT/pages/streams/buffer-concepts.adoc | 266 +++++++ doc/modules/ROOT/pages/streams/isolation.adoc | 240 +++++++ doc/modules/ROOT/pages/streams/overview.adoc | 199 ++++++ .../ROOT/pages/streams/sources-sinks.adoc | 231 +++++++ doc/modules/ROOT/pages/streams/streams.adoc | 236 +++++++ 39 files changed, 7713 insertions(+), 2556 deletions(-) create mode 100644 doc/modules/ROOT/pages/buffers/system-io.adoc create mode 100644 doc/modules/ROOT/pages/buffers/types.adoc create mode 100644 doc/modules/ROOT/pages/concurrency/advanced.adoc create mode 100644 doc/modules/ROOT/pages/concurrency/foundations.adoc create mode 100644 
doc/modules/ROOT/pages/concurrency/patterns.adoc create mode 100644 doc/modules/ROOT/pages/concurrency/synchronization.adoc create mode 100644 doc/modules/ROOT/pages/coroutines/allocators.adoc create mode 100644 doc/modules/ROOT/pages/coroutines/composition.adoc create mode 100644 doc/modules/ROOT/pages/coroutines/executors.adoc create mode 100644 doc/modules/ROOT/pages/coroutines/io-awaitable.adoc create mode 100644 doc/modules/ROOT/pages/examples/buffer-composition.adoc create mode 100644 doc/modules/ROOT/pages/examples/custom-dynamic-buffer.adoc create mode 100644 doc/modules/ROOT/pages/examples/echo-server-corosio.adoc create mode 100644 doc/modules/ROOT/pages/examples/hello-task.adoc create mode 100644 doc/modules/ROOT/pages/examples/mock-stream-testing.adoc create mode 100644 doc/modules/ROOT/pages/examples/parallel-fetch.adoc create mode 100644 doc/modules/ROOT/pages/examples/producer-consumer.adoc create mode 100644 doc/modules/ROOT/pages/examples/stream-pipeline.adoc create mode 100644 doc/modules/ROOT/pages/examples/timeout-cancellation.adoc create mode 100644 doc/modules/ROOT/pages/examples/type-erased-echo.adoc create mode 100644 doc/modules/ROOT/pages/streams/algorithms.adoc create mode 100644 doc/modules/ROOT/pages/streams/buffer-concepts.adoc create mode 100644 doc/modules/ROOT/pages/streams/isolation.adoc create mode 100644 doc/modules/ROOT/pages/streams/overview.adoc create mode 100644 doc/modules/ROOT/pages/streams/sources-sinks.adoc create mode 100644 doc/modules/ROOT/pages/streams/streams.adoc diff --git a/doc/modules/ROOT/nav.adoc b/doc/modules/ROOT/nav.adoc index 73246d63..0adbc4cf 100644 --- a/doc/modules/ROOT/nav.adoc +++ b/doc/modules/ROOT/nav.adoc @@ -5,28 +5,42 @@ ** xref:cpp20-coroutines/syntax.adoc[Part II: C++20 Syntax] ** xref:cpp20-coroutines/machinery.adoc[Part III: Coroutine Machinery] ** xref:cpp20-coroutines/advanced.adoc[Part IV: Advanced Topics] -* Introduction to I/O Awaitables -** xref:io-awaitables/concepts.adoc[Concept Hierarchy] -** xref:io-awaitables/executor.adoc[The Executor] -** xref:io-awaitables/stop-token.adoc[The Stop Token] -** xref:io-awaitables/allocator.adoc[The Allocator] -** xref:io-awaitables/launching.adoc[Launching Coroutines] -* Capy Library -** xref:library/task.adoc[The task Type] -** xref:library/io-result.adoc[Error Handling with io_result] -** xref:library/streams.adoc[Stream Concepts] -** xref:library/when-all.adoc[Concurrent Composition] -** xref:library/cancellation.adoc[Cancellation] -** xref:library/synchronization.adoc[Synchronization Primitives] -** xref:library/executors.adoc[Executors and Strands] -** xref:library/frame-allocators.adoc[Frame Allocators] -* Buffers -** xref:library/buffers.adoc[Overview] -** xref:buffers/overview.adoc[Buffers and I/O] -** xref:buffers/index.adoc[Buffer Types] +* Introduction to Concurrency +** xref:concurrency/foundations.adoc[Part I: Foundations] +** xref:concurrency/synchronization.adoc[Part II: Synchronization] +** xref:concurrency/advanced.adoc[Part III: Advanced Primitives] +** xref:concurrency/patterns.adoc[Part IV: Communication & Patterns] +* Coroutines in Capy +** xref:coroutines/tasks.adoc[The task Type] +** xref:coroutines/launching.adoc[Launching Coroutines] +** xref:coroutines/executors.adoc[Executors and Execution Contexts] +** xref:coroutines/io-awaitable.adoc[The IoAwaitable Protocol] +** xref:coroutines/cancellation.adoc[Stop Tokens and Cancellation] +** xref:coroutines/composition.adoc[Concurrent Composition] +** xref:coroutines/allocators.adoc[Frame Allocators] +* 
Buffer Sequences +** xref:buffers/overview.adoc[Why Concepts, Not Spans] +** xref:buffers/types.adoc[Buffer Types] ** xref:buffers/sequences.adoc[Buffer Sequences] +** xref:buffers/system-io.adoc[System I/O Integration] ** xref:buffers/algorithms.adoc[Buffer Algorithms] ** xref:buffers/dynamic.adoc[Dynamic Buffers] -* Testing Facilities -** xref:library/testing.adoc[Overview] +* Stream Concepts +** xref:streams/overview.adoc[Overview] +** xref:streams/streams.adoc[Streams (Partial I/O)] +** xref:streams/sources-sinks.adoc[Sources and Sinks (Complete I/O)] +** xref:streams/buffer-concepts.adoc[Buffer Sources and Sinks] +** xref:streams/algorithms.adoc[Transfer Algorithms] +** xref:streams/isolation.adoc[Physical Isolation] +* Example Programs +** xref:examples/hello-task.adoc[Hello Task] +** xref:examples/producer-consumer.adoc[Producer-Consumer] +** xref:examples/buffer-composition.adoc[Buffer Composition] +** xref:examples/mock-stream-testing.adoc[Mock Stream Testing] +** xref:examples/type-erased-echo.adoc[Type-Erased Echo] +** xref:examples/timeout-cancellation.adoc[Timeout with Cancellation] +** xref:examples/parallel-fetch.adoc[Parallel Fetch] +** xref:examples/custom-dynamic-buffer.adoc[Custom Dynamic Buffer] +** xref:examples/echo-server-corosio.adoc[Echo Server with Corosio] +** xref:examples/stream-pipeline.adoc[Stream Pipeline] * xref:reference:boost/capy.adoc[Reference] diff --git a/doc/modules/ROOT/pages/buffers/algorithms.adoc b/doc/modules/ROOT/pages/buffers/algorithms.adoc index 4e063dbf..396c9249 100644 --- a/doc/modules/ROOT/pages/buffers/algorithms.adoc +++ b/doc/modules/ROOT/pages/buffers/algorithms.adoc @@ -1,90 +1,60 @@ -// -// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com) -// -// Distributed under the Boost Software License, Version 1.0. (See accompanying -// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) -// -// Official repository: https://github.com/cppalliance/capy -// - = Buffer Algorithms -This page covers the algorithms for measuring and copying buffer sequences. - -NOTE: Code snippets assume `using namespace boost::capy;` is in effect. - -== Measuring Buffers - -Three functions measure different properties of buffer sequences: +This section covers algorithms for measuring and manipulating buffer sequences. 
-[cols="1,3"]
-|===
-| Function | Returns
-
-| `buffer_size(seq)`
-| Total bytes across all buffers
+== Prerequisites

-| `buffer_empty(seq)`
-| Whether total size is zero
+* Completed xref:sequences.adoc[Buffer Sequences]
+* Understanding of `ConstBufferSequence` and iteration

-| `buffer_length(seq)`
-| Number of buffers (not bytes)
-|===
+== Measuring Buffers

=== buffer_size

-Returns the total number of bytes in a buffer sequence:
+Returns the total number of bytes across all buffers in a sequence:

[source,cpp]
----
-#include
-
-char a[100], b[200], c[50];
-std::array<const_buffer, 3> bufs = {
-    const_buffer(a, sizeof(a)),
-    const_buffer(b, sizeof(b)),
-    const_buffer(c, sizeof(c))
-};
-
-std::size_t total = buffer_size(bufs); // 350
+template<ConstBufferSequence CB>
+std::size_t buffer_size(CB const& buffers);
----

-This is different from range size—it sums the byte counts:
+Example:

[source,cpp]
----
-// buffer_size vs range size
-std::size_t bytes = buffer_size(bufs);        // 350 (total bytes)
-std::size_t count = std::ranges::size(bufs);  // 3 (number of buffers)
-----
+auto buf1 = make_buffer("hello");  // 5 bytes
+auto buf2 = make_buffer("world");  // 5 bytes
+auto combined = std::array{buf1, buf2};

-Single buffers work too:
+std::size_t total = buffer_size(combined); // 10
----

-[source,cpp]
-----
-const_buffer single(data, 100);
-std::size_t n = buffer_size(single); // 100
-----
+Note: `buffer_size` returns the sum of bytes, not the count of buffers.

=== buffer_empty

Checks if a buffer sequence contains no data:

[source,cpp]
----
-std::array<const_buffer, 2> bufs = {
-    const_buffer(nullptr, 0),
-    const_buffer(nullptr, 0)
-};
+template<ConstBufferSequence CB>
+bool buffer_empty(CB const& buffers);
+----
+
+A buffer sequence is empty if:

-bool empty = buffer_empty(bufs); // true
+* It contains no buffers, OR
+* All buffers have size zero

-bufs[0] = const_buffer(data, 10);
-empty = buffer_empty(bufs); // false
+[source,cpp]
----
+const_buffer empty_buf;
+buffer_empty(empty_buf);  // true

-A sequence is empty if all its buffers have size zero, or if
-it contains no buffers at all.
+const_buffer non_empty("data", 4);
+buffer_empty(non_empty);  // false
+----

=== buffer_length

Returns the number of buffers in a sequence:

[source,cpp]
----
-std::array<const_buffer, 3> bufs = { /* ... */ };
-std::size_t count = buffer_length(bufs); // 3
+template<ConstBufferSequence CB>
+std::size_t buffer_length(CB const& buffers);
+----
+
+Example:
+
+[source,cpp]
+----
+auto single = make_buffer("hello");
+buffer_length(single);  // 1

-const_buffer single(data, 100);
-count = buffer_length(single); // 1
+auto arr = std::array{buf1, buf2, buf3};
+buffer_length(arr);  // 3
----

+Note the distinction:
+
+* `buffer_size` — total bytes (data measurement)
+* `buffer_length` — number of buffers (sequence length)
+
== Copying Buffers

-The `buffer_copy` function copies data between buffer sequences
-without requiring contiguous storage.
+=== buffer_copy

-=== Basic Usage
+Copies data from one buffer sequence to another:

[source,cpp]
----
-#include
-
-char src_data[] = "Hello, World!";
-char dst_data[32];
-
-const_buffer src(src_data, sizeof(src_data) - 1);
-mutable_buffer dst(dst_data, sizeof(dst_data));
+template<MutableBufferSequence Target, ConstBufferSequence Source>
+std::size_t buffer_copy(Target const& target, Source const& source);

-std::size_t copied = buffer_copy(dst, src); // 13
-// dst_data now contains "Hello, World!"
+template<MutableBufferSequence Target, ConstBufferSequence Source>
+std::size_t buffer_copy(Target const& target, Source const& source,
+    std::size_t at_most);
----

-=== With Multiple Buffers
+Returns the number of bytes copied.
-Copy between buffer sequences efficiently:
+Example:

[source,cpp]
----
-// Source: three separate buffers
-std::array<const_buffer, 3> src = {
-    const_buffer("Hello", 5),
-    const_buffer(", ", 2),
-    const_buffer("World!", 6)
-};
+char source_data[] = "hello world";
+char dest_data[20];

-// Destination: two buffers
-char d1[8], d2[8];
-std::array<mutable_buffer, 2> dst = {
-    mutable_buffer(d1, sizeof(d1)),
-    mutable_buffer(d2, sizeof(d2))
-};
+const_buffer src(source_data, 11);
+mutable_buffer dst(dest_data, 20);

-std::size_t n = buffer_copy(dst, src);
-// n == 13
-// d1 contains "Hello, W"
-// d2 contains "orld!\0\0\0"
+std::size_t copied = buffer_copy(dst, src); // 11
----

-=== Limiting Copy Size
+=== Partial Copy with at_most

-The optional `at_most` parameter limits the copy:
+Limit the number of bytes copied:

[source,cpp]
----
-char large_src[1000];
-char small_dst[100];
-
-// Only copy up to 50 bytes
-std::size_t n = buffer_copy(
-    mutable_buffer(small_dst, sizeof(small_dst)),
-    const_buffer(large_src, sizeof(large_src)),
-    50);
-
-// n == 50
+std::size_t copied = buffer_copy(dst, src, 5); // Copy at most 5 bytes
----

-The copy stops at the minimum of:
+This is useful for implementing protocols with size limits.

-* Destination capacity
-* Source size
-* `at_most` parameter (if provided)
+=== Cross-Sequence Copy

-== Real-World Examples
+`buffer_copy` handles sequences with different structure:

-=== Building a Packet

[source,cpp]
----
-struct packet_header {
-    std::uint16_t type;
-    std::uint16_t length;
-};
+// Source: 3 buffers
+std::array<const_buffer, 3> src = {buf1, buf2, buf3};

-task<void> send_packet(
-    WriteStream auto& stream,
-    std::uint16_t type,
-    std::span<std::byte const> payload)
-{
-    packet_header hdr;
-    hdr.type = type;
-    hdr.length = static_cast<std::uint16_t>(payload.size());
-
-    std::array<const_buffer, 2> packet = {
-        const_buffer(&hdr, sizeof(hdr)),
-        const_buffer(payload.data(), payload.size())
-    };
+// Target: 2 buffers with different sizes
+std::array<mutable_buffer, 2> dst = {large_buf, small_buf};

-    auto total = buffer_size(packet); // sizeof(hdr) + payload.size()
-    co_await write(stream, packet);
-}
+// Copies across buffer boundaries as needed
+std::size_t copied = buffer_copy(dst, src);
----

-=== Coalescing Buffers
+The algorithm fills target buffers sequentially, reading from source buffers as needed, handling cases where a single source buffer spans multiple target buffers or vice versa.
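+The following sketch (values illustrative) shows one copy crossing a target-buffer boundary:
+
+[source,cpp]
+----
+char part1[4], part2[7];
+std::array<mutable_buffer, 2> dst = {
+    mutable_buffer(part1, sizeof(part1)),
+    mutable_buffer(part2, sizeof(part2))
+};
+
+const_buffer src("hello world", 11);  // one 11-byte source buffer
+
+std::size_t copied = buffer_copy(dst, src);  // 11
+// part1 holds "hell", part2 holds "o world"
+----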
-Sometimes you need contiguous data for parsing:
+== Real I/O Patterns

+=== Read Loop

[source,cpp]
----
-template<ConstBufferSequence Buffers>
-std::vector<char> coalesce(Buffers const& bufs)
+template<ReadStream Stream, MutableBufferSequence Buffers>
+task<std::size_t> read_full(Stream& stream, Buffers buffers)
{
-    std::vector<char> result(buffer_size(bufs));
-    buffer_copy(make_buffer(result), bufs);
-    return result;
+    consuming_buffers remaining(buffers);
+    std::size_t total = 0;
+
+    while (buffer_size(remaining) > 0)
+    {
+        auto [ec, n] = co_await stream.read_some(remaining);
+        if (ec.failed())
+            co_return total;  // Return partial read on error
+
+        remaining.consume(n);
+        total += n;
+    }
+
+    co_return total;
}
----

-=== Progress Tracking
+=== Write Loop

[source,cpp]
----
-template<WriteStream Stream>
-task<void> send_with_progress(
-    Stream& stream,
-    ConstBufferSequence auto const& data,
-    std::function<void(std::size_t, std::size_t)> on_progress)
+template<WriteStream Stream, ConstBufferSequence Buffers>
+task<std::size_t> write_full(Stream& stream, Buffers buffers)
{
-    std::size_t const total = buffer_size(data);
-    std::size_t sent = 0;
-
-    consuming_buffers cb(data);
-    while (sent < total)
+    consuming_buffers remaining(buffers);
+    std::size_t total = 0;
+
+    while (buffer_size(remaining) > 0)
    {
-        auto [ec, n] = co_await stream.write_some(cb);
+        auto [ec, n] = co_await stream.write_some(remaining);
        if (ec.failed())
-            co_return;
-        cb.consume(n);
-        sent += n;
-        on_progress(sent, total);
+            co_return total;
+
+        remaining.consume(n);
+        total += n;
    }
+
+    co_return total;
}
----

-== Summary
+== Practical Benefits of Concept-Based Design

-[cols="1,3"]
-|===
-| Function | Purpose
+=== Zero-Copy I/O

-| `buffer_size`
-| Total bytes across all buffers in a sequence
+Data never moves unnecessarily. The buffer sequence points to existing data, and the OS reads directly from those locations:

-| `buffer_empty`
-| Check if total byte count is zero
+[source,cpp]
+----
+std::string header = build_header();
+std::vector<char> body = load_body();
+
+// No copying—header and body are written directly
+co_await write(stream, cat(make_buffer(header), make_buffer(body)));
+----
+
+=== Scatter/Gather Operations

-| `buffer_length`
-| Number of buffers in a sequence
+Multiple buffers transfer in a single operation:

+[source,cpp]
+----
+std::array<const_buffer, 4> buffers = {header_buf, separator_buf, body_buf, footer_buf};
+co_await write(stream, buffers); // Single system call
+----

-| `buffer_copy`
-| Copy data between buffer sequences
+=== Custom Allocators and Memory-Mapped Buffers
+
+Any memory region can be a buffer:
+
+[source,cpp]
+----
+// Memory-mapped file
+void* mapped = mmap(...);
+const_buffer file_buf(mapped, file_size);
+co_await write(socket, file_buf); // Zero-copy network transmission
+----
+
+=== User-Defined Buffer Types
+
+Create custom types that satisfy the concepts:
+
+[source,cpp]
+----
+class chunked_buffer_sequence
+{
+    std::vector<std::vector<char>> chunks_;
+
+public:
+    auto begin() const { /* return iterator over chunks as buffers */ }
+    auto end() const { /* return end iterator */ }
+};
+// Satisfies ConstBufferSequence—works with all algorithms
+----
+
+== Reference
+
+[cols="1,3"]
|===
+| Header | Description

-== Navigation
+| ``
+| Measurement algorithms (`buffer_size`, `buffer_empty`, `buffer_length`)
+
+| ``
+| Copy algorithm
+|===

-[.text-center]
-xref:sequences.adoc[← Buffer Sequences] | xref:dynamic.adoc[Next: Dynamic Buffers →]
+You have now learned how to measure and copy buffer sequences. Continue to xref:dynamic.adoc[Dynamic Buffers] to learn about growable buffer storage.
diff --git a/doc/modules/ROOT/pages/buffers/dynamic.adoc b/doc/modules/ROOT/pages/buffers/dynamic.adoc
index 08f6a124..da52d5df 100644
--- a/doc/modules/ROOT/pages/buffers/dynamic.adoc
+++ b/doc/modules/ROOT/pages/buffers/dynamic.adoc
@@ -1,374 +1,285 @@
-//
-// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com)
-//
-// Distributed under the Boost Software License, Version 1.0. (See accompanying
-// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
-//
-// Official repository: https://github.com/cppalliance/capy
-//
-
= Dynamic Buffers

-This page explains dynamic buffers—containers that manage resizable storage
-with a producer/consumer model designed for streaming I/O.
+This section introduces dynamic buffers—growable storage that adapts to data flow between producers and consumers.

-NOTE: Code snippets assume `using namespace boost::capy;` is in effect.
+== Prerequisites

-== The Producer/Consumer Model
+* Completed xref:algorithms.adoc[Buffer Algorithms]
+* Understanding of buffer sequences and copying

-Dynamic buffers serve as intermediate storage between a **producer** that
-writes data and a **consumer** that reads it. The classic example is network
-I/O: the network produces data that your application consumes.
+== The Producer/Consumer Model

-[source,text]
-----
-    Network (Producer)
-         |
-         v
-  +-----------------+
-  | Dynamic Buffer  |
-  | [readable data] |
-  +-----------------+
-         |
-         v
-  Application (Consumer)
-----
+Dynamic buffers serve as intermediate storage between a *producer* (typically network I/O) and a *consumer* (your application code).

-The buffer API reflects this model:
+The flow:

-* **Producer side**: `prepare(n)` → write data → `commit(n)`
-* **Consumer side**: `data()` → read data → `consume(n)`
+1. **Producer** writes data into the buffer
+2. **Buffer** grows as needed to accommodate data
+3. **Consumer** reads and processes data
+4. **Buffer** releases consumed data

-This separation prevents reading uncommitted data or overwriting unread data.
+This model decouples production rate from consumption rate—the buffer absorbs variations.

== The DynamicBuffer Concept

[source,cpp]
----
-template<class T>
-concept DynamicBuffer = requires(T& buf, T const& cbuf, std::size_t n)
-{
-    // Capacity management
-    { cbuf.size() } -> std::convertible_to<std::size_t>;     // Readable bytes
-    { cbuf.max_size() } -> std::convertible_to<std::size_t>; // Maximum capacity
-    { cbuf.capacity() } -> std::convertible_to<std::size_t>; // Current capacity
-
-    // Consumer side
-    { cbuf.data() } -> ConstBufferSequence;   // Get readable data
-    buf.consume(n);                           // Discard n bytes
-
+template<class T>
+concept DynamicBuffer = requires(T& t, std::size_t n) {
    // Producer side
-    { buf.prepare(n) } -> MutableBufferSequence; // Get writable space
-    buf.commit(n);                               // Make n bytes readable
+    { t.prepare(n) } -> MutableBufferSequence;
+    { t.commit(n) };
+
+    // Consumer side
+    { t.data() } -> ConstBufferSequence;
+    { t.consume(n) };
+
+    // Capacity
+    { t.size() } -> std::same_as<std::size_t>;
+    { t.max_size() } -> std::same_as<std::size_t>;
+    { t.capacity() } -> std::same_as<std::size_t>;
};
----

-=== Producer Operations
+== Producer Interface
+
+=== prepare(n)
+
+Returns mutable buffer space for writing up to `n` bytes:

[source,cpp]
----
-flat_dynamic_buffer buf(storage, sizeof(storage));
+auto buffers = dynamic_buf.prepare(1024); // Space for up to 1024 bytes
+----

-// 1. Get writable space
-auto writable = buf.prepare(1024);
+The returned space may be larger than requested. The data is not yet part of the readable sequence.

-// 2.
Write data (e.g., from network) -std::size_t n = receive_data(writable); +=== commit(n) -// 3. Make written bytes readable -buf.commit(n); ----- - -=== Consumer Operations +Marks `n` bytes of prepared space as written and readable: [source,cpp] ---- -// 1. Get readable data -auto readable = buf.data(); - -// 2. Process the data -std::size_t processed = parse_message(readable); - -// 3. Discard processed bytes -buf.consume(processed); +// After writing data: +dynamic_buf.commit(bytes_written); +// Data is now visible via data() ---- -== DynamicBufferParam for Coroutines - -When passing dynamic buffers to coroutine functions, use `DynamicBufferParam` -with a forwarding reference: +=== Typical Producer Pattern [source,cpp] ---- -auto read(ReadSource auto& source, DynamicBufferParam auto&& buffers) - -> task>; +task<> read_into_buffer(Stream& stream, DynamicBuffer auto& buffer) +{ + // Prepare space + auto space = buffer.prepare(1024); + + // Read into prepared space + auto [ec, n] = co_await stream.read_some(space); + + if (!ec.failed()) + buffer.commit(n); // Make data readable +} ---- -This concept enforces safe parameter passing: - -* **Lvalues**: Always allowed—the caller manages the buffer's lifetime -* **Rvalues**: Only allowed for adapter types that update external storage +== Consumer Interface -=== Why the Distinction? +=== data() -Some buffer types store bookkeeping internally: +Returns the readable data as a const buffer sequence: [source,cpp] ---- -// BAD: rvalue non-adapter loses bookkeeping -flat_dynamic_buffer buf(storage, sizeof(storage)); -co_await read(source, std::move(buf)); // Compile error! - -// GOOD: lvalue keeps bookkeeping accessible -flat_dynamic_buffer buf(storage, sizeof(storage)); -co_await read(source, buf); // OK +auto readable = dynamic_buf.data(); +// Process readable bytes ---- -Adapters update external storage, so rvalues are safe: +=== consume(n) + +Removes `n` bytes from the front of readable data: [source,cpp] ---- -// GOOD: adapter rvalue, string retains data -std::string body; -co_await read(source, string_dynamic_buffer(&body)); // OK +dynamic_buf.consume(processed_bytes); +// Those bytes are no longer in data() ---- -=== Marking Adapter Types - -Types safe to pass as rvalues define a nested tag: +=== Typical Consumer Pattern [source,cpp] ---- -class string_dynamic_buffer { -public: - using is_dynamic_buffer_adapter = void; - // ... -}; +void process_buffer(DynamicBuffer auto& buffer) +{ + auto data = buffer.data(); + + while (buffer_size(data) >= message_header_size) + { + auto msg_size = parse_header(data); + if (buffer_size(data) < msg_size) + break; // Need more data + + process_message(data, msg_size); + buffer.consume(msg_size); + data = buffer.data(); // Refresh after consume + } +} ---- -== Provided Implementations +== Capacity Management -=== flat_dynamic_buffer +`size()`:: +Current number of readable bytes (the length of `data()`). -Uses a single contiguous memory region: +`max_size()`:: +Maximum allowed size. Attempts to grow beyond this throw or fail. -[source,cpp] ----- -#include +`capacity()`:: +Current allocated capacity. May be larger than `size()`. 
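+A minimal sketch (sizes illustrative) of how these values evolve across one produce/consume cycle:
+
+[source,cpp]
+----
+flat_dynamic_buffer buffer;
+// size() == 0: nothing readable yet
+
+auto space = buffer.prepare(128);  // writable space; size() still 0
+// ... write 100 bytes into space ...
+buffer.commit(100);                // size() == 100
+
+buffer.consume(40);                // size() == 60
+// capacity() may still reflect the storage obtained by prepare(128)
+----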
+
+== DynamicBufferParam

-When passing dynamic buffers to coroutine functions, use `DynamicBufferParam`
-with a forwarding reference:
+When passing dynamic buffers to coroutines, use `DynamicBufferParam` for safe parameter handling:

[source,cpp]
----
-auto read(ReadSource auto& source, DynamicBufferParam auto&& buffers)
-    -> task<io_result<std::size_t>>;
+template<class T>
+concept DynamicBufferParam = DynamicBuffer<std::remove_cvref_t<T>>;
+
+template<class Stream, DynamicBufferParam Buf>
+task<std::string> read_until(Stream& stream, Buf&& buffer, char delimiter);
----

-This concept enforces safe parameter passing:
+This concept ensures proper handling of lvalues and rvalues, preventing dangling references across suspension points.

-* **Lvalues**: Always allowed—the caller manages the buffer's lifetime
-* **Rvalues**: Only allowed for adapter types that update external storage
+== Provided Implementations

-=== Why the Distinction?
+=== flat_dynamic_buffer

-Some buffer types store bookkeeping internally:
+Linear storage with single-buffer sequences:

[source,cpp]
----
-// BAD: rvalue non-adapter loses bookkeeping
-flat_dynamic_buffer buf(storage, sizeof(storage));
-co_await read(source, std::move(buf)); // Compile error!
+#include

-// GOOD: lvalue keeps bookkeeping accessible
-flat_dynamic_buffer buf(storage, sizeof(storage));
-co_await read(source, buf); // OK
+flat_dynamic_buffer buffer;
+buffer.prepare(1024);
+// ... write data ...
+buffer.commit(n);
+
+// data() returns a single const_buffer
----

-=== Marking Adapter Types
+Advantages:

-Types safe to pass as rvalues define a nested tag:
+* Contiguous memory—good for parsing that needs contiguous data
+* Cache-friendly

-[source,cpp]
-----
-class string_dynamic_buffer {
-public:
-    using is_dynamic_buffer_adapter = void;
-    // ...
-};
-----
+Disadvantages:

-== Provided Implementations
+* May require copying when buffer wraps or grows

-=== flat_dynamic_buffer
+=== circular_dynamic_buffer

-Uses a single contiguous memory region:
+Ring buffer implementation:

[source,cpp]
----
-#include
+#include

-char storage[4096];
-flat_dynamic_buffer buf(storage, sizeof(storage));
+circular_dynamic_buffer<1024> buffer; // Fixed capacity
+----

-// prepare() and data() always return single-element sequences
-auto writable = buf.prepare(100);  // Single mutable_buffer
-auto readable = buf.data();        // Single const_buffer
-----
+Advantages:

-**Characteristics:**
+* No copying on wrap—head/tail pointers move
+* Fixed memory footprint

-* Data is always contiguous—good for parsers requiring linear access
-* Buffer sequences have exactly one element
-* May waste space after consume (gap at front)
+Disadvantages:

-**Best for:** Protocols requiring contiguous parsing, simple request/response patterns.
+* `data()` may return two buffers (wrapped around end)
+* Fixed capacity

-=== circular_dynamic_buffer
+=== vector_dynamic_buffer

-Uses a ring buffer that wraps around:
+Backed by `std::vector`:

[source,cpp]
----
-#include
+#include

-char storage[4096];
-circular_dynamic_buffer buf(storage, sizeof(storage));
+std::vector<unsigned char> storage;
+vector_dynamic_buffer buffer(storage);
+----

-// Data may wrap around
-auto readable = buf.data(); // May return const_buffer_pair
-----
+Adapts an existing vector for use as a dynamic buffer.

-**Characteristics:**
+=== string_dynamic_buffer

-* No space wasted after consume (wraps around)
-* Buffer sequences may have two elements when data spans the wrap point
-* Fixed maximum capacity
+Backed by `std::string`:

-**Best for:** Continuous streaming, bidirectional protocols, high-throughput scenarios.
+[source,cpp]
+----
+#include

-=== string_dynamic_buffer
+std::string storage;
+string_dynamic_buffer buffer(storage);
+----

-Adapts a `std::string` as a dynamic buffer:
+Useful when you want the final data as a string.

-[source,cpp]
-----
-#include
-
-std::string body;
-string_dynamic_buffer buf(&body);
-
-// Buffer operations modify the string
-co_await read(source, buf);
-// body now contains the data
-----
-
-**Characteristics:**
-
-* Does not own storage—wraps an existing string
-* String grows as needed (up to max_size)
-* Destructor resizes string to final readable size
-* Move-only (cannot be copied)
-* Marked as adapter—safe to pass as rvalue
-
-**Best for:** Building string results, HTTP bodies, text protocols.
-
-=== vector_dynamic_buffer
-
-Adapts a `std::vector<unsigned char>` as a dynamic buffer:
-
-[source,cpp]
-----
-#include
-
-std::vector<unsigned char> data;
-vector_dynamic_buffer buf(&data);
-
-co_await read(source, buf);
-// data now contains the bytes
-----
-
-**Characteristics:**
-
-* Similar to string_dynamic_buffer but for binary data
-* Vector grows as needed
-* Marked as adapter—safe to pass as rvalue
-
-**Best for:** Binary protocols, file I/O, binary message bodies.
-
-== Real-World Examples

-=== Reading HTTP Body
+== Example: Line-Based Protocol

[source,cpp]
----
-task<std::string> read_body(ReadSource auto& source, std::size_t content_length)
+task<std::string> read_line(Stream& stream)
{
-    std::string body;
-    string_dynamic_buffer buf(&body, content_length);
-
-    // Read until we have the full body
-    char temp[4096];
-    std::size_t remaining = content_length;
-    while (remaining > 0)
-    {
-        auto to_read = (std::min)(remaining, sizeof(temp));
-        auto [ec, n] = co_await source.read(mutable_buffer(temp, to_read));
-        if (ec.failed())
-            co_return {};
-
-        auto writable = buf.prepare(n);
-        buffer_copy(writable, const_buffer(temp, n));
-        buf.commit(n);
-        remaining -= n;
-    }
+    flat_dynamic_buffer buffer;
+
+    while (true)
+    {
+        // Prepare space and read
+        auto space = buffer.prepare(256);
+        auto [ec, n] = co_await stream.read_some(space);
+        if (ec.failed())
+            throw std::system_error(ec);
+        buffer.commit(n);
+
+        // Search for newline in readable data
+        auto data = buffer.data();
+        std::string_view sv(
+            static_cast<char const*>(data.data()), data.size());
+
+        auto pos = sv.find('\n');
+        if (pos != std::string_view::npos)
+        {
+            std::string line(sv.substr(0, pos));
+            buffer.consume(pos + 1); // Include newline
+            co_return line;
+        }
+    }
}
----

-    co_return body;
-}
-----
-
-=== Streaming Protocol Parser
-
-[source,cpp]
-----
-task<void> parse_stream(ReadStream auto& stream)
-{
-    char storage[8192];
-    circular_dynamic_buffer buf(storage, sizeof(storage));
-
-    for (;;)
-    {
-        // Read more data
-        auto [ec, n] = co_await stream.read_some(buf.prepare(1024));
-        if (ec == cond::eof)
-            break;
-        if (ec.failed())
-            co_return;
-        buf.commit(n);
-
-        // Parse complete messages
-        while (auto msg = try_parse_message(buf.data()))
-        {
-            handle_message(*msg);
-            buf.consume(msg->size());
-        }
-    }
-}
-----
-
-=== Read Until EOF
-
-[source,cpp]
-----
-task<std::vector<unsigned char>> read_all(ReadSource auto& source)
-{
-    std::vector<unsigned char> result;
-    vector_dynamic_buffer buf(&result);
-
-    auto [ec, total] = co_await read(source, buf);
-    // read() loops until EOF, growing the buffer as needed
-
-    co_return result;
-}
-----
-
-== Choosing a Buffer Type
-
-[cols="1,2,2"]
-|===
-| Type | Best For | Trade-off
-
-| `flat_dynamic_buffer`
-| Protocols requiring contiguous data
-| May waste space after consume
-
-| `circular_dynamic_buffer`
-| High-throughput streaming
-| Two-element sequences add complexity
-
-| `string_dynamic_buffer`
-| Building string results
-| Requires external string ownership
-
-| `vector_dynamic_buffer`
-| Binary data accumulation
-| Requires external vector ownership
-|===
-
-== Summary
+== Reference

[cols="1,3"]
|===
-| Component | Purpose
+| Header | Description

-| `DynamicBuffer`
-| Concept for resizable buffers with prepare/commit semantics
+| ``
+| DynamicBuffer concept definition

-| `DynamicBufferParam`
-| Safe parameter passing constraint for coroutines
+| ``
+| Linear dynamic buffer

-| `flat_dynamic_buffer`
-| Contiguous storage, single-element sequences
+| ``
+| Ring buffer implementation

-| `circular_dynamic_buffer`
-| Ring buffer, no space waste, may have two-element sequences
+| ``
+| Vector-backed adapter

-| `string_dynamic_buffer`
-| Adapter for `std::string`
-
-| `vector_dynamic_buffer`
-| Adapter for `std::vector<unsigned char>`
+| ``
+| String-backed adapter
|===

-== Navigation
-
-[.text-center]
-xref:algorithms.adoc[← Buffer Algorithms] | xref:../library/streams.adoc[Next: Stream Concepts →]
+You have now learned about dynamic buffers
for producer/consumer patterns. This completes the Buffer Sequences section. Continue to xref:../streams/overview.adoc[Stream Concepts] to learn about Capy's stream abstractions.
diff --git a/doc/modules/ROOT/pages/buffers/overview.adoc b/doc/modules/ROOT/pages/buffers/overview.adoc
index 68d02ab0..e1f5dc4c 100644
--- a/doc/modules/ROOT/pages/buffers/overview.adoc
+++ b/doc/modules/ROOT/pages/buffers/overview.adoc
@@ -1,201 +1,159 @@
-//
-// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com)
-//
-// Distributed under the Boost Software License, Version 1.0. (See accompanying
-// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
-//
-// Official repository: https://github.com/cppalliance/capy
-//
+= Why Concepts, Not Spans

-= Buffers and I/O
+This section explains why Capy uses concept-driven buffer sequences instead of `std::span`, and why this design enables composition without allocation.

-This page explains why buffers exist and how they enable efficient I/O
-operations in network programming.
+== Prerequisites

-NOTE: Code snippets assume `using namespace boost::capy;` is in effect.
+* Basic C++ experience with memory and pointers
+* Familiarity with C++20 concepts

-== Why Buffers?
+== The I/O Use Case

-Every I/O operation ultimately moves bytes between memory and an external
-resource—a socket, file, or device. The operating system reads and writes
-bytes from contiguous memory regions. A buffer is simply a reference to
-such a region: a pointer and a size.
+Buffers exist to interface with operating system I/O. When you read from a socket, write to a file, or transfer data through any I/O channel, you work with contiguous memory regions—addresses and byte counts.
+
+The fundamental unit is a `(pointer, size)` pair. The OS reads bytes from or writes bytes to linear addresses.
+
+== The Reflexive Answer: span
+
+The instinctive C++ answer to "how should I represent a buffer?" is `std::span`:

[source,cpp]
----
-// The fundamental pattern: pointer + size
-void const* data = ...;
-std::size_t size = ...;
-
-// This is what OS system calls need
-send(socket, data, size, flags);
+void write_data(std::span<std::byte const> data);
+void read_data(std::span<std::byte> buffer);
----

-Capy's buffer types wrap this pattern with type safety and convenience.
+This works for single contiguous buffers. But I/O often involves multiple buffers—a technique called *scatter/gather I/O*.
+
+== Scatter/Gather I/O
+
+Consider assembling an HTTP message. The headers are in one buffer; the body is in another. With single-buffer APIs, you must:

-== Platform-Agnostic I/O
+1. Allocate a new buffer large enough for both
+2. Copy headers into the new buffer
+3. Copy body after headers
+4. Send the combined buffer

-Capy's I/O algorithms work against concepts rather than concrete types.
-The `read()` and `write()` functions don't care whether you're using:
+This is wasteful. The data already exists—why copy it?

-* A real TCP socket
-* An SSL/TLS encrypted stream
-* A mock object for unit testing
-* A custom stream implementation
+Scatter/gather I/O solves this. Operating systems provide vectored I/O calls (`writev` on POSIX, scatter/gather with IOCP on Windows) that accept multiple buffers and transfer them as a single logical operation.
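+As a sketch of what the OS offers, a POSIX gather write of two regions is a single system call (`header` and `body` are assumed to be existing byte arrays; error handling omitted):
+
+[source,cpp]
+----
+#include <sys/uio.h>
+
+iovec iov[2];
+iov[0].iov_base = header;  iov[0].iov_len = header_len;
+iov[1].iov_base = body;    iov[1].iov_len = body_len;
+
+ssize_t n = writev(fd, iov, 2);  // both regions, one call
+----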
+
+== The Span Reflex for Multiple Buffers
+
+Extending the span reflex: `std::span<std::span<std::byte const>>`:

[source,cpp]
----
-// This function works with ANY stream type
-template<class Stream>
-task<std::string> read_message(Stream& stream)
-{
-    char header[4];
-    auto [ec, n] = co_await read(stream, mutable_buffer(header, 4));
-    if (ec.failed())
-        co_return {};
-
-    std::uint32_t len = decode_length(header);
-    std::string body(len, '\0');
-    auto [ec2, n2] = co_await read(stream, mutable_buffer(body.data(), len));
-    co_return body;
-}
+void write_data(std::span<std::span<std::byte const> const> buffers);
----

-This means you can test network code without network I/O:
+This works, but introduces a composition problem.
+
+== The Composition Problem
+
+Suppose you have:

[source,cpp]
----
-// In production: real socket
-task<> handle_client(tcp::socket& sock)
-{
-    auto msg = co_await read_message(sock);
-}
-
-// In tests: mock stream
-task<> test_read_message()
-{
-    test::read_stream mock("\\x00\\x00\\x00\\x05Hello");
-    auto msg = co_await read_message(mock);
-    assert(msg == "Hello");
-}
+using HeaderBuffers = std::array<std::span<std::byte const>, 2>;  // 2 buffers
+using BodyBuffers = std::array<std::span<std::byte const>, 3>;    // 3 buffers
----

+To send headers followed by body, you need 5 buffers total. With `span<span<byte>>`:
+
+[source,cpp]
+----
+HeaderBuffers headers = /* ... */;
+BodyBuffers body = /* ... */;
+
+// To combine, you MUST allocate a new array:
+std::array<std::span<std::byte const>, 5> combined;
+std::copy(headers.begin(), headers.end(), combined.begin());
+std::copy(body.begin(), body.end(), combined.begin() + 2);
+
+write_data(combined);
----

+Every composition allocates. This leads to:
+
+* Overload proliferation—separate functions for single buffer, multiple buffers, common cases
+* Performance overhead—allocation on every composition
+* Boilerplate—manual copying everywhere
+
+== The Concept-Driven Alternative
+
+Instead of concrete types, use concepts. Define `ConstBufferSequence` as "any type that can produce a sequence of buffers":
+
+[source,cpp]
+----
+template<ConstBufferSequence Buffers>
+void write_data(Buffers const& buffers);
+----
+
+This single signature accepts:
+
+* A single `const_buffer`
+* A `span<const_buffer>`
+* A `vector<const_buffer>`
+* A `string_view` (converts to single buffer)
+* A custom composite type
+* *Any composition of the above—without allocation*
+
+== Zero-Allocation Composition
+
+With concepts, composition creates views, not copies:
+
+[source,cpp]
+----
+HeaderBuffers headers = /* ... */;
+BodyBuffers body = /* ...
*/;
+
+// cat() creates a view that iterates both sequences
+auto combined = cat(headers, body);  // No allocation!
+
+write_data(combined);  // Works because combined satisfies ConstBufferSequence
+----
+
+The `cat` function returns a lightweight object that, when iterated, first yields buffers from `headers`, then from `body`. The buffers themselves are not copied—only iterators are composed.
+
+== STL Parallel
+
+This design follows Stepanov's insight from the STL: algorithms parameterized on concepts (iterators), not concrete types (containers), enable composition that concrete types forbid.
+
+The span reflex is a regression from thirty years of generic programming. Concepts restore the compositional power that concrete types lack.
+
+== The Middle Ground
+
+Concepts provide flexibility at user-facing APIs. But at type-erasure boundaries—virtual functions, library boundaries—concrete types are necessary.
+
+Capy's approach:
+
+* *User-facing APIs* — Accept concepts for maximum flexibility
+* *Type-erasure boundaries* — Use concrete spans internally
+* *Library handles conversion* — Users get concepts; implementation uses spans
+
+This gives users the composition benefits of concepts while hiding the concrete types needed for virtual dispatch.
+
+== Why Not std::byte?
+
+Even `std::byte` imposes a semantic opinion. POSIX uses `void*` for semantic neutrality—"raw memory, I move bytes without opining on contents."
+
+But `span<void>` doesn't compile—C++ can't express type-agnostic buffer abstraction with `span`.
+
+Capy provides `const_buffer` and `mutable_buffer` as semantically neutral buffer types. They have known layout compatible with OS structures (`iovec`, `WSABUF`) without imposing `std::byte` semantics.
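+A simplified sketch of what such a neutral view can look like (an illustration only, not Capy's actual definition):
+
+[source,cpp]
+----
+class neutral_buffer  // hypothetical
+{
+    void const* data_ = nullptr;  // neutral, like POSIX void*
+    std::size_t size_ = 0;        // layout mirrors iovec / WSABUF
+
+public:
+    neutral_buffer() = default;
+    neutral_buffer(void const* p, std::size_t n) : data_(p), size_(n) {}
+
+    void const* data() const { return data_; }
+    std::size_t size() const { return size_; }
+};
+----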
== Summary

-* Buffers represent contiguous memory regions for I/O
-* Buffer sequences enable scatter/gather I/O without copying
-* I/O algorithms work against concepts, not concrete types
-* The same code works with real streams, mock objects, and custom implementations
+The reflexive `span<span<byte>>` approach:
+
+* Forces allocation on every composition
+* Leads to overload proliferation
+* Loses the compositional power of generic programming
+
+The concept-driven approach:

-== Navigation
+* Enables zero-allocation composition
+* Provides a single signature that accepts anything buffer-like
+* Follows proven STL design principles

-[.text-center]
-xref:../library/buffers.adoc[← Overview] | xref:index.adoc[Next: Buffer Types →]
+Continue to xref:types.adoc[Buffer Types] to learn about `const_buffer` and `mutable_buffer`.
diff --git a/doc/modules/ROOT/pages/buffers/sequences.adoc b/doc/modules/ROOT/pages/buffers/sequences.adoc
index 8525f443..2c8ad484 100644
--- a/doc/modules/ROOT/pages/buffers/sequences.adoc
+++ b/doc/modules/ROOT/pages/buffers/sequences.adoc
@@ -1,310 +1,175 @@
-//
-// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com)
-//
-// Distributed under the Boost Software License, Version 1.0. (See accompanying
-// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
-//
-// Official repository: https://github.com/cppalliance/capy
-//
-
= Buffer Sequences

-This page explains how to work with multiple buffers as a logical unit,
-enabling scatter/gather I/O without data copying.
-
-NOTE: Code snippets assume `using namespace boost::capy;` is in effect.
-
-== What is a Buffer Sequence?
+This section explains buffer sequences—the concept that enables zero-allocation composition of buffers.

-A buffer sequence represents a logical byte stream stored across multiple
-non-contiguous memory regions. Instead of copying data into a single buffer,
-you describe where the pieces are and let I/O operations handle them directly.
-
-[source,cpp]
-----
-// Three separate memory regions form one logical message
-std::array<const_buffer, 3> message = {
-    const_buffer(header, header_size),
-    const_buffer(body, body_size),
-    const_buffer(footer, footer_size)
-};
-
-// Write all three as a single operation
-co_await write(stream, message);
-----
+== Prerequisites

-== Buffer Sequence Concepts
+* Completed xref:types.adoc[Buffer Types]
+* Understanding of `const_buffer` and `mutable_buffer`

+== What Is a Buffer Sequence?
+
+A *buffer sequence* is any type that can produce an iteration of buffers.
Formally:

+* A single buffer (like `const_buffer`) is a sequence of one element
+* A range of buffers (like `vector<const_buffer>`) is a multi-element sequence
+* Any bidirectional range with buffer-convertible values qualifies

+== The Concepts
+
-Capy defines two concepts for buffer sequences:
+=== ConstBufferSequence

-[cols="1,3"]
-|===
-| Concept | Description
+[source,cpp]
+----
+template<class T>
+concept ConstBufferSequence =
+    std::is_convertible_v<T, const_buffer> || (
+    std::ranges::bidirectional_range<T> &&
+    std::is_convertible_v<std::ranges::range_value_t<T>, const_buffer>);
+----

-| `ConstBufferSequence`
-| A range whose elements convert to `const_buffer`
+A type satisfies `ConstBufferSequence` if:

-| `MutableBufferSequence`
-| A range whose elements convert to `mutable_buffer`
-|===
+* It converts to `const_buffer` directly (single buffer), OR
+* It is a bidirectional range whose elements convert to `const_buffer`

-A type satisfies these concepts if it is either:
+=== MutableBufferSequence

-* Convertible to `const_buffer` or `mutable_buffer` directly, OR
-* A bidirectional range with buffer-convertible elements
+[source,cpp]
+----
+template<class T>
+concept MutableBufferSequence =
+    std::is_convertible_v<T, mutable_buffer> || (
+    std::ranges::bidirectional_range<T> &&
+    std::is_convertible_v<std::ranges::range_value_t<T>, mutable_buffer>);
+----

-This means single buffers are valid buffer sequences:
+Same pattern, but for `mutable_buffer`.

-[source,cpp]
-----
-// Single buffer works anywhere a buffer sequence is expected
-const_buffer single_buf(data, size);
-co_await write(stream, single_buf); // OK
-----
+== Satisfying the Concepts

-== Iterating Buffer Sequences
-
-Use `begin()` and `end()` to iterate uniformly over any buffer sequence:
-
-[source,cpp]
-----
-template<class Buffers>
-void dump_buffers(Buffers const& bufs)
-{
-    for (auto it = begin(bufs); it != end(bufs); ++it)
-    {
-        const_buffer b = *it;
-        std::cout << "Buffer: " << b.size() << " bytes at "
-                  << b.data() << "\n";
-    }
-}
-----
-
-These functions handle both single buffers and ranges uniformly.
-
-== Buffer Pairs
-
-For sequences with exactly two elements, `const_buffer_pair` and
-`mutable_buffer_pair` provide optimized storage:
-
-[source,cpp]
-----
-const_buffer_pair pair(
-    const_buffer(part1, size1),
-    const_buffer(part2, size2)
-);
-
-// Access by index
-const_buffer& first = pair[0];
-const_buffer& second = pair[1];
-
-// Iterate
-for (const_buffer const& buf : pair)
-{
-    process(buf);
-}
-----
-
-Buffer pairs are especially useful for circular buffers where data
-wraps around the end of storage.
-
-== Incremental Consumption with consuming_buffers
-
-When reading or writing in a loop, you need to track progress through the
-buffer sequence. The `consuming_buffers` wrapper handles this automatically.
+Many common types satisfy these concepts: [source,cpp] ---- -#include +// Single buffers +const_buffer cb; // ConstBufferSequence +mutable_buffer mb; // MutableBufferSequence (and ConstBufferSequence) -template -task> write_all(Stream& stream, Buffers const& bufs) -{ - consuming_buffers consuming(bufs); - std::size_t total = 0; +// Standard containers of buffers +std::vector v; // ConstBufferSequence +std::array a; // MutableBufferSequence - while (buffer_size(consuming) > 0) - { - auto [ec, n] = co_await stream.write_some(consuming); - if (ec.failed()) - co_return {ec, total}; - consuming.consume(n); - total += n; - } - - co_return {{}, total}; -} +// String types (convert to single buffer) +std::string str; // ConstBufferSequence (via make_buffer) +std::string_view sv; // ConstBufferSequence ---- -=== How consuming_buffers Works +== Heterogeneous Composition -The wrapper tracks the current position within the buffer sequence: +Because the concept accepts anything convertible to buffer, you can mix types: [source,cpp] ---- -std::array bufs = { - const_buffer(a, 100), - const_buffer(b, 200), - const_buffer(c, 150) -}; - -consuming_buffers cb(bufs); -// cb iteration yields: [a, 100], [b, 200], [c, 150] - -cb.consume(50); -// cb iteration yields: [a+50, 50], [b, 200], [c, 150] +template +void send(Buffers const& bufs); -cb.consume(60); -// cb iteration yields: [b+10, 190], [c, 150] +// All of these work: +send(make_buffer("Hello")); // string literal +send(std::string_view{"Hello"}); // string_view +send(std::array{buf1, buf2}); // array of buffers +send(my_custom_buffer_sequence); // custom type ---- -The original buffer sequence is not modified. - -== I/O Loop Patterns - -Understanding how the composed `read()` and `write()` functions work -helps you implement custom I/O patterns. - -=== Complete Write Pattern +== Iterating Buffer Sequences -The `write()` function loops until all data is written: +Use `begin()` and `end()` from ``: [source,cpp] ---- -// Simplified implementation of write() -auto write(WriteStream auto& stream, ConstBufferSequence auto const& buffers) - -> task> +template +void process(Buffers const& bufs) { - consuming_buffers consuming(buffers); - std::size_t total = 0; - std::size_t const goal = buffer_size(buffers); - - while (total < goal) + for (auto it = begin(bufs); it != end(bufs); ++it) { - auto [ec, n] = co_await stream.write_some(consuming); - if (ec.failed()) - co_return {ec, total}; - consuming.consume(n); - total += n; + const_buffer buf = *it; + // Process buf.data(), buf.size() } - - co_return {{}, total}; } ---- -=== Complete Read Pattern +These functions handle both single buffers (returning pointer-to-self) and ranges (returning standard iterators). 
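As an illustration of this uniform protocol, here is a minimal sketch that totals the bytes in any `ConstBufferSequence`, which is essentially what the library's `buffer_size` algorithm does. The function name `total_size` is illustrative, not part of the library:

[source,cpp]
----
// Works for a single const_buffer and for ranges of buffers alike,
// because begin()/end() handle both cases uniformly.
template<ConstBufferSequence Buffers>
std::size_t total_size(Buffers const& bufs)
{
    std::size_t n = 0;
    for (auto it = begin(bufs); it != end(bufs); ++it)
    {
        const_buffer b = *it;   // each element converts to const_buffer
        n += b.size();
    }
    return n;
}
----

The same call works with `total_size(single_buffer)` and `total_size(vector_of_buffers)`; for a single buffer, `begin()` and `end()` yield a pointer to the buffer itself and one past it.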
+ +== consuming_buffers -The `read()` function fills buffers completely: +When transferring data incrementally, `consuming_buffers` tracks progress: [source,cpp] ---- -// Simplified implementation of read() -auto read(ReadStream auto& stream, MutableBufferSequence auto const& buffers) - -> task> +#include + +template +task read_all(Stream& stream, Buffers buffers) { - consuming_buffers consuming(buffers); + consuming_buffers remaining(buffers); std::size_t total = 0; - std::size_t const goal = buffer_size(buffers); - - while (total < goal) + + while (buffer_size(remaining) > 0) { - auto [ec, n] = co_await stream.read_some(consuming); + auto [ec, n] = co_await stream.read_some(remaining); if (ec.failed()) - co_return {ec, total}; - consuming.consume(n); + break; + remaining.consume(n); total += n; } - - co_return {{}, total}; + + co_return total; } ---- -== Slicing Buffer Sequences +`consuming_buffers` wraps a buffer sequence and provides: -The `tag_invoke` customization point enables slicing buffers: +* `consume(n)` — Mark `n` bytes as consumed (remove from front) +* Iteration over unconsumed buffers +* `buffer_size()` of remaining bytes -[source,cpp] ----- -mutable_buffer buf(data, 100); +== Zero-Allocation Composition -// Remove first 20 bytes -tag_invoke(slice_tag{}, buf, slice_how::remove_prefix, 20); -// buf now points to data+20, size 80 +The `cat()` function composes buffer sequences without allocation: -// Keep only first 50 bytes -mutable_buffer buf2(data, 100); -tag_invoke(slice_tag{}, buf2, slice_how::keep_prefix, 50); -// buf2 points to data, size 50 +[source,cpp] ---- +auto headers = std::array{header_buf1, header_buf2}; +auto body = body_buffer; -== Common Patterns +auto combined = cat(headers, body); // No allocation -=== Scatter Read: Protocol Header + Body - -[source,cpp] +// combined satisfies ConstBufferSequence +// Iteration yields: header_buf1, header_buf2, body_buffer ---- -task read_framed_message(ReadStream auto& stream) -{ - // Fixed header followed by variable body - char header[8]; - char body[1024]; - std::array bufs = { - mutable_buffer(header, 8), - mutable_buffer(body, 1024) - }; +The returned object stores references (or small copies for single buffers) and iterates through the composed sequence on demand. - auto [ec, n] = co_await stream.read_some(bufs); - // header filled first, then body -} ----- +== Why Bidirectional? -=== Gather Write: Multipart Response +The concepts require bidirectional ranges (not just forward ranges) for two reasons: -[source,cpp] ----- -task send_http_response( - WriteStream auto& stream, - std::string_view status_line, - std::string_view headers, - std::string_view body) -{ - std::array parts = { - make_buffer(status_line), - make_buffer(headers), - make_buffer(body) - }; +1. Some algorithms traverse buffers backwards +2. `consuming_buffers` needs to adjust the first buffer's start position - co_await write(stream, parts); -} ----- +If your custom buffer sequence only provides forward iteration, wrap it in a type that provides bidirectional access. 
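To make the bidirectional requirement concrete, here is a minimal sketch of a custom buffer sequence. It satisfies `ConstBufferSequence` as defined above because `std::array` iterators are bidirectional and the elements convert to `const_buffer`. The type name is illustrative, not part of the library:

[source,cpp]
----
// A fixed two-part message: header followed by body.
struct header_and_body
{
    std::array<const_buffer, 2> parts;

    auto begin() const { return parts.begin(); }   // bidirectional iterator
    auto end() const   { return parts.end(); }
};

static_assert(ConstBufferSequence<header_and_body>);
----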
-== Summary +== Reference [cols="1,3"] |=== -| Component | Purpose - -| `ConstBufferSequence` -| Concept for sequences of read-only buffers +| Header | Description -| `MutableBufferSequence` -| Concept for sequences of writable buffers +| `` +| Concepts and iteration functions -| `begin`, `end` -| Uniform iteration over any buffer sequence - -| `const_buffer_pair` -| Optimized two-element const sequence - -| `mutable_buffer_pair` -| Optimized two-element mutable sequence - -| `consuming_buffers` -| Track progress through a buffer sequence +| `` +| Incremental consumption wrapper |=== -== Navigation - -[.text-center] -xref:index.adoc[← Buffer Types] | xref:algorithms.adoc[Next: Buffer Algorithms →] +You have now learned how buffer sequences enable zero-allocation composition. Continue to xref:system-io.adoc[System I/O Integration] to see how buffer sequences interface with operating system I/O. diff --git a/doc/modules/ROOT/pages/buffers/system-io.adoc b/doc/modules/ROOT/pages/buffers/system-io.adoc new file mode 100644 index 00000000..1f6b6374 --- /dev/null +++ b/doc/modules/ROOT/pages/buffers/system-io.adoc @@ -0,0 +1,205 @@ += System I/O Integration + +This section explains how buffer sequences interface with operating system I/O operations. + +== Prerequisites + +* Completed xref:sequences.adoc[Buffer Sequences] +* Understanding of buffer sequence concepts + +== The Virtual Boundary + +User-facing APIs use concepts for composition flexibility. But at type-erasure boundaries—where virtual functions are needed—concrete types are required. + +Capy's design: + +* *User-facing API* — Accepts `ConstBufferSequence` or `MutableBufferSequence` concepts +* *Internal boundary* — Converts to concrete arrays for virtual dispatch +* *OS interface* — Translates to platform-specific structures + +The library handles all conversions automatically. + +== Platform Buffer Structures + +=== POSIX: iovec + +[source,c] +---- +struct iovec { + void* iov_base; // Pointer to data + size_t iov_len; // Length of data +}; +---- + +Used with `readv()`, `writev()`, `recvmsg()`, `sendmsg()`. + +=== Windows: WSABUF + +[source,c] +---- +typedef struct _WSABUF { + ULONG len; // Length (note: first!) + CHAR* buf; // Pointer +} WSABUF; +---- + +Used with `WSARecv()`, `WSASend()`. + +Note the different member order—Capy handles this platform difference internally. + +== Translation Process + +When you call an I/O function with a buffer sequence: + +[source,cpp] +---- +template +io_result write_some(Buffers const& buffers); +---- + +Internally, Capy: + +1. Counts the number of buffers in the sequence +2. Allocates space for platform buffer structures (on stack for small sequences) +3. Copies buffer descriptors (pointer/size pairs) to platform structures +4. Calls the OS function with the platform array +5. Returns the result + +== Stack-Based Conversion + +For common cases (small numbers of buffers), conversion happens on the stack: + +[source,cpp] +---- +// Pseudocode of internal implementation +template +auto platform_write(Buffers const& buffers) +{ + std::size_t count = buffer_length(buffers); + + if (count <= 8) // Small buffer optimization + { + iovec iovecs[8]; + fill_iovecs(iovecs, buffers, count); + return writev(fd, iovecs, count); + } + else // Heap fallback + { + std::vector iovecs(count); + fill_iovecs(iovecs.data(), buffers, count); + return writev(fd, iovecs.data(), count); + } +} +---- + +Most real-world code uses fewer than 8 buffers, so heap allocation is rarely needed. 
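The `fill_iovecs` step in the pseudocode above is a plain descriptor copy. A possible sketch, assuming the Capy buffer types shown earlier (the function itself is illustrative, not a documented API):

[source,cpp]
----
// Copy pointer/size pairs from a buffer sequence into an iovec array.
// Only descriptors are copied, never the bytes they refer to.
template<class Buffers>
void fill_iovecs(iovec* out, Buffers const& bufs, std::size_t count)
{
    std::size_t i = 0;
    for (auto it = begin(bufs); it != end(bufs) && i < count; ++it, ++i)
    {
        const_buffer b = *it;
        out[i].iov_base = const_cast<void*>(b.data());   // iovec is not const-aware
        out[i].iov_len  = b.size();
    }
}
----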
+ +== Scatter/Gather Benefits + +Using vectored I/O provides: + +=== Fewer System Calls + +Without scatter/gather: + +[source,cpp] +---- +write(fd, header, header_len); // syscall 1 +write(fd, body, body_len); // syscall 2 +---- + +With scatter/gather: + +[source,cpp] +---- +iovec iov[2] = {{header, header_len}, {body, body_len}}; +writev(fd, iov, 2); // single syscall +---- + +=== Zero-Copy Transmission + +Data doesn't need to be copied into a single contiguous buffer. The OS reads directly from each buffer in sequence. + +=== Atomic Operations + +The vectored write is atomic at the file offset level—other processes see either none or all of the data. + +== Registered Buffers + +Advanced platforms offer registered buffer optimizations: + +=== io_uring (Linux 5.1+) + +Buffers can be pre-registered with the kernel, eliminating per-operation address translation: + +[source,cpp] +---- +// Registration (done once) +io_uring_register_buffers(ring, buffers, count); + +// Use (fast path - no translation) +io_uring_prep_write_fixed(sqe, fd, buf, len, offset, buf_index); +---- + +=== IOCP (Windows) + +Similar optimization with pre-registered memory regions for zero-copy I/O. + +Capy's Corosio library exposes these optimizations where available. + +== Writing Efficient Code + +=== Minimize Buffer Count + +Fewer buffers means less translation overhead: + +[source,cpp] +---- +// Prefer: single buffer when possible +auto buf = assemble_message(); // Build in one buffer +write(stream, buf); + +// Avoid: many tiny buffers +std::array tiny_bufs; +write(stream, tiny_bufs); // 100-element translation +---- + +=== Reuse Buffer Structures + +For repeated I/O with the same structure, consider caching the platform buffer array: + +[source,cpp] +---- +// Build once, use many times +struct message_buffers +{ + std::array iovecs; + + void set_header(void const* p, std::size_t n); + void set_body(void const* p, std::size_t n); + void set_footer(void const* p, std::size_t n); +}; +---- + +=== Profile Before Optimizing + +Buffer translation is rarely the bottleneck. Focus on: + +* Network latency +* Disk I/O time +* Data processing logic + +Not buffer descriptor copying. + +== Reference + +The buffer sequence concepts and translation utilities are in: + +[source,cpp] +---- +#include +---- + +OS-specific I/O is handled by Corosio, which builds on Capy's buffer model. + +You have now learned how buffer sequences integrate with operating system I/O. Continue to xref:algorithms.adoc[Buffer Algorithms] to learn about measuring and copying buffers. diff --git a/doc/modules/ROOT/pages/buffers/types.adoc b/doc/modules/ROOT/pages/buffers/types.adoc new file mode 100644 index 00000000..83b3246e --- /dev/null +++ b/doc/modules/ROOT/pages/buffers/types.adoc @@ -0,0 +1,200 @@ += Buffer Types + +This section introduces Capy's fundamental buffer types: `const_buffer` and `mutable_buffer`. + +== Prerequisites + +* Completed xref:overview.adoc[Why Concepts, Not Spans] +* Understanding of why concept-driven buffers enable composition + +== Why Not std::byte? + +`std::byte` imposes a semantic opinion. It says "this is raw bytes"—but that is itself an opinion about the data's nature. + +POSIX uses `void*` for buffers. This expresses semantic neutrality: "I move memory without opining on what it contains." The OS doesn't care if the bytes represent text, integers, or compressed data—it moves them. + +But `std::span` doesn't compile. C++ can't express a type-agnostic buffer abstraction using `span`. 
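The element type in question is `void`. A sketch of the failure: `std::span` requires a complete object type as its element type, so the POSIX-style neutral buffer has no direct `span` equivalent:

[source,cpp]
----
#include <span>

// std::span<void> s;        // error: void is not a complete object type
void const* p = nullptr;     // fine: POSIX-style semantically neutral pointer
std::size_t n = 0;           // ...paired with an explicit length
----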
+ +Capy provides `const_buffer` and `mutable_buffer` as semantically neutral buffer types with known layout. + +== const_buffer + +`const_buffer` represents a contiguous region of read-only memory: + +[source,cpp] +---- +class const_buffer +{ +public: + const_buffer() = default; + const_buffer(void const* data, std::size_t size) noexcept; + const_buffer(mutable_buffer const& b) noexcept; // Implicit conversion + + void const* data() const noexcept; + std::size_t size() const noexcept; + + const_buffer& operator+=(std::size_t n) noexcept; // Remove prefix +}; +---- + +=== Construction + +[source,cpp] +---- +// From pointer and size +char data[] = "hello"; +const_buffer buf(data, 5); + +// From mutable_buffer (implicit) +mutable_buffer mbuf(data, 5); +const_buffer cbuf = mbuf; // OK: mutable -> const +---- + +=== Accessors + +[source,cpp] +---- +const_buffer buf(data, 5); + +void const* ptr = buf.data(); // Pointer to first byte +std::size_t len = buf.size(); // Number of bytes +---- + +=== Prefix Removal + +The `+=` operator removes bytes from the front of the buffer: + +[source,cpp] +---- +const_buffer buf(data, 10); + +buf += 3; // Remove first 3 bytes +// buf.data() now points 3 bytes later +// buf.size() is now 7 +---- + +This is useful when processing a buffer incrementally. + +== mutable_buffer + +`mutable_buffer` represents a contiguous region of writable memory: + +[source,cpp] +---- +class mutable_buffer +{ +public: + mutable_buffer() = default; + mutable_buffer(void* data, std::size_t size) noexcept; + + void* data() const noexcept; + std::size_t size() const noexcept; + + mutable_buffer& operator+=(std::size_t n) noexcept; +}; +---- + +The interface mirrors `const_buffer`, but `data()` returns non-const `void*`. + +=== Conversion + +`mutable_buffer` implicitly converts to `const_buffer`: + +[source,cpp] +---- +void process(const_buffer buf); + +mutable_buffer mbuf(data, size); +process(mbuf); // OK: implicit conversion +---- + +The reverse is not allowed—you cannot implicitly convert `const_buffer` to `mutable_buffer`. + +== make_buffer + +The `make_buffer` function creates buffers from various sources: + +[source,cpp] +---- +#include + +// From pointer and size +auto buf = make_buffer(ptr, size); + +// From C array +char arr[10]; +auto buf = make_buffer(arr); + +// From std::array +std::array arr; +auto buf = make_buffer(arr); + +// From std::vector +std::vector vec(100); +auto buf = make_buffer(vec); + +// From std::string +std::string str = "hello"; +auto buf = make_buffer(str); + +// From std::string_view +std::string_view sv = "hello"; +auto buf = make_buffer(sv); +---- + +The returned buffer type depends on constness: + +* Non-const containers → `mutable_buffer` +* Const containers, `string_view` → `const_buffer` + +== Layout Compatibility + +`const_buffer` and `mutable_buffer` have the same memory layout as OS buffer structures: + +* POSIX: `struct iovec { void* iov_base; size_t iov_len; }` +* Windows: `struct WSABUF { ULONG len; CHAR* buf; }` (note: different order) + +This means conversion to OS structures is efficient—often just a reinterpret_cast for arrays of buffers. + +== Single Buffers as Sequences + +A single buffer is a degenerate sequence—a sequence with one element. 
The `ConstBufferSequence` and `MutableBufferSequence` concepts recognize this:

[source,cpp]
----
template<ConstBufferSequence Buffers>
void write_data(Buffers const& buffers);

// All of these work:
write_data(make_buffer("hello"));           // Single buffer
write_data(std::array{buf1, buf2, buf3});   // Multiple buffers
write_data(my_composite);                   // Custom sequence
----

The library provides `begin()` and `end()` functions that work uniformly:

[source,cpp]
----
const_buffer single;
auto it = begin(single);    // Returns pointer to single
auto e = end(single);       // Returns pointer past single

std::array<const_buffer, 3> multi;
auto it2 = begin(multi);    // Returns multi.begin()
auto e2 = end(multi);       // Returns multi.end()
----

== Reference

[cols="1,3"]
|===
| Header | Description

| ``
| Core buffer types and concepts

| ``
| Buffer creation utilities
|===

You have now learned about `const_buffer` and `mutable_buffer`. Continue to xref:sequences.adoc[Buffer Sequences] to understand how these types compose into sequences.
diff --git a/doc/modules/ROOT/pages/concurrency/advanced.adoc b/doc/modules/ROOT/pages/concurrency/advanced.adoc
new file mode 100644
index 00000000..2f346240
--- /dev/null
+++ b/doc/modules/ROOT/pages/concurrency/advanced.adoc
@@ -0,0 +1,241 @@
+= Part III: Advanced Primitives

This section covers advanced synchronization primitives: atomics for lock-free operations, condition variables for efficient waiting, and shared locks for reader/writer patterns.

== Prerequisites

* Completed xref:synchronization.adoc[Part II: Synchronization]
* Understanding of mutexes, lock guards, and deadlocks

== Atomics: Lock-Free Operations

For operations on individual values, mutexes might be overkill. *Atomic types* provide lock-free thread safety for single variables.

An atomic operation completes entirely before any other thread can observe its effects. There is no intermediate state.

[source,cpp]
----
#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> counter{0};

void increment_many_times()
{
    for (int i = 0; i < 100000; ++i)
        ++counter;   // atomic increment
}

int main()
{
    std::thread t1(increment_many_times);
    std::thread t2(increment_many_times);

    t1.join();
    t2.join();

    std::cout << "Counter: " << counter << "\n";
    return 0;
}
----

No mutex, no lock guard, yet the result is always 200,000. The `std::atomic<int>` ensures that increments are indivisible.

=== When to Use Atomics

Atomics work best for single-variable operations: counters, flags, simple state. They are faster than mutexes when contention is low. But they cannot protect complex operations involving multiple variables—for that, you need mutexes.

Common atomic types include:

* `std::atomic<bool>` — Thread-safe boolean flag
* `std::atomic<int>` — Thread-safe integer counter
* `std::atomic<T*>` — Thread-safe pointer
* `std::atomic<std::shared_ptr<T>>` — Thread-safe shared pointer (C++20)

Any trivially copyable type can be made atomic.
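For example, a small trivially copyable struct can be stored in a `std::atomic`. Whether the operations are lock-free depends on the type's size and the platform, so treat this as a sketch of the rule rather than a performance recommendation:

[source,cpp]
----
#include <atomic>

struct Point
{
    int x;
    int y;
};

std::atomic<Point> position{Point{0, 0}};

void move_to(int x, int y)
{
    position.store(Point{x, y});   // replaces the whole value atomically
}

Point where()
{
    return position.load();       // reads a consistent snapshot
}
----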
=== Atomic Operations

[source,cpp]
----
std::atomic<int> value{0};

value.store(42);                 // atomic write
int x = value.load();            // atomic read
int old = value.exchange(10);    // atomic read-modify-write
value.fetch_add(5);              // atomic addition, returns old value
value.fetch_sub(3);              // atomic subtraction, returns old value

// Compare-and-swap (CAS)
int expected = 10;
bool success = value.compare_exchange_strong(expected, 20);
// If value == expected, sets value = 20 and returns true
// Otherwise, sets expected = value and returns false
----

== Condition Variables: Efficient Waiting

Sometimes a thread must wait for a specific condition before proceeding. You could loop, repeatedly checking:

[source,cpp]
----
// Inefficient busy-wait
while (!ready)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
----

This works but wastes CPU cycles and introduces latency. *Condition variables* provide efficient waiting.

A condition variable allows one thread to signal others that something has changed. Waiting threads sleep until notified, consuming no CPU.

[source,cpp]
----
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool ready = false;

void worker()
{
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, []{ return ready; });   // wait until ready is true
    std::cout << "Worker proceeding!\n";
}

void signal_ready()
{
    {
        std::lock_guard<std::mutex> lock(mtx);
        ready = true;
    }
    cv.notify_one();   // wake one waiting thread
}

int main()
{
    std::thread t(worker);

    std::this_thread::sleep_for(std::chrono::seconds(1));
    signal_ready();

    t.join();
    return 0;
}
----

The worker thread calls `cv.wait()`, which atomically releases the mutex and suspends the thread. When `signal_ready()` calls `notify_one()`, the worker wakes up, reacquires the mutex, checks the condition, and proceeds.

=== The Predicate

The lambda `[]{ return ready; }` is the *predicate*. `wait()` will not return until this evaluates to true. This guards against *spurious wakeups*—rare events where a thread wakes without notification. Always use a predicate.

=== Notification Methods

* `notify_one()` — Wake a single waiting thread
* `notify_all()` — Wake all waiting threads

Use `notify_one()` when only one thread needs to proceed (e.g., producer-consumer with single consumer). Use `notify_all()` when multiple threads might need to check the condition (e.g., broadcast events, shutdown signals).

=== Wait Variants

[source,cpp]
----
// Wait indefinitely
cv.wait(lock, predicate);

// Wait with timeout
auto status = cv.wait_for(lock, std::chrono::seconds(5), predicate);
// Returns true if predicate is true, false on timeout

// Wait until specific time point
auto status = cv.wait_until(lock, deadline, predicate);
----

== Shared Locks: Readers and Writers

Consider a data structure that is read frequently but written rarely. A regular mutex serializes all access—but why block readers from each other? Multiple threads can safely read simultaneously; only writes require exclusive access.
+ +*Shared mutexes* support this pattern: + +[source,cpp] +---- +#include +#include +#include +#include + +std::shared_mutex rw_mutex; +std::vector data; + +void reader(int id) +{ + std::shared_lock lock(rw_mutex); // shared access + std::cout << "Reader " << id << " sees " << data.size() << " elements\n"; +} + +void writer(int value) +{ + std::unique_lock lock(rw_mutex); // exclusive access + data.push_back(value); + std::cout << "Writer added " << value << "\n"; +} +---- + +=== Lock Types + +`std::shared_lock`:: +Acquires a *shared lock*—multiple threads can hold shared locks simultaneously. + +`std::unique_lock` (on shared_mutex):: +Acquires an *exclusive lock*—no other locks (shared or exclusive) can be held. + +=== Behavior + +* While any reader holds a shared lock, writers must wait +* While a writer holds an exclusive lock, everyone waits +* Multiple readers can proceed simultaneously + +This pattern maximizes concurrency for read-heavy workloads. Use `std::shared_mutex` when reads vastly outnumber writes. + +=== Example: Thread-Safe Cache + +[source,cpp] +---- +#include +#include +#include +#include + +class ThreadSafeCache +{ + std::unordered_map cache_; + mutable std::shared_mutex mutex_; + +public: + std::optional get(std::string const& key) const + { + std::shared_lock lock(mutex_); // readers can proceed in parallel + auto it = cache_.find(key); + if (it != cache_.end()) + return it->second; + return std::nullopt; + } + + void put(std::string const& key, std::string const& value) + { + std::unique_lock lock(mutex_); // exclusive access for writing + cache_[key] = value; + } +}; +---- + +Multiple threads can call `get()` simultaneously without blocking each other. Only `put()` requires exclusive access. + +You have now learned about atomics, condition variables, and shared locks. In the next section, you will explore communication patterns: futures, promises, async, and practical concurrent patterns. diff --git a/doc/modules/ROOT/pages/concurrency/foundations.adoc b/doc/modules/ROOT/pages/concurrency/foundations.adoc new file mode 100644 index 00000000..e9d2bc4f --- /dev/null +++ b/doc/modules/ROOT/pages/concurrency/foundations.adoc @@ -0,0 +1,235 @@ += Part I: Foundations + +This section introduces the fundamental concepts of concurrent programming. You will learn what concurrency is, why it matters, and how threads provide the foundation for parallel execution. + +== Prerequisites + +Before beginning this tutorial, you should have: + +* A C++ compiler with C++11 or later support +* Familiarity with basic C++ concepts: functions, classes, and lambdas +* Understanding of how programs execute sequentially + +== Why Concurrency Matters + +Modern computers have multiple processor cores. A quad-core laptop can do four things at once. But most programs use only one core, leaving the others idle. Concurrency lets you use all your processing power. + +Consider downloading a large file. Without concurrency, your application freezes—the user interface becomes unresponsive because your single thread of execution is busy waiting for network data. With concurrency, one thread handles the download while another keeps the interface responsive. The user can continue working, cancel the download, or start another—all while data streams in. + +The benefits compound in computationally intensive work. Image processing, scientific simulations, video encoding—these tasks can be split into independent pieces. Process them simultaneously and your program finishes in a fraction of the time. 
+ +But concurrency is not free. It introduces complexity. Multiple threads accessing the same data can corrupt it. Threads waiting on each other can freeze forever. These problems—*race conditions* and *deadlocks*—are the challenges you will learn to handle. + +== Threads—Your Program's Parallel Lives + +When you run a program, the operating system creates a *process* for it. This process gets its own memory space, its own resources, and at least one *thread of execution*—the main thread. + +Think of a thread as a bookmark in a book of instructions. It marks where you are in the code. The processor reads the instruction at that bookmark, executes it, and moves the bookmark forward. One thread means one bookmark—your program can only be at one place in the code at a time. + +But you can create additional threads. Each thread is its own bookmark, tracking its own position in the code. Now your program can be at multiple places simultaneously. Each thread has its own *call stack*—its own record of which functions called which—but all threads share the same *heap memory*. + +This sharing is both the power and the peril of threads. + +== Creating Threads + +The `` header provides `std::thread`, the standard way to create threads in C++. + +[source,cpp] +---- +#include +#include + +void say_hello() +{ + std::cout << "Hello from a new thread!\n"; +} + +int main() +{ + std::thread t(say_hello); + t.join(); + std::cout << "Back in the main thread.\n"; + return 0; +} +---- + +The `std::thread` constructor takes a function (or any *callable*) and immediately starts a new thread running that function. Two bookmarks now move through your code simultaneously. + +The `join()` call makes the main thread wait until thread `t` finishes. Without it, `main()` might return and terminate the program before `say_hello()` completes. Always join your threads before they go out of scope. + +=== Parallel Execution + +[source,cpp] +---- +#include +#include + +void count_up(char const* name) +{ + for (int i = 1; i <= 5; ++i) + std::cout << name << ": " << i << "\n"; +} + +int main() +{ + std::thread alice(count_up, "Alice"); + std::thread bob(count_up, "Bob"); + + alice.join(); + bob.join(); + + return 0; +} +---- + +Run this and you might see output like: + +---- +Alice: 1 +Bob: 1 +Alice: 2 +Bob: 2 +Alice: 3 +... +---- + +Or perhaps: + +---- +AliceBob: : 1 +1 +Alice: 2 +... +---- + +The interleaving varies each run. Both threads race to print, and their outputs jumble together. This unpredictability is your first glimpse of concurrent programming's fundamental challenge: when threads share resources (here, `std::cout`), chaos can ensue. + +== Ways to Create Threads + +Threads accept any callable object: functions, lambda expressions, function objects (functors), and member functions. + +=== Lambda Expressions + +Lambda expressions are often the clearest choice: + +[source,cpp] +---- +#include +#include + +int main() +{ + int x = 42; + + std::thread t([x]() { + std::cout << "The value is: " << x << "\n"; + }); + + t.join(); + return 0; +} +---- + +The lambda captures `x` by value—it copies `x` into the lambda. By default, `std::thread` copies all arguments passed to it. Even if your function declares a reference parameter, the thread receives a copy. 
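A short sketch makes the copy visible. The parameter below is a `const` reference, yet it binds to the thread's internal copy, so a later change to the caller's variable is never seen by the thread:

[source,cpp]
----
#include <iostream>
#include <string>
#include <thread>

void print_message(std::string const& msg)   // binds to the thread's copy
{
    std::cout << "Thread sees: " << msg << "\n";
}

int main()
{
    std::string greeting = "hello";

    std::thread t(print_message, greeting);   // greeting is copied here
    greeting = "changed";                     // does not affect the copy

    t.join();                                 // prints "Thread sees: hello"
    return 0;
}
----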
To pass by reference, use `std::ref()`:

[source,cpp]
----
#include <functional>
#include <iostream>
#include <thread>

void increment(int& value)
{
    ++value;
}

int main()
{
    int counter = 0;

    std::thread t(increment, std::ref(counter));
    t.join();

    std::cout << "Counter is now: " << counter << "\n";
    return 0;
}
----

Without `std::ref()`, this example does not compile: `std::thread` copies its arguments into internal storage, and that stored copy cannot bind to the non-const reference parameter `value`. The compile error is deliberate, preventing a thread from silently incrementing a copy while the caller's `counter` stays unchanged.

=== Member Functions

For member functions, pass a pointer to the function and an instance:

[source,cpp]
----
#include <iostream>
#include <string>
#include <thread>

class Greeter
{
public:
    void greet(std::string const& name)
    {
        std::cout << "Hello, " << name << "!\n";
    }
};

int main()
{
    Greeter g;
    std::thread t(&Greeter::greet, &g, "World");
    t.join();
    return 0;
}
----

The `&Greeter::greet` syntax names the member function; `&g` provides the instance to call it on.

== Thread Lifecycle: Join, Detach, and Destruction

Every thread must be either *joined* or *detached* before its `std::thread` object is destroyed. Failing to do so calls `std::terminate()`, abruptly ending your program.

=== join()

`join()` blocks the calling thread until the target thread finishes. This is how you wait for work to complete:

[source,cpp]
----
std::thread t(do_work);
// ... do other things ...
t.join();   // wait for do_work to finish
----

=== detach()

Sometimes you want a thread to run independently, continuing even after the `std::thread` object is destroyed. That is what `detach()` does:

[source,cpp]
----
std::thread t(background_task);
t.detach();   // thread runs independently
// t is now "empty"—no longer associated with a thread
----

A detached thread becomes a *daemon thread*. It runs until it finishes or the program exits. You lose all ability to wait for it or check its status. Use detachment sparingly—usually for fire-and-forget background work.

=== Checking joinable()

Before joining or detaching, you can check if a thread is *joinable*:

[source,cpp]
----
std::thread t(some_function);

if (t.joinable())
{
    t.join();
}
----

A thread is joinable if it represents an actual thread of execution. After joining or detaching, or after default construction, a `std::thread` is not joinable.

You have now learned the basics of threads: creation, execution, and lifecycle management. In the next section, you will learn about the dangers of shared data and how to protect it with synchronization primitives.
diff --git a/doc/modules/ROOT/pages/concurrency/patterns.adoc b/doc/modules/ROOT/pages/concurrency/patterns.adoc
new file mode 100644
index 00000000..b4e63c45
--- /dev/null
+++ b/doc/modules/ROOT/pages/concurrency/patterns.adoc
@@ -0,0 +1,293 @@
+= Part IV: Communication & Patterns

This section covers communication mechanisms for getting results from threads and practical patterns for concurrent programming.

== Prerequisites

* Completed xref:advanced.adoc[Part III: Advanced Primitives]
* Understanding of atomics, condition variables, and shared locks

== Futures and Promises: Getting Results Back

Threads can perform work, but how do you get results from them? Passing references works but is clunky. C++ offers a cleaner abstraction: *futures* and *promises*.

A `std::promise` is a write-once container: a thread can set its value. A `std::future` is the corresponding read-once container: another thread can get that value. They form a one-way communication channel.
[source,cpp]
----
#include <future>
#include <iostream>
#include <thread>

void compute(std::promise<int> result_promise)
{
    int answer = 6 * 7;   // expensive computation
    result_promise.set_value(answer);
}

int main()
{
    std::promise<int> promise;
    std::future<int> future = promise.get_future();

    std::thread t(compute, std::move(promise));

    std::cout << "Waiting for result...\n";
    int result = future.get();   // blocks until value is set
    std::cout << "The answer is: " << result << "\n";

    t.join();
    return 0;
}
----

The worker thread calls `set_value()`. The main thread calls `get()`, which blocks until the value is available.

=== Important Behaviors

* A future's `get()` can only be called once
* For multiple consumers, use `std::shared_future`
* If the promise is destroyed without setting a value, `get()` throws `std::future_error`
* `set_exception()` allows the worker to signal an error

== std::async: The Easy Path

Creating threads manually, managing promises, joining at the end—it is mechanical. `std::async` automates it:

[source,cpp]
----
#include <future>
#include <iostream>

int compute()
{
    return 6 * 7;
}

int main()
{
    std::future<int> future = std::async(compute);

    std::cout << "Computing...\n";
    int result = future.get();
    std::cout << "Result: " << result << "\n";

    return 0;
}
----

`std::async` launches the function (potentially in a new thread), returning a future. No explicit thread creation, no promise management, no join call.

=== Launch Policies

By default, the system decides whether to run the function in a new thread or defer it until you call `get()`. You can specify:

[source,cpp]
----
// Force a new thread
auto future = std::async(std::launch::async, compute);

// Defer execution until get()
auto future = std::async(std::launch::deferred, compute);

// Let the system decide (default)
auto future = std::async(std::launch::async | std::launch::deferred, compute);
----

For quick parallel tasks, `std::async` is often the cleanest choice.

== Thread-Local Storage

Sometimes each thread needs its own copy of a variable—not shared, not copied each call, but persistent within that thread.

Declare it `thread_local`:

[source,cpp]
----
#include <iostream>
#include <thread>

thread_local int counter = 0;

void increment_and_print(char const* name)
{
    ++counter;
    std::cout << name << " counter: " << counter << "\n";
}

int main()
{
    std::thread t1([]{
        increment_and_print("T1");
        increment_and_print("T1");
    });

    std::thread t2([]{
        increment_and_print("T2");
        increment_and_print("T2");
    });

    t1.join();
    t2.join();

    return 0;
}
----

Each thread sees its own `counter`. T1 prints 1, then 2. T2 independently prints 1, then 2. No synchronization needed because the data is not shared.

Thread-local storage is useful for per-thread caches, random number generators, or error state.

== Practical Patterns

=== Producer-Consumer Queue

One or more threads produce work items; one or more threads consume them.
A queue connects them:

[source,cpp]
----
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

template<class T>
class ThreadSafeQueue
{
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;

public:
    void push(T value)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        cv_.notify_one();
    }

    T pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this]{ return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }
};
----

The producer pushes items; the consumer waits for items and processes them. The condition variable ensures the consumer sleeps efficiently when the queue is empty.

[source,cpp]
----
ThreadSafeQueue<int> work_queue;

void producer()
{
    for (int i = 0; i < 10; ++i)
    {
        work_queue.push(i);
        std::cout << "Produced: " << i << "\n";
    }
}

void consumer()
{
    for (int i = 0; i < 10; ++i)
    {
        int item = work_queue.pop();
        std::cout << "Consumed: " << item << "\n";
    }
}

int main()
{
    std::thread prod(producer);
    std::thread cons(consumer);

    prod.join();
    cons.join();

    return 0;
}
----

=== Parallel For

Split a loop across multiple threads:

[source,cpp]
----
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

void parallel_for(int start, int end, int num_threads,
    std::function<void(int)> func)
{
    std::vector<std::thread> threads;
    int chunk_size = (end - start) / num_threads;

    for (int t = 0; t < num_threads; ++t)
    {
        int chunk_start = start + t * chunk_size;
        int chunk_end = (t == num_threads - 1) ? end : chunk_start + chunk_size;

        threads.emplace_back([=]{
            for (int i = chunk_start; i < chunk_end; ++i)
                func(i);
        });
    }

    for (auto& thread : threads)
        thread.join();
}

int main()
{
    std::mutex print_mutex;

    parallel_for(0, 20, 4, [&](int i){
        std::lock_guard<std::mutex> lock(print_mutex);
        std::cout << "Processing " << i << " on thread "
                  << std::this_thread::get_id() << "\n";
    });

    return 0;
}
----

The work is divided into chunks, each handled by its own thread. For CPU-bound work on large datasets, this can dramatically reduce execution time.

== Summary

You have learned the fundamentals of concurrent programming:

* *Threads* — Independent flows of execution within a process
* *Mutexes* — Mutual exclusion to prevent data races
* *Lock guards* — RAII wrappers that ensure mutexes are properly released
* *Atomics* — Lock-free safety for single operations
* *Condition variables* — Efficient waiting for events
* *Shared locks* — Multiple readers or one writer
* *Futures and promises* — Communication of results between threads
* *std::async* — Simplified launching of parallel work

You have seen the dangers—race conditions, deadlocks—and the tools to avoid them.

=== Best Practices

* *Start with std::async* when possible
* *Prefer immutable data* — shared data that never changes needs no synchronization
* *Protect mutable shared state carefully* — minimize the data that is shared
* *Minimize lock duration* — hold locks for as brief a time as possible
* *Avoid nested locks* — when unavoidable, use `std::scoped_lock`
* *Test thoroughly* — test with many threads, on different machines, under load

Concurrency is challenging. Bugs hide until the worst moment. Testing is hard because timing varies. But the rewards are substantial: responsive applications, full hardware utilization, and elegant solutions to naturally parallel problems.
+ +This foundation prepares you for understanding Capy's concurrency facilities: `thread_pool`, `strand`, `when_all`, and `async_event`. These build on standard primitives to provide coroutine-friendly concurrent programming. diff --git a/doc/modules/ROOT/pages/concurrency/synchronization.adoc b/doc/modules/ROOT/pages/concurrency/synchronization.adoc new file mode 100644 index 00000000..d2c81b72 --- /dev/null +++ b/doc/modules/ROOT/pages/concurrency/synchronization.adoc @@ -0,0 +1,202 @@ += Part II: Synchronization + +This section introduces the dangers of shared data access and the synchronization primitives that protect against them. You will learn about race conditions, mutexes, lock guards, and deadlocks. + +== Prerequisites + +* Completed xref:foundations.adoc[Part I: Foundations] +* Understanding of threads and their lifecycle + +== The Danger: Race Conditions + +When multiple threads read the same data, all is well. But when at least one thread writes while others read or write, you have a *data race*. The result is undefined behavior—crashes, corruption, or silent errors. + +Consider this code: + +[source,cpp] +---- +#include +#include + +int counter = 0; + +void increment_many_times() +{ + for (int i = 0; i < 100000; ++i) + ++counter; +} + +int main() +{ + std::thread t1(increment_many_times); + std::thread t2(increment_many_times); + + t1.join(); + t2.join(); + + std::cout << "Counter: " << counter << "\n"; + return 0; +} +---- + +Two threads, each incrementing 100,000 times. You would expect 200,000. But run this repeatedly and you will see different results—180,000, 195,327, maybe occasionally 200,000. Something is wrong. + +The `++counter` operation looks atomic—indivisible—but it is not. It actually consists of three steps: + +1. Read the current value +2. Add one +3. Write the result back + +Between any of these steps, the other thread might execute its own steps. Imagine both threads read `counter` when it is 5. Both add one, getting 6. Both write 6 back. Two increments, but the counter only went up by one. This is a *lost update*, a classic race condition. + +The more threads, the more opportunity for races. The faster your processor, the more instructions execute between context switches, potentially hiding the bug—until one critical day in production. + +== Mutual Exclusion: Mutexes + +The solution to data races is *mutual exclusion*: ensuring that only one thread accesses shared data at a time. + +A *mutex* (mutual exclusion object) is a lockable resource. Before accessing shared data, a thread *locks* the mutex. If another thread already holds the lock, the requesting thread blocks until the lock is released. This serializes access to the protected data. + +[source,cpp] +---- +#include +#include +#include + +int counter = 0; +std::mutex counter_mutex; + +void increment_many_times() +{ + for (int i = 0; i < 100000; ++i) + { + counter_mutex.lock(); + ++counter; + counter_mutex.unlock(); + } +} + +int main() +{ + std::thread t1(increment_many_times); + std::thread t2(increment_many_times); + + t1.join(); + t2.join(); + + std::cout << "Counter: " << counter << "\n"; + return 0; +} +---- + +Now the output is always 200,000. The mutex ensures that between `lock()` and `unlock()`, only one thread executes. The increment is now effectively atomic. + +But there is a problem with calling `lock()` and `unlock()` directly. If code between them throws an exception, `unlock()` never executes. 
The mutex stays locked forever, and any thread waiting for it blocks eternally—a *deadlock*. + +== Lock Guards: Safety Through RAII + +C++ has a powerful idiom: *RAII* (Resource Acquisition Is Initialization). The idea: acquire resources in a constructor, release them in the destructor. Since destructors run even when exceptions are thrown, cleanup is guaranteed. + +Lock guards apply RAII to mutexes: + +[source,cpp] +---- +#include +#include +#include + +int counter = 0; +std::mutex counter_mutex; + +void increment_many_times() +{ + for (int i = 0; i < 100000; ++i) + { + std::lock_guard lock(counter_mutex); + ++counter; + // lock is automatically released when it goes out of scope + } +} +---- + +The `std::lock_guard` locks the mutex on construction and unlocks it on destruction. Even if an exception is thrown, the destructor runs and the mutex is released. This is the correct way to use mutexes. + +=== std::scoped_lock (C++17) + +Since C++17, `std::scoped_lock` is preferred. It works like `lock_guard` but can lock multiple mutexes simultaneously, avoiding a class of deadlock: + +[source,cpp] +---- +std::scoped_lock lock(counter_mutex); // C++17 +---- + +=== std::unique_lock + +For more control, use `std::unique_lock`. It can be unlocked before destruction, moved to another scope, or created without immediately locking: + +[source,cpp] +---- +std::unique_lock lock(some_mutex, std::defer_lock); +// mutex not yet locked + +lock.lock(); // lock when ready +// ... do work ... +lock.unlock(); // unlock early if needed +// ... do other work ... +// destructor unlocks again if still locked +---- + +`std::unique_lock` is more flexible but slightly more expensive than `std::lock_guard`. Use the simplest tool that does the job. + +== The Deadlock Dragon + +Mutexes solve data races but introduce a new danger: *deadlock*. + +Imagine two threads and two mutexes. Thread A locks mutex 1, then tries to lock mutex 2. Thread B locks mutex 2, then tries to lock mutex 1. Each thread holds one mutex and waits for the other. Neither can proceed. The program freezes. + +[source,cpp] +---- +std::mutex mutex1, mutex2; + +void thread_a() +{ + std::lock_guard lock1(mutex1); + std::lock_guard lock2(mutex2); // blocks, waiting for B + // ... +} + +void thread_b() +{ + std::lock_guard lock2(mutex2); + std::lock_guard lock1(mutex1); // blocks, waiting for A + // ... +} +---- + +If both threads run and each acquires its first mutex before the other acquires the second, deadlock occurs. + +=== Preventing Deadlock + +The simplest prevention: *always lock mutexes in the same order*. If every thread locks `mutex1` before `mutex2`, no cycle can form. + +When you need to lock multiple mutexes and cannot guarantee order, use `std::scoped_lock`: + +[source,cpp] +---- +void safe_function() +{ + std::scoped_lock lock(mutex1, mutex2); // locks both atomically + // ... +} +---- + +`std::scoped_lock` uses a deadlock-avoidance algorithm internally, acquiring both mutexes without risk of circular waiting. + +=== Deadlock Prevention Rules + +1. *Lock in consistent order* — Define a global ordering for mutexes and always lock in that order +2. *Use std::scoped_lock for multiple mutexes* — Let the library handle deadlock avoidance +3. *Hold locks for minimal time* — Reduce the window for contention +4. *Avoid nested locks when possible* — Simpler designs prevent deadlock by construction + +You have now learned about race conditions, mutexes, lock guards, and deadlocks. 
In the next section, you will explore advanced synchronization primitives: atomics, condition variables, and shared locks. diff --git a/doc/modules/ROOT/pages/coroutines/allocators.adoc b/doc/modules/ROOT/pages/coroutines/allocators.adoc new file mode 100644 index 00000000..1244cd2e --- /dev/null +++ b/doc/modules/ROOT/pages/coroutines/allocators.adoc @@ -0,0 +1,182 @@ += Frame Allocators + +This section explains how coroutine frames are allocated and how to customize allocation for performance. + +== Prerequisites + +* Completed xref:composition.adoc[Concurrent Composition] +* Understanding of coroutine frame allocation from xref:../cpp20-coroutines/advanced.adoc[C++20 Coroutines Tutorial] + +== The Timing Constraint + +Coroutine frame allocation has a unique constraint: memory must be allocated *before* the coroutine body begins executing. The standard C++ mechanism—promise type's `operator new`—is called before the promise is constructed. + +This creates a challenge: how can a coroutine use a custom allocator when the allocator might be passed as a parameter, which is stored *in* the frame? + +== Thread-Local Propagation + +Capy solves this with thread-local propagation: + +1. Before evaluating the task argument, `run_async` sets a thread-local allocator +2. The task's `operator new` reads this thread-local allocator +3. The task stores the allocator in its promise for child propagation + +This is why `run_async` uses two-call syntax: + +[source,cpp] +---- +run_async(executor)(my_task()); +// ↑ ↑ +// 1. Sets 2. Task allocated +// TLS using TLS allocator +---- + +== The Window + +The "window" is the interval between setting the thread-local allocator and the coroutine's first suspension point. During this window: + +* The task is allocated using the TLS allocator +* The task captures the TLS allocator in its promise +* Child tasks inherit the allocator + +After the window closes (at the first suspension), the TLS allocator may be restored to a previous value. The task retains its captured allocator regardless. + +== The FrameAllocator Concept + +Custom allocators must satisfy the `FrameAllocator` concept, which is compatible with C++ allocator requirements: + +[source,cpp] +---- +template +concept FrameAllocator = requires { + typename A::value_type; +} && requires(A& a, std::size_t n) { + { a.allocate(n) } -> std::same_as; + { a.deallocate(std::declval(), n) }; +}; +---- + +In practice, any standard allocator works. + +== Using Custom Allocators + +=== With run_async + +Pass an allocator to `run_async`: + +[source,cpp] +---- +std::pmr::monotonic_buffer_resource resource; +std::pmr::polymorphic_allocator alloc(&resource); + +run_async(executor, alloc)(my_task()); +---- + +Or pass a `memory_resource*` directly: + +[source,cpp] +---- +std::pmr::monotonic_buffer_resource resource; +run_async(executor, &resource)(my_task()); +---- + +=== Default Allocator + +When no allocator is specified, `run_async` uses the execution context's default frame allocator, typically a recycling allocator optimized for coroutine frame sizes. + +== Recycling Allocator + +Capy provides `recycling_memory_resource`, a memory resource optimized for coroutine frames: + +* Maintains freelists by size class +* Reuses recently freed blocks (cache-friendly) +* Falls back to upstream allocator for new sizes + +This allocator is used by default for `thread_pool` and other execution contexts. 
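Since the page states the concept is compatible with the standard allocator requirements and that any standard allocator works, a minimal standard-shaped allocator should qualify. The following sketch, a hypothetical `logging_allocator` not part of the library, simply delegates to `std::allocator` while recording each frame allocation:

[source,cpp]
----
#include <cstddef>
#include <iostream>
#include <memory>

// Illustrative only: delegates to std::allocator and traces calls.
template<class T>
struct logging_allocator
{
    using value_type = T;

    T* allocate(std::size_t n)
    {
        std::cout << "frame allocate: " << n * sizeof(T) << " bytes\n";
        return std::allocator<T>{}.allocate(n);
    }

    void deallocate(T* p, std::size_t n)
    {
        std::cout << "frame deallocate\n";
        std::allocator<T>{}.deallocate(p, n);
    }
};
----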
+ +== HALO Optimization + +*Heap Allocation eLision Optimization* (HALO) allows the compiler to allocate coroutine frames on the stack instead of the heap when: + +* The coroutine's lifetime is provably contained in the caller's +* The frame size is known at compile time +* Optimization is enabled + +Capy's `task` uses the `[[clang::coro_await_elidable]]` attribute (when available) to enable HALO: + +[source,cpp] +---- +template +struct [[nodiscard]] BOOST_CAPY_CORO_AWAIT_ELIDABLE + task +{ + // ... +}; +---- + +=== When HALO Applies + +HALO is most effective for immediately-awaited tasks: + +[source,cpp] +---- +// HALO can apply: task is awaited immediately +int result = co_await compute(); + +// HALO cannot apply: task escapes to storage +auto t = compute(); +tasks.push_back(std::move(t)); +---- + +=== Measuring HALO Effectiveness + +Profile your application to see if HALO is taking effect. Look for: + +* Reduced heap allocations +* Improved cache locality +* Lower allocation latency + +== Best Practices + +=== Use Default Allocators + +For most applications, the default recycling allocator provides good performance without configuration. + +=== Consider Memory Resources for Batched Work + +When launching many short-lived tasks together, a monotonic buffer resource can be efficient: + +[source,cpp] +---- +void process_batch(std::vector const& items) +{ + std::array buffer; + std::pmr::monotonic_buffer_resource resource( + buffer.data(), buffer.size()); + + for (auto const& item : items) + { + run_async(executor, &resource)(process(item)); + } + // All frames deallocated when resource goes out of scope +} +---- + +=== Profile Before Optimizing + +Coroutine frame allocation is rarely the bottleneck. Profile your application before investing in custom allocators. + +== Reference + +[cols="1,3"] +|=== +| Header | Description + +| `` +| Frame allocator concept and utilities + +| `` +| Default recycling allocator implementation +|=== + +You have now learned how coroutine frame allocation works and how to customize it. This completes the Coroutines in Capy section. Continue to xref:../buffers/overview.adoc[Buffer Sequences] to learn about Capy's buffer model. diff --git a/doc/modules/ROOT/pages/coroutines/cancellation.adoc b/doc/modules/ROOT/pages/coroutines/cancellation.adoc index f0446634..8d8b3c86 100644 --- a/doc/modules/ROOT/pages/coroutines/cancellation.adoc +++ b/doc/modules/ROOT/pages/coroutines/cancellation.adoc @@ -1,254 +1,391 @@ -// -// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com) -// -// Distributed under the Boost Software License, Version 1.0. (See accompanying -// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) -// -// Official repository: https://github.com/cppalliance/capy -// += Stop Tokens and Cancellation -= Cancellation +This section teaches cooperative cancellation from the ground up, explaining C++20 stop tokens as a general-purpose notification mechanism and how Capy uses them for coroutine cancellation. -This page explains how to cancel running coroutines using `std::stop_token`. +== Prerequisites -NOTE: Code snippets assume `using namespace boost::capy;` is in effect. +* Completed xref:io-awaitable.adoc[The IoAwaitable Protocol] +* Understanding of how context propagates through coroutine chains -== Cooperative Cancellation +== Part 1: The Problem -Capy supports cooperative cancellation through `std::stop_token`. When a task -is launched with stop support, the token propagates through the entire call -chain automatically. 
+Cancellation matters in many scenarios: -Cooperative means: +* A user clicks "Cancel" on a download dialog +* A timeout expires while waiting for a network response +* A connection drops unexpectedly +* An application is shutting down -* The framework delivers cancellation requests to operations -* Operations check the token and decide how to respond -* Nothing is forcibly terminated +=== The Naive Approach: Boolean Flags -== How Stop Tokens Propagate - -Stop tokens propagate through `co_await` chains just like affinity. When you -await a stoppable operation inside a task with a stop token, the token is -forwarded automatically: +The obvious solution seems to be a boolean flag: [source,cpp] ---- -task cancellable_work() +std::atomic should_cancel{false}; + +void worker() { - // If this task has a stop token, it's automatically - // passed to any stoppable awaitables we co_await - co_await some_stoppable_operation(); + while (!should_cancel) + { + do_work(); + } } ---- -== The Stoppable Awaitable Protocol +This approach has problems: + +* *No standardization* — Every component invents its own cancellation flag +* *Race conditions* — Checking the flag and acting on it is not atomic +* *No cleanup notification* — The worker just stops; no opportunity for graceful cleanup +* *Polling overhead* — Must check the flag repeatedly + +=== The Thread Interruption Problem + +Some systems support forceful thread interruption. This is dangerous because it can leave resources in inconsistent states—files half-written, locks held, transactions uncommitted. + +=== The Goal: Cooperative Cancellation -Awaitables that support cancellation implement the `stoppable_awaitable` -concept. Their `await_suspend` receives both a dispatcher and a stop token: +The solution is *cooperative cancellation*: ask nicely, let the work clean up. The cancellation requestor signals intent; the worker decides when and how to respond. + +== Part 2: C++20 Stop Tokens—A General-Purpose Signaling Mechanism + +C++20 introduces `std::stop_token`, `std::stop_source`, and `std::stop_callback`. While named for "stopping," these implement a general-purpose *Observer pattern*—a thread-safe one-to-many notification system. + +=== The Three Components + +`std::stop_source`:: +The *Subject/Publisher*. Owns the shared state and can trigger notifications. Create one source, then distribute tokens to observers. + +`std::stop_token`:: +The *Subscriber View*. A read-only, copyable, cheap-to-pass-around handle. Multiple tokens can share the same underlying state. + +`std::stop_callback`:: +The *Observer Registration*. An RAII object that registers a callback to run when signaled. Destruction automatically unregisters. 
+ +=== How They Work Together [source,cpp] ---- -template -auto await_suspend( - std::coroutine_handle<> h, - Dispatcher const& d, - std::stop_token token) -{ - if (token.stop_requested()) - { - // Already cancelled, resume immediately - return d(h); - } +#include +#include - // Start async operation with cancellation support - start_async([h, &d, token] { - if (token.stop_requested()) - { - // Handle cancellation - } - d(h); - }); - return std::noop_coroutine(); +void example() +{ + std::stop_source source; + + // Create tokens (distribute notification capability) + std::stop_token token1 = source.get_token(); + std::stop_token token2 = source.get_token(); // Same underlying state + + // Register callbacks (observers) + std::stop_callback cb1(token1, []{ std::cout << "Observer 1 notified\n"; }); + std::stop_callback cb2(token2, []{ std::cout << "Observer 2 notified\n"; }); + + std::cout << "Before signal\n"; + source.request_stop(); // Triggers all callbacks + std::cout << "After signal\n"; } ---- -== Implementing a Stoppable Timer +*Output:* -Here is a complete example of a stoppable timer: +---- +Before signal +Observer 1 notified +Observer 2 notified +After signal +---- + +=== Immediate Invocation + +If a callback is registered after `request_stop()` was already called, the callback runs *immediately* in the constructor: [source,cpp] ---- -struct stoppable_timer -{ - std::chrono::milliseconds duration_; - bool cancelled_ = false; +std::stop_source source; +source.request_stop(); // Already signaled - bool await_ready() const noexcept - { - return duration_.count() <= 0; - } +// Callback runs in constructor, not later +std::stop_callback cb(source.get_token(), []{ + std::cout << "Runs immediately!\n"; +}); +---- - // Affine path (no cancellation) - template - auto await_suspend(coro h, Dispatcher const& d) - { - start_timer(duration_, [h, &d] { d(h); }); - return std::noop_coroutine(); - } +This ensures observers never miss the signal, regardless of registration timing. - // Stoppable path (with cancellation) - template - auto await_suspend( - coro h, - Dispatcher const& d, - std::stop_token token) - { - if (token.stop_requested()) - { - cancelled_ = true; - return d(h); // Resume immediately - } +=== Type-Erased Polymorphic Observers - auto timer_handle = start_timer(duration_, [h, &d] { d(h); }); +Each `stop_callback` stores a different callable type `F`. Despite this, all callbacks for a given source can be invoked uniformly. This is equivalent to having `vector>` but with: - // Cancel timer if stop requested - std::stop_callback cb(token, [timer_handle] { - cancel_timer(timer_handle); - }); +* No heap allocation per callback +* No virtual function overhead +* RAII lifetime management - return std::noop_coroutine(); - } +=== Thread Safety - void await_resume() - { - if (cancelled_) - throw std::runtime_error("operation cancelled"); - } -}; +Registration and invocation are thread-safe. You can register callbacks, request stop, and invoke callbacks from any thread without additional synchronization. + +== Part 3: The One-Shot Nature + +[WARNING] +==== +*Critical limitation*: `stop_token` is a *one-shot* mechanism. 
== Part 3: The One-Shot Nature

[WARNING]
====
*Critical limitation*: `stop_token` is a *one-shot* mechanism.

* It can only transition from "not signaled" to "signaled" once
* There is no reset mechanism—once `stop_requested()` returns true, it stays true forever
* `request_stop()` returns `true` only on the first successful call
* *You cannot "un-cancel" a stop_source*
====

=== Why This Matters

If you design a system that needs to cancel and restart operations, you cannot reuse the same `stop_source`. Each cycle requires a fresh source and fresh tokens.

=== The Reset Workaround

To "reset," create an entirely new `stop_source`:

[source,cpp]
----
std::stop_source source;
auto token = source.get_token();

// ... distribute token to workers ...

source.request_stop(); // Triggered, now permanently signaled

// To "reset": create a new source
source = std::stop_source{}; // New shared state
// Old tokens still see the old, permanently signaled state

// Must redistribute new tokens to ALL holders of the old token
auto new_token = source.get_token();
----

This is manual and error-prone. Any code still holding the old token will not receive new signals.

=== Design Implication

If you need repeatable signals, `stop_token` is the wrong tool. Consider:

* Condition variables for repeatable wake-ups
* Atomic flags with an explicit reset protocol
* Custom event types

== Part 4: Beyond Cancellation

The "stop" naming obscures the mechanism's generality. `stop_token` implements *one-shot broadcast notification*, useful for:

* *Starting things* — Signal "ready" to trigger initialization
* *Configuration loaded* — Notify components when config is available
* *Resource availability* — Signal when the database is connected or the cache is warmed
* *Any one-shot broadcast scenario*

== Part 5: Stop Tokens in Coroutines

Coroutines have a propagation problem: how does a nested coroutine know to stop? If you pass a stop token explicitly to every function, your APIs become cluttered.

=== Capy's Answer: Automatic Propagation

Capy propagates stop tokens downward through `co_await`. When you await a task, the IoAwaitable protocol passes the current stop token to the child:

[source,cpp]
----
task<> parent()
{
    // Our stop token is automatically passed to child
    co_await child();
}

task<> child()
{
    // Receives parent's stop token via the IoAwaitable protocol
    auto token = co_await get_stop_token(); // Access the current token
}
----

No manual parameter threading—the protocol handles it.
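At the top of a chain, the token typically enters through the launcher. As a sketch using the `run_async` overload shown in Launching Coroutines (`ex` is any executor):

[source,cpp]
----
std::stop_source source;

// The token supplied at launch becomes the current stop token for
// parent(), and is forwarded to child() at each co_await.
run_async(ex, source.get_token())(parent());

// Later, a single call cancels the entire chain:
source.request_stop();
----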
=== Accessing the Stop Token

Inside a task, use `get_stop_token()` to access the current stop token:

[source,cpp]
----
task<> cancellable_work()
{
    auto token = co_await get_stop_token();

    while (!token.stop_requested())
    {
        co_await do_chunk_of_work();
    }
}
----

== Part 6: Responding to Cancellation

=== Checking the Token

[source,cpp]
----
task<> process_items(std::vector<item> const& items)
{
    auto token = co_await get_stop_token();

    for (auto const& item : items)
    {
        if (token.stop_requested())
            co_return; // Exit early

        co_await process(item);
    }
}
----

=== Cleanup with RAII

RAII ensures resources are released on early exit:

[source,cpp]
----
task<> with_resource()
{
    auto resource = acquire_resource(); // RAII wrapper
    auto token = co_await get_stop_token();

    while (!token.stop_requested())
    {
        co_await use_resource(resource);
    }
    // resource's destructor runs regardless of how we exit
}
----

=== The operation_aborted Convention

When cancellation causes an operation to fail, the conventional error code is `error::operation_aborted`:

[source,cpp]
----
task<response> fetch_with_cancel()
{
    auto token = co_await get_stop_token();

    if (token.stop_requested())
    {
        throw std::system_error(
            make_error_code(std::errc::operation_canceled));
    }

    co_return co_await do_fetch();
}
----

== Part 7: OS Integration

Capy's I/O operations (provided by Corosio) respect stop tokens at the OS level:

* *IOCP* (Windows) — Pending operations can be cancelled via `CancelIoEx`
* *io_uring* (Linux) — Operations can be cancelled via `IORING_OP_ASYNC_CANCEL`

When you request stop, pending I/O operations are cancelled at the OS level, providing an immediate response rather than waiting for the operation to complete naturally.
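As an illustration of how this surfaces in application code—the names `tcp_socket`, `mutable_buffer`, and `read_some` below are placeholders, not a documented Corosio API—a cancelled read completes promptly with an error rather than blocking until data arrives:

[source,cpp]
----
task<std::size_t> read_with_cancel(tcp_socket& sock, mutable_buffer buf)
{
    try
    {
        // If stop is requested while the read is pending, the OS-level
        // operation is cancelled and this co_await completes with an error.
        co_return co_await sock.read_some(buf);
    }
    catch (std::system_error const& e)
    {
        if (e.code() == std::errc::operation_canceled) // convention from Part 6
            co_return 0; // cancelled: report no bytes read
        throw; // a genuine I/O failure
    }
}
----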
== Part 8: Patterns

=== Timeout Pattern

Combine a timer with a stop source to implement timeouts:

[source,cpp]
----
task<> with_timeout(task<> operation, std::chrono::seconds timeout)
{
    std::stop_source source;

    // Timer that requests stop after the timeout expires
    auto timer = co_await start_timer(timeout, [&source] {
        source.request_stop();
    });

    // Run the operation with our stop token
    co_await run_with_token(source.get_token(), std::move(operation));
}
----

=== User Cancellation

Connect UI cancellation to stop tokens:

[source,cpp]
----
class download_manager
{
    executor_ref executor_;
    std::stop_source stop_source_;

public:
    explicit download_manager(executor_ref ex)
        : executor_(ex)
    {
    }

    void start_download(std::string url)
    {
        run_async(executor_)(download(url, stop_source_.get_token()));
    }

    void cancel()
    {
        stop_source_.request_stop();
    }
};
----

=== Graceful Shutdown

Cancel all pending work during shutdown:

[source,cpp]
----
class server
{
    std::stop_source shutdown_source_;

public:
    void shutdown()
    {
        shutdown_source_.request_stop();
        // All pending operations receive the stop request
    }

    task<> handle_connection(connection conn)
    {
        auto token = shutdown_source_.get_token();

        while (!token.stop_requested())
        {
            co_await process_request(conn);
        }

        // Graceful cleanup
        co_await send_goodbye(conn);
    }
};
----

=== when_any Cancellation

`when_any` uses stop tokens internally to cancel "losing" tasks when the first task completes. This is covered in xref:composition.adoc[Concurrent Composition].

== Reference

The stop token mechanism is part of the C++ standard library:

[source,cpp]
----
#include <stop_token>
----

Key types:

* `std::stop_source` — Creates and manages the stop state
* `std::stop_token` — Observes the stop state
* `std::stop_callback` — Registers callbacks for stop notification

You have now learned how stop tokens provide cooperative cancellation for coroutines. In the next section, you will learn about concurrent composition with `when_all` and `when_any`.
diff --git a/doc/modules/ROOT/pages/coroutines/composition.adoc b/doc/modules/ROOT/pages/coroutines/composition.adoc
new file mode 100644
index 00000000..563fcc72
--- /dev/null
+++ b/doc/modules/ROOT/pages/coroutines/composition.adoc
@@ -0,0 +1,255 @@
= Concurrent Composition

This section explains how to run multiple tasks concurrently using `when_all` and `when_any`.

== Prerequisites

* Completed xref:cancellation.adoc[Stop Tokens and Cancellation]
* Understanding of stop token propagation

== Overview

Sequential execution—one task after another—is the default when using `co_await`:

[source,cpp]
----
task<> sequential()
{
    co_await task_a(); // Wait for A
    co_await task_b(); // Then wait for B
    co_await task_c(); // Then wait for C
}
----

For independent operations, concurrent execution is more efficient:

[source,cpp]
----
task<> concurrent()
{
    // Run A, B, C simultaneously
    co_await when_all(task_a(), task_b(), task_c());
}
----

== when_all: Wait for All Tasks

`when_all` launches multiple tasks concurrently and waits for all of them to complete:

[source,cpp]
----
#include <boost/capy/when_all.hpp>

task<int> fetch_a() { co_return 1; }
task<int> fetch_b() { co_return 2; }
task<std::string> fetch_c() { co_return "hello"; }

task<> example()
{
    auto [a, b, c] = co_await when_all(fetch_a(), fetch_b(), fetch_c());

    // a == 1
    // b == 2
    // c == "hello"
}
----

=== Result Tuple

`when_all` returns a tuple of results in the same order as the input tasks. Use structured bindings to unpack them.

=== Void Filtering

Tasks returning `void` do not contribute to the result tuple:

[source,cpp]
----
task<> void_task() { co_return; }
task<int> int_task() { co_return 42; }

task<> example()
{
    auto [value] = co_await when_all(void_task(), int_task(), void_task());
    // value == 42 (only int_task contributes)
}
----

If all tasks return `void`, `when_all` returns `void`:

[source,cpp]
----
task<> example()
{
    co_await when_all(void_task_a(), void_task_b()); // Returns void
}
----

=== Error Handling

If any task throws an exception:

1. The exception is captured
2. Stop is requested for sibling tasks
3. All tasks are allowed to complete (or respond to stop)
4. The *first* exception is rethrown; later exceptions are discarded

[source,cpp]
----
task<int> might_fail(bool fail)
{
    if (fail)
        throw std::runtime_error("failed");
    co_return 42;
}

task<> example()
{
    try
    {
        co_await when_all(might_fail(true), might_fail(false));
    }
    catch (std::runtime_error const& e)
    {
        // Catches the exception from the failing task
    }
}
----

=== Stop Propagation

When one task fails, `when_all` requests stop for its siblings.
Well-behaved tasks should check their stop token and exit promptly:

[source,cpp]
----
task<> long_running()
{
    auto token = co_await get_stop_token();

    for (int i = 0; i < 1000; ++i)
    {
        if (token.stop_requested())
            co_return; // Exit early when a sibling fails

        co_await do_iteration();
    }
}
----

== when_any: First-to-Finish Wins

`when_any` is not yet implemented in Capy, but its design is planned:

* Launch multiple tasks concurrently
* Return when the *first* task completes
* Cancel the remaining tasks via stop token
* Return the winning task's result

The pattern would look like:

[source,cpp]
----
// Planned API (not yet available)
task<std::variant<int, std::string>> example()
{
    co_return co_await when_any(
        fetch_int(),    // task<int>
        fetch_string()  // task<std::string>
    );
}
----

== Practical Patterns

=== Parallel Fetch

Fetch multiple resources simultaneously:

[source,cpp]
----
task<page_data> fetch_page_data(std::string url)
{
    auto [header, body, sidebar] = co_await when_all(
        fetch_header(url),
        fetch_body(url),
        fetch_sidebar(url)
    );

    co_return page_data{
        std::move(header),
        std::move(body),
        std::move(sidebar)
    };
}
----

=== Fan-Out/Fan-In

Process items in parallel, then combine results:

[source,cpp]
----
task<int> process_item(item const& i);

task<int> process_all(std::vector<item> const& items)
{
    std::vector<task<int>> tasks;
    for (auto const& item : items)
        tasks.push_back(process_item(item));

    // This requires a range-based when_all (not yet available)
    // For now, use the fixed-arity when_all

    int total = 0;
    // ... accumulate results
    co_return total;
}
----

=== Timeout with Fallback

Use `when_any` (when available) to implement a timeout with a fallback:

[source,cpp]
----
// Planned pattern
task<response> fetch_with_fallback()
{
    co_return co_await when_any(
        fetch_from_primary(),
        delay_then(std::chrono::seconds(5), fetch_from_backup())
    );
}
----

== Implementation Notes

=== Task Storage

`when_all` stores all tasks in its coroutine frame. Tasks are moved from the arguments, so the original task objects become empty after the call.

=== Completion Tracking

A shared atomic counter tracks how many tasks remain. Each task completion decrements the counter. When it reaches zero, the parent coroutine is resumed.

=== Runner Coroutines

Each child task is wrapped in a "runner" coroutine that:

1. Receives context (executor, stop token) from `when_all`
2. Awaits the child task
3. Stores the result in shared state
4. Signals completion

This design ensures proper context propagation to all children.

== Reference

[cols="1,3"]
|===
| Header | Description

| `<boost/capy/when_all.hpp>`
| Concurrent composition with when_all
|===

You have now learned how to compose tasks concurrently with `when_all`. In the next section, you will learn about frame allocators for customizing coroutine memory allocation.

diff --git a/doc/modules/ROOT/pages/coroutines/executors.adoc b/doc/modules/ROOT/pages/coroutines/executors.adoc
new file mode 100644
index 00000000..7e29766e
--- /dev/null
+++ b/doc/modules/ROOT/pages/coroutines/executors.adoc
@@ -0,0 +1,233 @@
= Executors and Execution Contexts

This section explains executors and execution contexts—the mechanisms that control where and how coroutines execute.

== Prerequisites

* Completed xref:launching.adoc[Launching Coroutines]
* Understanding of `run_async` and `run`

== The Executor Concept

An *executor* is an object that can schedule work for execution.
In Capy, an executor must provide the `dispatch` and `post` scheduling methods, plus access to its owning `context`:

[source,cpp]
----
template<class E>
concept Executor = requires(E ex, coro h) {
    { ex.dispatch(h) } -> std::same_as<coro>;
    { ex.post(h) } -> std::same_as<void>;
    { ex.context() } -> std::convertible_to<execution_context&>;
};
----

=== dispatch() vs post()

Both methods schedule a coroutine for execution, but with different semantics:

`dispatch(h)`::
May execute `h` inline if the current thread is already associated with the executor. Returns a coroutine handle: `h` itself when it is safe to resume inline (enabling symmetric transfer), or `std::noop_coroutine()` when `h` was queued instead.

`post(h)`::
Always queues `h` for later execution. Never executes inline. Returns void. Use this when you need guaranteed asynchrony.

=== context()

Returns a reference to the execution context that owns this executor. The context provides resources like frame allocators.

== executor_ref: Type-Erased Executor

`executor_ref` wraps any executor in a type-erased container, allowing code to work with executors without knowing their concrete type:

[source,cpp]
----
void schedule_work(executor_ref ex, coro h)
{
    ex.post(h); // Works with any executor
}

int main()
{
    thread_pool pool;
    executor_ref ex = pool.get_executor(); // Type erasure

    schedule_work(ex, some_coroutine);
}
----

`executor_ref` stores a reference to the underlying executor—the original executor must outlive the `executor_ref`.

== thread_pool: Multi-Threaded Execution

`thread_pool` manages a pool of worker threads that execute coroutines concurrently:

[source,cpp]
----
#include <boost/capy/thread_pool.hpp>

int main()
{
    // Create a pool with 4 threads
    thread_pool pool(4);

    // Get an executor for this pool
    auto ex = pool.get_executor();

    // Launch work on the pool
    run_async(ex)(my_task());

    // The pool destructor waits for all work to complete
}
----

=== Constructor Parameters

[source,cpp]
----
thread_pool(
    std::size_t num_threads = 0,
    std::string_view thread_name_prefix = "capy-pool-"
);
----

* `num_threads` — Number of worker threads. If 0, uses the hardware concurrency.
* `thread_name_prefix` — Prefix for thread names (useful for debugging).

=== Thread Safety

Work posted to a `thread_pool` may execute on any of its worker threads. If your coroutines access shared data, you must use appropriate synchronization.
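As a minimal sketch of that rule—`bump` and `launch_all` below are illustrative, not library API—independent tasks touching one shared counter should use an atomic (or a strand, introduced later in this section):

[source,cpp]
----
std::atomic<int> hits{0};

task<> bump(std::atomic<int>& h)
{
    ++h;        // atomic: safe to run on any pool thread
    co_return;
}

void launch_all(thread_pool& pool)
{
    auto ex = pool.get_executor();
    for (int i = 0; i < 100; ++i)
        run_async(ex)(bump(hits));
}
----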
== execution_context: Base Class

`execution_context` is the base class for execution contexts. It provides:

* Frame allocator access via `get_frame_allocator()`
* Service infrastructure for extensibility

Custom execution contexts inherit from `execution_context`:

[source,cpp]
----
class my_context : public execution_context
{
public:
    // ... custom implementation

    my_executor get_executor();
};
----

== strand: Serialization Without Mutexes

A `strand` ensures that handlers are executed in order, with no two handlers executing concurrently. This eliminates the need for mutexes when all access to shared data goes through the strand.

[source,cpp]
----
#include <boost/capy/strand.hpp>

class shared_resource
{
    strand strand_;
    int counter_ = 0;

public:
    explicit shared_resource(thread_pool& pool)
        : strand_(pool.get_executor())
    {
    }

    task<int> increment()
    {
        // All increments are serialized through the strand
        co_return co_await run(strand_)(do_increment());
    }

private:
    task<int> do_increment()
    {
        // No mutex needed—the strand ensures exclusive access
        ++counter_;
        co_return counter_;
    }
};
----

=== How Strands Work

The strand maintains a queue of pending work. When work is dispatched:

1. If no other work is executing on the strand, the new work runs immediately
2. If other work is executing, the new work is queued
3. When the current work completes, the next queued item runs

This provides logical single-threading without blocking physical threads.

=== When to Use Strands

* *Thread-affine resources* — When code must not be called from multiple threads simultaneously
* *Ordered operations* — When operations must complete in a specific order
* *Avoiding mutexes* — When mutex overhead is unacceptable

== Single-Threaded vs Multi-Threaded Patterns

=== Single-Threaded

For single-threaded applications, use a context with one thread:

[source,cpp]
----
thread_pool single_thread(1);
auto ex = single_thread.get_executor();
// All work runs on the single thread
----

=== Multi-Threaded with Shared Data

For multi-threaded applications with shared data, use strands:

[source,cpp]
----
thread_pool pool(4);
strand data_strand(pool.get_executor());

// Use data_strand for all access to shared data
// Use pool.get_executor() for independent work
----

=== Multi-Threaded with Independent Work

For embarrassingly parallel work with no shared state, launch tasks directly on the pool:

[source,cpp]
----
thread_pool pool(4);
auto ex = pool.get_executor();

for (int i = 0; i < 100; ++i)
    run_async(ex)(independent_task(i));
----

== Reference

[cols="1,3"]
|===
| Header | Description

| `<boost/capy/executor.hpp>`
| The Executor concept definition

| `<boost/capy/executor_ref.hpp>`
| Type-erased executor wrapper

| `<boost/capy/thread_pool.hpp>`
| Multi-threaded execution context

| `<boost/capy/execution_context.hpp>`
| Base class for execution contexts

| `<boost/capy/strand.hpp>`
| Serialization primitive
|===

You have now learned about executors, execution contexts, thread pools, and strands. In the next section, you will learn about the IoAwaitable protocol that enables context propagation.

diff --git a/doc/modules/ROOT/pages/coroutines/io-awaitable.adoc b/doc/modules/ROOT/pages/coroutines/io-awaitable.adoc
new file mode 100644
index 00000000..3f91d3bf
--- /dev/null
+++ b/doc/modules/ROOT/pages/coroutines/io-awaitable.adoc
@@ -0,0 +1,187 @@
= The IoAwaitable Protocol

This section explains the IoAwaitable protocol—Capy's mechanism for propagating execution context through coroutine chains.

== Prerequisites

* Completed xref:executors.adoc[Executors and Execution Contexts]
* Understanding of the standard awaiter protocol (`await_ready`, `await_suspend`, `await_resume`)

== The Problem: Context Propagation

Standard C++20 coroutines define awaiters with this `await_suspend` signature:

[source,cpp]
----
void await_suspend(std::coroutine_handle<> h);
// or
bool await_suspend(std::coroutine_handle<> h);
// or
std::coroutine_handle<> await_suspend(std::coroutine_handle<> h);
----
The awaiter receives only a handle to the suspended coroutine. But real I/O code needs more:

* *Executor* — Where should completions be dispatched?
* *Stop token* — Should this operation support cancellation?
* *Allocator* — Where should memory be allocated?

How does an awaitable get this information?

=== Backward Query Approach

One approach: the awaitable queries the calling coroutine's promise for context. This requires the awaitable to know the promise type, creating tight coupling.

=== Forward Propagation Approach

Capy uses *forward propagation*: the caller passes context to the awaitable through an extended `await_suspend` signature.

== The Three-Argument await_suspend

The IoAwaitable protocol extends `await_suspend` to receive context:

[source,cpp]
----
coro await_suspend(coro h, executor_ref ex, std::stop_token token);
----

This signature receives:

* `h` — The coroutine handle (as in standard awaiters)
* `ex` — The caller's executor for dispatching completions
* `token` — A stop token for cooperative cancellation

The return type enables symmetric transfer.

== IoAwaitable Concept

An awaitable satisfies `IoAwaitable` if:

[source,cpp]
----
template<class T>
concept IoAwaitable = requires(T& t, coro h, executor_ref ex, std::stop_token st) {
    { t.await_ready() } -> std::convertible_to<bool>;
    { t.await_suspend(h, ex, st) } -> std::same_as<coro>;
    t.await_resume();
};
----

The key difference from standard awaitables is the three-argument `await_suspend`.

== IoAwaitableTask Concept

A task type satisfies `IoAwaitableTask` if its promise provides:

* `set_executor(executor_ref)` — Store the propagated executor
* `executor()` — Retrieve the stored executor
* `set_stop_token(std::stop_token)` — Store the propagated stop token
* `stop_token()` — Retrieve the stored stop token

Capy's `task` satisfies this concept.

== IoLaunchableTask Concept

For tasks that can be launched (not just awaited), the `IoLaunchableTask` concept adds:

* `handle()` — Access the coroutine handle
* `release()` — Transfer ownership of the handle
* `exception()` — Check for captured exceptions
* `result()` — Access the result value

These methods enable `run_async` to manage the task lifecycle.

== How Context Flows

When you write `co_await child_task()` inside a `task`:

1. The parent task's `await_transform` intercepts the awaitable
2. It wraps the child in a transform awaiter
3. The transform awaiter's `await_suspend` passes the context along:

[source,cpp]
----
auto await_suspend(std::coroutine_handle<> h)
{
    // Forward the caller's context to the child
    return awaitable_.await_suspend(
        h, promise_.executor(), promise_.stop_token());
}
----

The child receives the parent's executor and stop token automatically.

== Why Forward Propagation?

Forward propagation offers several advantages:

* *Decoupling* — Awaitables don't need to know the caller's promise type
* *Composability* — Any IoAwaitable works with any IoAwaitableTask
* *Explicit flow* — Context flows downward through the call chain, not queried upward

This design enables Capy's type-erased wrappers (`any_stream`, etc.) to work without knowing the concrete executor type.

== Implementing Custom IoAwaitables

To create a custom IoAwaitable:

[source,cpp]
----
struct my_awaitable
{
    bool await_ready() const noexcept
    {
        return false; // Or true if the result is immediately available
    }

    coro await_suspend(coro h, executor_ref ex, std::stop_token token)
    {
        // Store continuation and context
        continuation_ = h;
        executor_ = ex;
        stop_token_ = token;

        // Start async operation...
+ start_operation(); + + // Return noop to suspend + return std::noop_coroutine(); + } + + result_type await_resume() + { + return result_; + } + +private: + void on_completion() + { + // Resume on caller's executor + executor_.dispatch(continuation_).resume(); + } +}; +---- + +The key points: + +1. Store the continuation and executor in `await_suspend` +2. Use the executor to dispatch completion +3. Respect the stop token for cancellation + +== Reference + +[cols="1,3"] +|=== +| Header | Description + +| `` +| The IoAwaitable concept definition + +| `` +| The IoAwaitableTask concept for task types + +| `` +| The IoLaunchableTask concept for launchable tasks +|=== + +You have now learned how the IoAwaitable protocol enables context propagation through coroutine chains. In the next section, you will learn about stop tokens and cooperative cancellation. diff --git a/doc/modules/ROOT/pages/coroutines/launching.adoc b/doc/modules/ROOT/pages/coroutines/launching.adoc index c6523e1a..9b8a80c4 100644 --- a/doc/modules/ROOT/pages/coroutines/launching.adoc +++ b/doc/modules/ROOT/pages/coroutines/launching.adoc @@ -1,178 +1,167 @@ -// -// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com) -// -// Distributed under the Boost Software License, Version 1.0. (See accompanying -// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) -// -// Official repository: https://github.com/cppalliance/capy -// += Launching Coroutines -= Launching Tasks +This section explains how to launch coroutines for execution. You will learn about `run_async` for entry from non-coroutine code and `run` for executor hopping within coroutine code. -This page explains how to start lazy tasks for execution using `run_async`. +== Prerequisites -NOTE: Code snippets assume `using namespace boost::capy;` is in effect. +* Completed xref:tasks.adoc[The task Type] +* Understanding of lazy task execution -== Why Tasks Need a Driver +== The Execution Model -Tasks are lazy. They remain suspended until something starts them. Within a -coroutine, `co_await` serves this purpose. But at the program's entry point, -you need a way to kick off the first coroutine. +Capy tasks are lazy—they do not execute until something drives them. Two mechanisms exist: -The `run_async` function provides this capability. It: +* *Awaiting* — One coroutine awaits another (`co_await task`) +* *Launching* — Non-coroutine code initiates execution (`run_async`) -1. Binds a task to a dispatcher (typically an executor) -2. Starts the task's execution -3. Optionally delivers the result to a completion handler +When a task is awaited, the awaiting coroutine provides context: an executor for dispatching completion and a stop token for cancellation. But what about the first task in a chain? That task needs explicit launching. -== Basic Usage +== run_async: Entry from Non-Coroutine Code + +`run_async` is the bridge between regular code and coroutine code. It takes an executor, creates the necessary context, and starts the task executing. [source,cpp] ---- -#include +#include +using namespace boost::capy; -void start(executor ex) +task compute() { - run_async(ex)(compute()); + co_return 42; } ----- -The syntax `run_async(ex)(task)` creates a runner bound to the executor, then -immediately launches the task. The task begins executing when the executor -schedules it. 
+int main() +{ + thread_pool pool; + run_async(pool.get_executor())(compute()); + // Task is now running on the thread pool + + // pool destructor waits for work to complete + return 0; +} +---- -== Fire and Forget +=== Two-Call Syntax -The simplest pattern discards the result: +Notice the unusual syntax: `run_async(executor)(task)`. This is intentional and relates to C++17 evaluation order. -[source,cpp] ----- -run_async(ex)(compute()); ----- +C++17 guarantees that in the expression `f(a)(b)`: -If the task throws an exception, it propagates to the executor's error handling -(typically rethrown from `run()`). This pattern is appropriate for top-level -tasks where errors should terminate the program. +1. `f(a)` is evaluated first, producing a callable +2. `b` is evaluated second +3. The callable is invoked with `b` -== Handling Results +This ordering matters because the task's coroutine frame is allocated during step 2, and `run_async` sets up thread-local allocator state in step 1. The task inherits that allocator. -To receive the task's result, provide a completion handler: +[WARNING] +==== +Do not store the result of `run_async(executor)` and call it later: [source,cpp] ---- -run_async(ex)(compute(), [](int result) { - std::cout << "Got: " << result << "\n"; -}); +auto wrapper = run_async(pool.get_executor()); // Don't do this +wrapper(compute()); // TLS state no longer valid ---- -The handler is called when the task completes successfully. For `task`, -the handler takes no arguments: +Always use the two-call pattern in a single expression. +==== -[source,cpp] ----- -run_async(ex)(do_work(), []() { - std::cout << "Work complete\n"; -}); ----- - -== Handling Errors +=== Handler Overloads -To handle both success and failure, provide a handler that also accepts -`std::exception_ptr`: +`run_async` accepts optional handlers for results and exceptions: [source,cpp] ---- -run_async(ex)(compute(), overloaded{ - [](int result) { - std::cout << "Success: " << result << "\n"; - }, +// Result handler only (exceptions rethrown) +run_async(ex, [](int result) { + std::cout << "Got: " << result << "\n"; +})(compute()); + +// Separate handlers for result and exception +run_async(ex, + [](int result) { std::cout << "Result: " << result << "\n"; }, [](std::exception_ptr ep) { - try { - if (ep) std::rethrow_exception(ep); - } catch (std::exception const& e) { - std::cerr << "Error: " << e.what() << "\n"; + try { std::rethrow_exception(ep); } + catch (std::exception const& e) { + std::cout << "Error: " << e.what() << "\n"; } } -}); +)(compute()); ---- -Alternatively, use separate handlers for success and error: - -[source,cpp] ----- -run_async(ex)(compute(), - [](int result) { std::cout << result << "\n"; }, - [](std::exception_ptr ep) { /* handle error */ } -); ----- +When no handlers are provided, results are discarded and exceptions are rethrown (causing `std::terminate` if uncaught). -== The Single-Expression Idiom +=== Stop Token Support -The `run_async` return value enforces a specific usage pattern: +Pass a stop token to enable cooperative cancellation: [source,cpp] ---- -// CORRECT: Single expression -run_async(ex)(make_task()); +std::stop_source source; +run_async(ex, source.get_token())(cancellable_task()); -// INCORRECT: Split across statements -auto runner = run_async(ex); // Sets thread-local state -// ... other code may interfere ... 
-runner(make_task()); // Won't compile (deleted move) +// Later, to request cancellation: +source.request_stop(); ---- -This design ensures the frame allocator is active when your task is created, -enabling frame recycling optimization. +The stop token is propagated to the task and all tasks it awaits. -== Stop Token Support +== run: Executor Hopping Within Coroutines -Pass a stop token for cooperative cancellation: +Inside a coroutine, use `run` to execute a child task on a different executor: [source,cpp] ---- -std::stop_source source; -run_async(ex, source.get_token())(cancellable_task()); - -// Later: request cancellation -source.request_stop(); +task compute_on_pool(thread_pool& pool) +{ + // This task runs on whatever executor we're already on + + // But this child task runs on the pool's executor: + int result = co_await run(pool.get_executor())(expensive_computation()); + + // After co_await, we're back on our original executor + co_return result; +} ---- -See xref:cancellation.adoc[Cancellation] for details on stop token propagation. +=== Executor Affinity -== When NOT to Use run_async +By default, a task inherits its caller's executor. This means completions are dispatched through that executor, ensuring thread affinity for thread-sensitive code. -Use `run_async` when: +`run` overrides this inheritance for a specific child task, binding it to a different executor. The child task runs on the specified executor, and when it completes, the parent task resumes on its original executor. -* You need to start a coroutine from non-coroutine code -* You want fire-and-forget semantics -* You need to receive the result via callback +This pattern is useful for: -Do NOT use `run_async` when: +* Running CPU-intensive work on a thread pool +* Performing I/O on an I/O-specific context +* Ensuring UI updates happen on the UI thread -* You are already inside a coroutine — just `co_await` the task directly -* You need the result synchronously — `run_async` is asynchronous +== Handler Threading -== Summary +Handlers passed to `run_async` are invoked on whatever thread the executor schedules: -[cols="1,3"] -|=== -| Pattern | Code - -| Fire and forget -| `run_async(ex)(task)` +[source,cpp] +---- +// If pool has 4 threads, the handler runs on one of those threads +run_async(pool.get_executor(), [](int result) { + // This runs on a pool thread, NOT the main thread + update_shared_state(result); +})(compute()); +---- -| Success handler -| `run_async(ex)(task, handler)` +If you need results on a specific thread, use appropriate synchronization or dispatch mechanisms. -| Success + error handlers -| `run_async(ex)(task, on_success, on_error)` +== Reference -| With stop token -| `run_async(ex, stop_token)(task)` +[cols="1,3"] |=== +| Header | Description -== Next Steps +| `` +| Entry point for launching tasks from non-coroutine code + +| `` +| Executor binding for child tasks within coroutines +|=== -* xref:when-all.adoc[Concurrent Composition] — Run multiple tasks in parallel -* xref:affinity.adoc[Executor Affinity] — Control where coroutines execute -* xref:../execution/frame-allocation.adoc[Frame Allocation] — Optimize memory usage +You have now learned how to launch coroutines using `run_async` and bind child tasks to specific executors using `run`. In the next section, you will learn about executors and execution contexts in detail. 
diff --git a/doc/modules/ROOT/pages/coroutines/tasks.adoc b/doc/modules/ROOT/pages/coroutines/tasks.adoc
index b36b68e0..6973c419 100644
--- a/doc/modules/ROOT/pages/coroutines/tasks.adoc
+++ b/doc/modules/ROOT/pages/coroutines/tasks.adoc
@@ -1,85 +1,75 @@
= The task Type

This section introduces Capy's `task` type—the fundamental coroutine type for asynchronous programming in Capy.

== Prerequisites

* Completed xref:../cpp20-coroutines/advanced.adoc[C++20 Coroutines Tutorial]
* Understanding of promise types, coroutine handles, and symmetric transfer

== Overview

`task<T>` is Capy's primary coroutine return type. It represents an asynchronous operation that eventually produces a value of type `T` (or nothing, for `task<void>`).

Key characteristics:

* *Lazy execution* — The coroutine does not start until awaited
* *Symmetric transfer* — Efficient resumption without stack accumulation
* *Executor inheritance* — Inherits the caller's executor unless explicitly bound
* *Stop token propagation* — Forward-propagates cancellation signals
* *HALO support* — Enables heap allocation elision when possible

== Declaring task Coroutines

Any function that returns `task` and contains coroutine keywords (`co_await`, `co_return`) is a `task` coroutine:

[source,cpp]
----
#include <boost/capy/task.hpp>
using namespace boost::capy;

task<int> compute_value()
{
    co_return 42;
}

task<std::string> fetch_greeting()
{
    co_return "Hello, Capy!";
}

task<> do_nothing() // task<void>
{
    co_return;
}
----

The syntax `task<>` is equivalent to `task<void>` and represents a coroutine that completes without producing a value.

== Returning Values with co_return

Use `co_return` to complete the coroutine and provide its result:

[source,cpp]
----
task<int> add(int a, int b)
{
    int result = a + b;
    co_return result; // Completes with a value
}

task<> log_message(std::string msg)
{
    std::cout << msg << "\n";
    co_return; // Completes without a value
}
----

For `task<>`, you can either write `co_return;` explicitly or let execution fall off the end of the function body.
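Because `co_return` hands its operand to the promise, move-only results work naturally. A small sketch, where `widget` is a placeholder type:

[source,cpp]
----
task<std::unique_ptr<widget>> make_widget()
{
    auto w = std::make_unique<widget>();
    co_return w; // implicitly moved into the task's result
}
----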
== Awaiting Other Tasks

Tasks can await other tasks using `co_await`. This is the primary mechanism for composing asynchronous operations:

[source,cpp]
----
task<int> step_one()
{
    co_return 10;
}

task<int> step_two(int x)
{
    co_return x * 2;
}

task<int> full_operation()
{
    int a = co_await step_one();  // Suspends until step_one completes
    int b = co_await step_two(a); // Suspends until step_two completes
    co_return b + 5;              // Final result: 25
}
----

When you `co_await` a task:

1. The current coroutine suspends
2. The awaited task starts executing
3. When the awaited task completes, the current coroutine resumes
4. The `co_await` expression evaluates to the awaited task's result

== Lazy Execution

A critical property of `task` is *lazy execution*: creating a task does not start its execution. The coroutine body runs only when the task is awaited.

[source,cpp]
----
task<int> compute()
{
    std::cout << "Computing...\n"; // Not printed until awaited
    co_return 42;
}

task<> example()
{
    auto t = compute(); // Task created, but "Computing..." NOT printed yet
    std::cout << "Task created\n";

    int result = co_await t; // NOW "Computing..." is printed
    std::cout << "Result: " << result << "\n";
}
----

*Output:*

----
Task created
Computing...
Result: 42
----

Lazy execution enables efficient composition—tasks that are never awaited never run, consuming no resources beyond their initial allocation.
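One consequence is worth seeing directly. As a sketch, assuming the destructor of an un-awaited task simply releases its frame, a task you create but never await is destroyed without ever running:

[source,cpp]
----
task<> create_without_running()
{
    {
        auto t = compute(); // frame allocated, body NOT started
    }                       // t destroyed here; "Computing..." never prints

    co_return;
}
----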
== Symmetric Transfer

When a task completes, control transfers directly to its continuation (the coroutine that awaited it) using *symmetric transfer*. This avoids stack accumulation even with deep chains of coroutine calls.

Consider:

[source,cpp]
----
task<> a() { co_await b(); }
task<> b() { co_await c(); }
task<> c() { co_return; }
----

Without symmetric transfer, each `co_await` would add a stack frame, potentially causing stack overflow with deep nesting. With symmetric transfer, `c` returning to `b` returning to `a` uses constant stack space regardless of depth.

This is implemented through `await_suspend` returning a coroutine handle rather than `void`:

[source,cpp]
----
// Inside task's final_suspend awaiter
coro await_suspend(coro) const noexcept
{
    return continuation_; // Transfer directly to the continuation
}
----

== Move Semantics

Tasks are move-only. Copying a task would create aliasing problems where multiple handles reference the same coroutine frame.

[source,cpp]
----
task<int> compute();

task<> example()
{
    auto t1 = compute();
    auto t2 = std::move(t1); // OK: ownership transferred; t1 is now empty

    // auto t3 = t2; // Error: task is not copyable

    int result = co_await t2;
}
----

After moving, the source task becomes empty and must not be awaited.

== Exception Propagation

Exceptions thrown inside a task are captured and rethrown when the task is awaited:

[source,cpp]
----
task<int> might_fail(bool should_fail)
{
    if (should_fail)
        throw std::runtime_error("Operation failed");
    co_return 42;
}

task<> example()
{
    try
    {
        int result = co_await might_fail(true);
    }
    catch (std::runtime_error const& e)
    {
        std::cout << "Caught: " << e.what() << "\n";
    }
}
----

The exception is stored in the promise when it occurs and rethrown in `await_resume` when the calling coroutine resumes.

== Reference

The `task` type is defined in:

[source,cpp]
----
#include <boost/capy/task.hpp>
----

Or included via the umbrella header:

[source,cpp]
----
#include <boost/capy.hpp>
----

You have now learned how to declare, return values from, and await `task` coroutines. In the next section, you will learn how to launch tasks for execution using `run_async` and `run`.

diff --git a/doc/modules/ROOT/pages/cpp20-coroutines/advanced.adoc b/doc/modules/ROOT/pages/cpp20-coroutines/advanced.adoc
index 17456cff..d3ab5641 100644
--- a/doc/modules/ROOT/pages/cpp20-coroutines/advanced.adoc
+++ b/doc/modules/ROOT/pages/cpp20-coroutines/advanced.adoc
@@ -1,110 +1,82 @@
-//
-// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com)
-//
-// Distributed under the Boost Software License, Version 1.0.
(See accompanying -// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) -// -// Official repository: https://github.com/cppalliance/capy -// - = Part IV: Advanced Topics -This section covers symmetric transfer, allocation strategies, and exception -handling in coroutines. +This section covers advanced coroutine topics: symmetric transfer for efficient resumption, coroutine allocation strategies, and exception handling. These concepts are essential for building production-quality coroutine types. + +== Prerequisites + +* Completed xref:machinery.adoc[Part III: Coroutine Machinery] +* Understanding of promise types, coroutine handles, and generators == Symmetric Transfer -Consider what happens when a coroutine completes and needs to resume its -caller: +When a coroutine completes or awaits another coroutine, control must transfer somewhere. The naive approach—simply calling `handle.resume()`—has a problem: each nested coroutine adds a frame to the call stack. With deep nesting, you risk stack overflow. + +*Symmetric transfer* solves this by returning a coroutine handle from `await_suspend`. Instead of resuming the target coroutine via a function call, the compiler generates a tail call that transfers control without growing the stack. + +=== The Problem: Stack Accumulation + +Consider a chain of coroutines where each awaits the next: [source,cpp] ---- -void final_awaiter::await_suspend(std::coroutine_handle<> h) -{ - auto continuation = h.promise().continuation_; - continuation.resume(); // Problem: this is a regular call -} +task<> a() { co_await b(); } +task<> b() { co_await c(); } +task<> c() { co_return; } ---- -Each `.resume()` is a function call that uses stack space. In deep coroutine -chains, this can overflow the stack: +Without symmetric transfer, when `a` awaits `b`: ----- -main() → task_a.resume() → task_b.resume() → task_c.resume() → ... - ↑ stack grows with each resume ----- +1. `a` calls into the awaiter's `await_suspend` +2. `await_suspend` calls `b.handle.resume()` +3. `b` runs, calls into its awaiter's `await_suspend` +4. That calls `c.handle.resume()` +5. The stack now has frames for `a`'s suspension, `b`'s suspension, and `c`'s execution + +Each suspension adds a stack frame. With thousands of nested coroutines, the stack overflows. -=== The Solution: Return a Handle +=== The Solution: Return the Handle -When `await_suspend` returns a `coroutine_handle<>`, the compiler uses a -_tail call_ to resume that handle instead of a nested call: +`await_suspend` can return a `std::coroutine_handle<>`: [source,cpp] ---- -std::coroutine_handle<> await_suspend(std::coroutine_handle<> h) noexcept +std::coroutine_handle<> await_suspend(std::coroutine_handle<> h) { - return h.promise().continuation_; // Tail call to continuation + // store continuation for later + continuation_ = h; + + // return handle to resume (instead of calling resume()) + return next_coroutine_; } ---- -The compiler transforms this into: +When `await_suspend` returns a handle, the compiler generates code equivalent to: [source,cpp] ---- -// Conceptual transformation -while (true) -{ - handle = current_coroutine.await_suspend(); - if (handle == noop_coroutine()) - break; - current_coroutine = handle; - // Loop back—no stack growth -} +auto next = awaiter.await_suspend(current); +if (next != std::noop_coroutine()) + next.resume(); // tail call, doesn't grow stack ---- -This is called _symmetric transfer_ because control passes directly between -coroutines without stack accumulation. 
+The key insight: returning a handle enables the compiler to implement the resumption as a tail call. The current stack frame is reused for the next coroutine. -=== Avoiding Stack Overflow +=== Return Types for await_suspend -Compare the stack usage: +`await_suspend` can return three types: -**Without symmetric transfer:** ----- -main() - └─ coro_a.resume() - └─ coro_b.resume() - └─ coro_c.resume() - └─ ... (stack overflow eventually) ----- +`void`:: +Always suspend. The coroutine is suspended and some external mechanism must resume it. -**With symmetric transfer:** ----- -main() → coro_a → coro_b → coro_c → ... - (tail call chain, constant stack) ----- +`bool`:: +Conditional suspension. Return `true` to suspend, `false` to continue without suspending. -=== When to Use noop_coroutine +`std::coroutine_handle<>`:: +Symmetric transfer. The returned handle is resumed; returning `std::noop_coroutine()` suspends without resuming anything. -Return `std::noop_coroutine()` when there's nothing to resume: +=== Using Symmetric Transfer in Generators -[source,cpp] ----- -std::coroutine_handle<> await_suspend(std::coroutine_handle<> h) noexcept -{ - save_for_later(h); - if (io_completed_sync_) - return h; // Resume immediately - return std::noop_coroutine(); // Return to event loop -} ----- - -`noop_coroutine()` returns a handle that does nothing when resumed—it's the -signal to exit the symmetric transfer loop. - -=== Symmetric Transfer in Practice - -Here's a proper `final_suspend` using symmetric transfer: +A production generator uses symmetric transfer at `final_suspend` to return to whoever is iterating: [source,cpp] ---- @@ -112,362 +84,383 @@ auto final_suspend() noexcept { struct awaiter { - std::coroutine_handle<> continuation_; - - bool await_ready() noexcept { return false; } - - std::coroutine_handle<> await_suspend( - std::coroutine_handle<>) noexcept + promise_type* p_; + + bool await_ready() const noexcept { return false; } + + std::coroutine_handle<> await_suspend(std::coroutine_handle<>) noexcept { - if (continuation_) - return continuation_; - return std::noop_coroutine(); + // Return to the consumer that called resume() + return p_->consumer_handle_; } - - void await_resume() noexcept {} + + void await_resume() const noexcept {} }; - - return awaiter{continuation_}; + return awaiter{this}; } ---- == Coroutine Allocation -By default, coroutine frames are allocated with `::operator new`. For -performance-critical code, you may need to customize this. +Every coroutine needs memory for its *coroutine frame*—the heap-allocated structure holding local variables, parameters, and suspension state. -=== The Default: Heap Allocation - -[source,cpp] ----- -task compute() // Frame allocated with new -{ - int x = 42; - co_return x; -} ----- +=== Default Allocation -The frame includes: +By default, coroutines allocate their frames using `operator new`. The frame size depends on: -* Local variables -* Promise object -* Suspension state +* Local variables in the coroutine +* Parameters (copied into the frame) +* Promise type members * Compiler-generated bookkeeping === Heap Allocation eLision Optimization (HALO) -Compilers can optimize away the allocation when: +Compilers can sometimes eliminate coroutine frame allocation entirely through *HALO* (Heap Allocation eLision Optimization). 
When the compiler can prove that: -* The coroutine's lifetime is bounded by the caller -* The compiler can prove the frame fits in the caller's frame -* The coroutine isn't stored, passed around, or escaped +* The coroutine's lifetime is contained within the caller's lifetime +* The frame size is known at compile time -[source,cpp] ----- -task leaf() -{ - co_return 42; -} +...it may allocate the frame on the caller's stack instead of the heap. -task parent() -{ - // HALO may place leaf's frame in parent's frame - int x = co_await leaf(); - co_return x; -} ----- - -You can't force HALO, but you can enable it by: - -* Using `[[clang::coro_await_elidable]]` on task types (Clang) -* Keeping coroutine lifetime scoped to the caller -* Avoiding storing task objects in containers +HALO is most effective when: -=== Custom Allocation via Promise - -Override `operator new` and `operator delete` in the promise type: +* Coroutines are awaited immediately after creation +* The coroutine type is marked with `[[clang::coro_await_elidable]]` (Clang extension) +* Optimization is enabled (`-O2` or higher) [source,cpp] ---- -struct promise_type -{ - void* operator new(std::size_t size) - { - return my_allocator::allocate(size); - } - - void operator delete(void* ptr) - { - my_allocator::deallocate(ptr); - } +// HALO might apply here because the task is awaited immediately +co_await compute_something(); - // Optional: receive coroutine parameters for allocation decisions - void* operator new(std::size_t size, int buffer_size, /* params */) - { - // Can use parameters to decide allocation strategy - return my_allocator::allocate(size); - } -}; +// HALO cannot apply here because the task escapes +auto task = compute_something(); +store_for_later(std::move(task)); ---- -The compiler passes coroutine function parameters to `operator new` if a -matching overload exists. - -=== Frame Recycling +=== Custom Allocators -For high-throughput scenarios, recycle frames from a pool: +Promise types can customize allocation by providing `operator new` and `operator delete`: [source,cpp] ---- -thread_local frame_pool_type frame_pool; - struct promise_type { - void* operator new(std::size_t size) + // Custom allocation + static void* operator new(std::size_t size) { - if (void* p = frame_pool.try_allocate(size)) - return p; - return ::operator new(size); + return my_allocator.allocate(size); } - - void operator delete(void* ptr, std::size_t size) + + static void operator delete(void* ptr, std::size_t size) { - if (!frame_pool.try_deallocate(ptr, size)) - ::operator delete(ptr, size); + my_allocator.deallocate(ptr, size); } + + // ... rest of promise type }; ---- -Capy's `frame_allocator` provides this pattern with thread-local recycling. - -=== The Allocation Window Problem - -Custom allocation has a timing constraint: `operator new` runs before the -coroutine body, so you can't use runtime state from inside the coroutine: - -[source,cpp] ----- -task work(allocator& alloc) -{ - // Problem: alloc isn't available during operator new! - co_return; -} ----- - -Solutions: +The promise's `operator new` receives only the frame size. To access allocator arguments passed to the coroutine, use the leading allocator convention with `std::allocator_arg_t` as the first parameter. -1. **Thread-local state:** Set up allocator before calling coroutine -2. **Promise parameter overloads:** Pass allocator as function parameter -3. 
**Launchers:** Use a launcher function that sets up thread-local state +== Exception Handling -Capy uses the launcher pattern with `run_async`. +Exceptions in coroutines require special handling because a coroutine can suspend and resume across different call stacks. -== Exception Handling +=== The Exception Flow -Exceptions in coroutines have unique characteristics because the coroutine -frame outlives the call stack where the exception was thrown. +When an exception is thrown inside a coroutine and not caught: -=== Exception Flow +1. The exception is caught by an implicit try-catch surrounding the coroutine body +2. `promise.unhandled_exception()` is called while the exception is active +3. After `unhandled_exception()` returns, `co_await promise.final_suspend()` executes +4. The coroutine completes (suspended or destroyed, depending on `final_suspend`) -When an exception escapes a coroutine body: +=== Options in unhandled_exception() -1. `promise.unhandled_exception()` is called -2. The coroutine proceeds to `final_suspend()` -3. The exception is typically rethrown when someone accesses the result +*Terminate the program:* [source,cpp] ---- -struct promise_type -{ - std::exception_ptr exception_; - - void unhandled_exception() - { - exception_ = std::current_exception(); - } -}; - -// In await_resume: -T await_resume() +void unhandled_exception() { - if (promise().exception_) - std::rethrow_exception(promise().exception_); - return std::move(*promise().result_); + std::terminate(); } ---- -=== unhandled_exception() Behavior - -You have choices in `unhandled_exception()`: +*Store for later retrieval:* [source,cpp] ---- -// Store for later propagation void unhandled_exception() { exception_ = std::current_exception(); } +---- -// Terminate immediately -void unhandled_exception() -{ - std::terminate(); -} +*Rethrow immediately:* -// Log and ignore +[source,cpp] +---- void unhandled_exception() { - try { throw; } - catch (std::exception const& e) { - log_error(e.what()); - } + throw; // propagates to whoever resumed the coroutine } ---- -Storing exceptions allows natural propagation through `co_await` chains. - -=== Exceptions and Symmetric Transfer - -When using symmetric transfer, exceptions can propagate across coroutine -boundaries: +*Swallow the exception:* [source,cpp] ---- -task child() -{ - throw std::runtime_error("oops"); - co_return 0; -} - -task parent() +void unhandled_exception() { - try { - int x = co_await child(); // Exception rethrown here - } catch (std::exception const& e) { - std::cerr << "Caught: " << e.what() << "\n"; - } + // silently ignored - almost always a mistake } ---- -The exception is: +=== The Store-and-Rethrow Pattern -1. Thrown in `child` -2. Captured by `child`'s `unhandled_exception()` -3. `child` completes (reaching final_suspend) -4. `parent` resumes via symmetric transfer -5. `parent`'s `await_resume()` rethrows the exception -6. 
`parent`'s catch block handles it
-
-=== Coroutine Destruction and Exceptions
-
-If a coroutine is destroyed without being fully executed, destructors run
-but no exception propagates:
+For tasks and generators where callers expect results, store the exception and rethrow it when results are requested:
 
 [source,cpp]
 ----
-task work()
+struct promise_type
 {
-    RAII_Guard guard; // Constructor runs
-    co_await something();
-    // If task is destroyed here, guard's destructor runs
-}
+    std::exception_ptr exception_;
+
+    void unhandled_exception()
+    {
+        exception_ = std::current_exception();
+    }
+};
 
-void cancel()
+// In the return object's result accessor:
+T get_result()
 {
-    task t = work();
-    // t destroyed without completion—guard destructor runs
-    // No exception thrown to caller
+    if (handle_.promise().exception_)
+        std::rethrow_exception(handle_.promise().exception_);
+    return std::move(handle_.promise().result_);
 }
 ----
 
-=== final_suspend Must Be noexcept
-
-The `final_suspend()` method must not throw:
+=== Exception Example
 
 [source,cpp]
 ----
-// Correct
-std::suspend_always final_suspend() noexcept { return {}; }
-
-// WRONG—undefined behavior if this throws
-std::suspend_always final_suspend() { return {}; }
-----
+#include <coroutine>
+#include <exception>
+#include <iostream>
+#include <stdexcept>
 
-If `final_suspend` could throw after `unhandled_exception` stored an
-exception, the program's behavior would be undefined.
-
-=== Practical Exception Patterns
+struct Task
+{
+    struct promise_type
+    {
+        std::exception_ptr exception;
+
+        Task get_return_object()
+        {
+            return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
+        }
+
+        std::suspend_always initial_suspend() { return {}; }
+        std::suspend_always final_suspend() noexcept { return {}; }
+        void return_void() {}
+
+        void unhandled_exception()
+        {
+            exception = std::current_exception();
+        }
+    };
+
+    std::coroutine_handle<promise_type> handle;
+
+    Task(std::coroutine_handle<promise_type> h) : handle(h) {}
+    ~Task() { if (handle) handle.destroy(); }
+
+    void run() { handle.resume(); }
+
+    void check_exception()
+    {
+        if (handle.promise().exception)
+            std::rethrow_exception(handle.promise().exception);
+    }
+};
 
-**Pattern 1: Propagate through await**
-[source,cpp]
-----
-// Exception propagates naturally
-task pipeline()
+Task risky_operation()
 {
-    auto a = co_await step_a(); // Throws if step_a fails
-    auto b = co_await step_b(a);
-    co_return;
+    std::cout << "Starting risky operation" << std::endl;
+    throw std::runtime_error("Something went wrong");
+    co_return; // never reached
 }
-----
 
-**Pattern 2: Handle locally**
-[source,cpp]
-----
-task safe_fetch()
+int main()
 {
-    try {
-        co_return co_await risky_operation();
-    } catch (network_error const&) {
-        co_return default_result();
+    Task task = risky_operation();
+
+    try
+    {
+        task.run();
+        task.check_exception();
+        std::cout << "Operation completed successfully" << std::endl;
+    }
+    catch (std::exception const& e)
+    {
+        std::cout << "Operation failed: " << e.what() << std::endl;
     }
 }
 ----
 
-**Pattern 3: Convert to error code**
-[source,cpp]
-----
-task<std::pair<error_code, data>> fetch()
-{
-    try {
-        auto d = co_await do_fetch();
-        co_return {error_code{}, d};
-    } catch (std::exception const& e) {
-        co_return {make_error_code(errc::operation_failed), {}};
-    }
-}
+*Output:*
+
 ----
+Starting risky operation
+Operation failed: Something went wrong
 ----
 
-== Summary
+=== Initialization Exceptions
+
+Exceptions thrown before the first suspension point (before `initial_suspend` completes) propagate directly to the caller without going through `unhandled_exception()`. If `initial_suspend()` returns `suspend_always`, the coroutine suspends before any user code runs, avoiding this edge case.
-[cols="1,3"]
-|===
-| Topic | Key Points
+== Building a Production Generator
 
-| Symmetric transfer
-| Return `coroutine_handle` from `await_suspend` for O(1) stack usage
+With all these concepts, here is a production-quality generic generator:
 
-| noop_coroutine
-| Signals "nothing to resume" in symmetric transfer
+[source,cpp]
+----
+#include <coroutine>
+#include <cstddef>
+#include <exception>
+#include <iterator>
+#include <utility>
 
-| Frame allocation
-| Default heap; customize via promise `operator new`
+template<class T>
+class Generator
+{
+public:
+    struct promise_type
+    {
+        T value_;
+        std::exception_ptr exception_;
+
+        Generator get_return_object()
+        {
+            return Generator{Handle::from_promise(*this)};
+        }
+
+        std::suspend_always initial_suspend() noexcept { return {}; }
+        std::suspend_always final_suspend() noexcept { return {}; }
+
+        std::suspend_always yield_value(T v)
+        {
+            value_ = std::move(v);
+            return {};
+        }
+
+        void return_void() noexcept {}
+
+        void unhandled_exception()
+        {
+            exception_ = std::current_exception();
+        }
+
+        // Prevent co_await inside generators
+        template<class U>
+        std::suspend_never await_transform(U&&) = delete;
+    };
+
+    using Handle = std::coroutine_handle<promise_type>;
+
+    class iterator
+    {
+        Handle handle_;
+
+    public:
+        using iterator_category = std::input_iterator_tag;
+        using value_type = T;
+        using difference_type = std::ptrdiff_t;
+
+        iterator() : handle_(nullptr) {}
+        explicit iterator(Handle h) : handle_(h) {}
+
+        iterator& operator++()
+        {
+            handle_.resume();
+            if (handle_.done())
+            {
+                auto& promise = handle_.promise();
+                handle_ = nullptr;
+                if (promise.exception_)
+                    std::rethrow_exception(promise.exception_);
+            }
+            return *this;
+        }
+
+        T& operator*() const { return handle_.promise().value_; }
+        bool operator==(iterator const& other) const
+        {
+            return handle_ == other.handle_;
+        }
+    };
+
+    iterator begin()
+    {
+        if (handle_)
+        {
+            handle_.resume();
+            if (handle_.done())
+            {
+                auto& promise = handle_.promise();
+                if (promise.exception_)
+                    std::rethrow_exception(promise.exception_);
+                return iterator{};
+            }
+        }
+        return iterator{handle_};
+    }
+
+    iterator end() { return iterator{}; }
+
+    ~Generator() { if (handle_) handle_.destroy(); }
+
+    Generator(Generator const&) = delete;
+    Generator& operator=(Generator const&) = delete;
+
+    Generator(Generator&& other) noexcept
+        : handle_(std::exchange(other.handle_, nullptr)) {}
+
+    Generator& operator=(Generator&& other) noexcept
+    {
+        if (this != &other)
+        {
+            if (handle_) handle_.destroy();
+            handle_ = std::exchange(other.handle_, nullptr);
+        }
+        return *this;
+    }
 
-| HALO
-| Compiler may elide allocation for scoped coroutines
+private:
+    Handle handle_;
+
+    explicit Generator(Handle h) : handle_(h) {}
+};
+----
 
-| Frame recycling
-| Pool allocators for high-throughput scenarios
+This generator:
 
-| Exception handling
-| Stored in `unhandled_exception()`, rethrown in `await_resume()`
+* Provides a standard iterator interface for range-based for loops
+* Stores and rethrows exceptions during iteration
+* Prevents `co_await` inside generators via deleted `await_transform`
+* Manages coroutine lifetime with RAII
+* Supports move semantics
 
-| final_suspend
-| Must be `noexcept`—can't throw after exception capture
-|===
 
-== Next Steps
+== Conclusion
 
-You now understand C++20 coroutines.
Learn how Capy builds on this foundation: +* *Keywords* — `co_await`, `co_yield`, and `co_return` transform functions into coroutines +* *Promise types* — Control coroutine behavior at initialization, suspension, completion, and error handling +* *Coroutine handles* — Lightweight references for resuming, querying, and destroying coroutines +* *Symmetric transfer* — Efficient control flow without stack accumulation +* *Allocation* — Custom allocation and HALO optimization +* *Exception handling* — Capturing and propagating exceptions across suspension points -* xref:../io-awaitables/concepts.adoc[I/O Awaitables] — The affine awaitable protocol -* xref:../library/task.adoc[The task Type] — Capy's implementation +These fundamentals prepare you for understanding Capy's `task` type and the IoAwaitable protocol, which build on standard coroutine machinery with executor affinity and stop token propagation. diff --git a/doc/modules/ROOT/pages/cpp20-coroutines/foundations.adoc b/doc/modules/ROOT/pages/cpp20-coroutines/foundations.adoc index 078c8ba1..3893d217 100644 --- a/doc/modules/ROOT/pages/cpp20-coroutines/foundations.adoc +++ b/doc/modules/ROOT/pages/cpp20-coroutines/foundations.adoc @@ -1,261 +1,145 @@ -// -// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com) -// -// Distributed under the Boost Software License, Version 1.0. (See accompanying -// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) -// -// Official repository: https://github.com/cppalliance/capy -// - = Part I: Foundations -This section builds intuition for coroutines by examining what makes them -different from regular functions. +This section introduces the fundamental concepts you need before working with C++20 coroutines. You will learn how normal functions work, what makes coroutines different, and why coroutines exist as a language feature. -== Functions and the Call Stack +== Prerequisites -A regular function follows a strict discipline: when called, it runs to -completion before control returns to the caller. +Before beginning this tutorial, you should have: -[source,cpp] ----- -void foo() -{ - int x = 1; // Stack frame allocated - bar(); // Call bar, wait for it to return - x += 1; // bar is done, continue -} // Stack frame released ----- +* A C++ compiler with C++20 support (GCC 10+, Clang 14+, or MSVC 2019 16.8+) +* Familiarity with basic C++ concepts: functions, classes, templates, and lambdas +* Understanding of how function calls work: the call stack, local variables, and return values -The call stack enforces this discipline. Each function call pushes a new -_stack frame_ containing: +The examples in this tutorial use standard C++20 features. Compile with: -* Local variables -* Return address (where to resume the caller) -* Saved registers +* GCC: `g++ -std=c++20 -fcoroutines your_file.cpp` +* Clang: `clang++ -std=c++20 your_file.cpp` +* MSVC: `cl /std:c++20 your_file.cpp` -When a function returns, its frame is popped and execution continues at the -saved return address. The stack grows and shrinks in strict LIFO order. +== Functions and the Call Stack -=== The Limitation: Run-to-Completion +When you call a regular function, the system allocates space on the *call stack* for the function's local variables and parameters. This stack space is called a *stack frame*. When the function returns, this stack space is reclaimed. The function's state exists only during the call. 
-This model works well for synchronous code but creates problems for -asynchronous operations: +Consider this function: [source,cpp] ---- -void read_file() +int compute(int x, int y) { - auto result = start_async_read(file); // Returns immediately - // But we can't return here—we need result! - process(result); // result isn't ready yet + int result = x * y + 42; + return result; } ---- -Without coroutines, you're forced to restructure your code: +When `compute` is called: -* **Callbacks:** Break the logic across multiple functions -* **State machines:** Manually track progress through states -* **Blocking:** Hold up the thread waiting for I/O +1. A stack frame is allocated containing `x`, `y`, and `result` +2. The function body executes +3. The return value is passed back to the caller +4. The stack frame is deallocated -Each approach has costs. Callbacks scatter logic and complicate error handling. -State machines are verbose. Blocking wastes threads. +This model has a fundamental constraint: *run-to-completion*. Once a function starts, it must finish before control returns to the caller. The function cannot pause midway, let other code run, and resume later. == What Is a Coroutine? -A coroutine is a function that can _suspend_ its execution and later _resume_ -from where it left off. Unlike regular functions, coroutines don't require -run-to-completion semantics. - -[source,cpp] ----- -task async_compute() -{ - int x = 1; - co_await some_operation(); // Suspend here - x += 1; // Resume here later - co_return x; -} ----- +A *coroutine* is a function that can suspend its execution and resume later from exactly where it left off. Think of it as a bookmark in a book of instructions—instead of reading the entire book in one sitting, you can mark your place, do something else, and return to continue reading. When a coroutine suspends: -1. Its local variables are preserved (not on the call stack) -2. Control returns to whoever resumed the coroutine -3. The coroutine can be resumed later, possibly on a different thread +* Its local variables are preserved +* The instruction pointer (where you are in the code) is saved +* Control returns to the caller or some other code -This is fundamentally different from a function call. A coroutine's state -persists independently of the call stack. +When a coroutine resumes: -=== The Coroutine Frame +* Local variables are restored to their previous values +* Execution continues from the suspension point -Since local variables must survive suspension, they can't live on the stack. -Instead, the compiler allocates a _coroutine frame_ (typically on the heap) -that holds: - -* All local variables -* The current suspension point -* The promise object (more on this later) - ----- -+-------------------+ -| Coroutine Frame | -+-------------------+ -| locals: x, y | -| suspension point | -| promise object | -+-------------------+ ----- - -When you call a coroutine function, it returns immediately with a handle to -this frame. The actual work happens when someone resumes the coroutine. - -=== Cooperative Multitasking - -Coroutines enable _cooperative multitasking_: multiple logical tasks can -interleave their execution without preemptive thread switching. +This capability is implemented through a *coroutine frame*—a heap-allocated block of memory that stores the coroutine's state. Unlike stack frames, coroutine frames persist across suspension points because they live on the heap rather than the stack. 
+[source,cpp] ---- -Task A: [work] → suspend → [work] → suspend → [work] → done -Task B: [work] → suspend → [work] → done -Task C: [work] → suspend → [work] → done +// Conceptual illustration (not real syntax) +task fetch_and_process() +{ + auto data = co_await fetch_from_network(); // suspends here + // When resumed, 'data' contains the fetched result + return process(data); +} ---- -Each task voluntarily yields control at suspension points. This is cheaper -than thread context switches and eliminates many concurrency hazards since -tasks don't preempt each other mid-operation. +The variable `data` maintains its value even though the function may have suspended and resumed. This is the fundamental capability that coroutines provide. == Why Coroutines? -Coroutines address three categories of problems elegantly. +=== The Problem: Asynchronous Programming Without Coroutines -=== Asynchronous Programming Without Callbacks +Consider a server application that handles network requests. The server must read a request, query a database, compute a response, and send it back. Each step might take time to complete. -Traditional async code scatters logic across callbacks: +In traditional synchronous code: [source,cpp] ---- -// Callback hell -void fetch_data() +void handle_request(connection& conn) { - http_get(url, [](response r1) { - parse(r1, [](data d) { - http_get(next_url(d), [](response r2) { - process(r2, [](result res) { - complete(res); - }); - }); - }); - }); -} ----- - -With coroutines, the same logic reads sequentially: - -[source,cpp] ----- -task fetch_data() -{ - auto r1 = co_await http_get(url); - auto d = co_await parse(r1); - auto r2 = co_await http_get(next_url(d)); - auto res = co_await process(r2); - complete(res); + std::string request = conn.read(); // blocks until data arrives + auto parsed = parse_request(request); + auto data = database.query(parsed.id); // blocks until database responds + auto response = compute_response(data); + conn.write(response); // blocks until write completes } ---- -Error handling becomes natural again—you can use try/catch instead of -checking error codes in every callback. +This code reads naturally from top to bottom. But while waiting for the network or database, this function blocks the entire thread. If you have thousands of concurrent connections, you would need thousands of threads, each consuming memory and requiring operating system scheduling. -=== Generators and Lazy Sequences +=== The Callback Approach -Coroutines can produce values on demand: +The traditional solution uses callbacks: [source,cpp] ---- -generator fibonacci() -{ - int a = 0, b = 1; - while (true) - { - co_yield a; - int next = a + b; - a = b; - b = next; - } -} - -// Consume lazily -for (int n : fibonacci()) +void handle_request(connection& conn) { - if (n > 1000) break; - std::cout << n << "\n"; + conn.async_read([&conn](std::string request) { + auto parsed = parse_request(request); + database.async_query(parsed.id, [&conn](auto data) { + auto response = compute_response(data); + conn.async_write(response, [&conn]() { + // request complete + }); + }); + }); } ---- -The generator produces values one at a time, suspending between each yield. -No infinite vector is allocated—values are computed on demand. +This code does not block. Each operation starts, registers a callback, and returns immediately. When the operation completes, the callback runs. 
+ +But look what has happened to the code: three levels of nesting, logic scattered across multiple lambda functions, and local variables that cannot be shared between callbacks without careful lifetime management. A single logical operation becomes fragmented across multiple functions. -=== State Machines Made Simple +=== The Coroutine Solution -Complex state machines become linear code: +Coroutines restore linear code structure while maintaining asynchronous behavior: [source,cpp] ---- -// HTTP request parser as a coroutine -task parse_http() +task handle_request(connection& conn) { - auto method = co_await read_until(' '); - auto path = co_await read_until(' '); - auto version = co_await read_line(); - - headers_map headers; - while (true) - { - auto line = co_await read_line(); - if (line.empty()) break; - headers.insert(parse_header(line)); - } - - std::string body; - if (auto len = headers.find("Content-Length"); len != headers.end()) - body = co_await read_exactly(std::stoi(len->second)); - - co_return request{method, path, version, headers, body}; + std::string request = co_await conn.async_read(); + auto parsed = parse_request(request); + auto data = co_await database.async_query(parsed.id); + auto response = compute_response(data); + co_await conn.async_write(response); } ---- -Without coroutines, you'd need explicit state tracking: - -[source,cpp] ----- -enum class State { METHOD, PATH, VERSION, HEADERS, BODY }; -State state_ = State::METHOD; -std::string buffer_; -// ... hundreds of lines of state machine code ... ----- - -== Summary - -[cols="1,3"] -|=== -| Concept | Description - -| Call stack -| LIFO structure that enforces run-to-completion - -| Coroutine -| Function that can suspend and resume - -| Coroutine frame -| Heap-allocated state that survives suspension +This code reads like the original blocking version. Local variables like `request`, `parsed`, and `data` exist naturally in their scope. Yet the function suspends at each `co_await` point, allowing other work to proceed while waiting. -| Cooperative multitasking -| Tasks voluntarily yield, interleaving without preemption -|=== +=== Beyond Asynchrony -== Next Steps +Coroutines also enable: -Now that you understand _why_ coroutines exist, learn the syntax: +* *Generators* — Functions that produce sequences of values on demand, computing each value only when requested +* *State machines* — Complex control flow expressed as linear code with suspension points +* *Cooperative multitasking* — Multiple logical tasks interleaved on a single thread -* xref:syntax.adoc[Part II: C++20 Syntax] — The three keywords and awaitables +You have now learned what coroutines are and why they exist. In the next section, you will learn the C++20 syntax for creating coroutines. diff --git a/doc/modules/ROOT/pages/cpp20-coroutines/machinery.adoc b/doc/modules/ROOT/pages/cpp20-coroutines/machinery.adoc index 13cccd6b..6f337582 100644 --- a/doc/modules/ROOT/pages/cpp20-coroutines/machinery.adoc +++ b/doc/modules/ROOT/pages/cpp20-coroutines/machinery.adoc @@ -1,531 +1,358 @@ -// -// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com) -// -// Distributed under the Boost Software License, Version 1.0. (See accompanying -// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) -// -// Official repository: https://github.com/cppalliance/capy -// - = Part III: Coroutine Machinery -This section explains the promise type and coroutine handle—the machinery -that makes coroutines work. 
+This section explains the promise type and coroutine handle—the core machinery that controls coroutine behavior. You will build a complete generator type by understanding how these pieces work together. -== The Promise Type +== Prerequisites -Every coroutine has an associated _promise type_ that controls its behavior. -The compiler finds the promise type through the coroutine's return type: +* Completed xref:syntax.adoc[Part II: C++20 Syntax] +* Understanding of the three coroutine keywords +* Familiarity with awaitables and awaiters -[source,cpp] ----- -struct MyTask -{ - struct promise_type { /* ... */ }; // <-- Compiler looks here - // ... -}; +== The Promise Type -MyTask my_coroutine() // Return type determines promise_type -{ - co_return; -} ----- +Every coroutine has an associated *promise type*. This type acts as a controller for the coroutine, defining how it behaves at key points in its lifecycle. The promise type is not something you pass to the coroutine—it is a nested type inside the coroutine's return type that the compiler uses automatically. -The promise object lives in the coroutine frame and acts as a communication -channel between the coroutine and the outside world. +The compiler expects to find a type named `promise_type` nested inside your coroutine's return type. If your coroutine returns `Generator`, the compiler looks for `Generator::promise_type`. -=== Required Promise Methods +=== Required Methods -A promise type must provide these methods: +The promise type must provide these methods: -[source,cpp] ----- -struct promise_type -{ - // Create the return object (called immediately) - ReturnType get_return_object(); +`get_return_object()`:: +Called to create the object that will be returned to the caller of the coroutine. This happens before the coroutine body begins executing. - // Should we suspend at the start? - Awaiter initial_suspend(); +`initial_suspend()`:: +Called immediately after `get_return_object()`. Returns an awaiter that determines whether the coroutine should suspend before running any of its body. Return `std::suspend_never{}` to start executing immediately, or `std::suspend_always{}` to suspend before the first statement. - // Should we suspend at the end? - Awaiter final_suspend() noexcept; +`final_suspend()`:: +Called when the coroutine completes (either normally or via exception). Returns an awaiter that determines whether to suspend one last time or destroy the coroutine state immediately. This method must be `noexcept`. - // Handle co_return; (void coroutines) - void return_void(); - // OR handle co_return value; (non-void coroutines) - void return_value(T value); +`return_void()` or `return_value(v)`:: +Called when the coroutine executes `co_return` or falls off the end of its body. Use `return_void()` if the coroutine does not return a value; use `return_value(v)` if it does. You must provide exactly one of these, matching how your coroutine returns. - // Handle exceptions - void unhandled_exception(); -}; ----- +`unhandled_exception()`:: +Called if an exception escapes the coroutine body. Typically you either rethrow the exception, store it for later, or terminate the program. -==== get_return_object() +=== The Compiler Transformation -Called before the coroutine body runs. 
Returns the object that the coroutine -function call produces: +The compiler transforms your coroutine body into something resembling this pseudocode: [source,cpp] ---- -MyTask get_return_object() { - return MyTask{ - std::coroutine_handle::from_promise(*this) - }; + promise_type promise; + auto return_object = promise.get_return_object(); + + co_await promise.initial_suspend(); + + try { + // your coroutine body goes here + } + catch (...) { + promise.unhandled_exception(); + } + + co_await promise.final_suspend(); } +// coroutine frame is destroyed when control flows off the end ---- -This is how the caller gets a handle to interact with the coroutine. - -==== initial_suspend() - -Called immediately after the promise is constructed. Returns an awaiter that -determines whether the coroutine starts running or suspends immediately: - -[source,cpp] ----- -// Eager: start running immediately -std::suspend_never initial_suspend() { return {}; } - -// Lazy: suspend until someone resumes -std::suspend_always initial_suspend() { return {}; } ----- - -Lazy coroutines (returning `suspend_always`) are easier to compose because -you control exactly when they start. +Important observations: -==== final_suspend() +* The return object is created before `initial_suspend()` runs, so it is available even if the coroutine suspends immediately +* `final_suspend()` determines whether the coroutine frame persists after completion—if it returns `suspend_always`, you must manually destroy the coroutine; if it returns `suspend_never`, the frame is destroyed automatically -Called after `co_return` or when the coroutine body ends. Returns an awaiter: +=== Tracing Promise Behavior [source,cpp] ---- -// Suspend at end: caller must destroy the frame -std::suspend_always final_suspend() noexcept { return {}; } - -// Don't suspend: frame is destroyed automatically -std::suspend_never final_suspend() noexcept { return {}; } ----- - -If you suspend at final, the caller can access the result before the frame -is destroyed. This is required for returning values from coroutines. - -NOTE: `final_suspend()` must be `noexcept`. Throwing here is undefined behavior. - -==== return_void() / return_value() +#include +#include -Handle `co_return` statements: - -[source,cpp] ----- -// For coroutines that don't return a value -void return_void() {} - -// For coroutines that return a value -void return_value(int value) +struct TracePromise { - result_ = value; -} ----- - -You provide one or the other, never both. The choice depends on whether your -coroutine produces a result. 
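+
+// The coroutine below returns TracePromise, so the promise methods
+// above print each lifecycle step in the order the compiler invokes it.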
- -==== unhandled_exception() - -Called when an exception escapes the coroutine body: + struct promise_type + { + promise_type() + { + std::cout << "promise constructed" << std::endl; + } + + ~promise_type() + { + std::cout << "promise destroyed" << std::endl; + } + + TracePromise get_return_object() + { + std::cout << "get_return_object called" << std::endl; + return {}; + } + + std::suspend_never initial_suspend() + { + std::cout << "initial_suspend called" << std::endl; + return {}; + } + + std::suspend_always final_suspend() noexcept + { + std::cout << "final_suspend called" << std::endl; + return {}; + } + + void return_void() + { + std::cout << "return_void called" << std::endl; + } + + void unhandled_exception() + { + std::cout << "unhandled_exception called" << std::endl; + } + }; +}; -[source,cpp] ----- -void unhandled_exception() +TracePromise trace_coroutine() { - exception_ = std::current_exception(); // Store for later + std::cout << "coroutine body begins" << std::endl; + co_return; } -// Or terminate immediately -void unhandled_exception() +int main() { - std::terminate(); + std::cout << "calling coroutine" << std::endl; + auto result = trace_coroutine(); + std::cout << "coroutine returned" << std::endl; } ---- -Storing the exception allows the awaiting coroutine to handle it when it -tries to get the result. - -=== Optional Promise Methods - -Several optional methods customize behavior: - -[source,cpp] ----- -struct promise_type -{ - // Customize operator new for the coroutine frame - void* operator new(std::size_t size); - void operator delete(void* ptr); - - // Transform awaited expressions - template - auto await_transform(T&& expr); - - // Handle co_yield - auto yield_value(T value); -}; ----- - -==== await_transform() - -Intercepts every `co_await` expression in the coroutine. This enables: - -* Injecting context (executor, cancellation token) into awaitables -* Prohibiting certain awaitable types -* Logging or debugging +*Output:* -[source,cpp] ---- -template -auto await_transform(T&& awaitable) -{ - // Wrap to inject our executor - return wrapped_awaitable{std::forward(awaitable), executor_}; -} +calling coroutine +promise constructed +get_return_object called +initial_suspend called +coroutine body begins +return_void called +final_suspend called +coroutine returned ---- -==== yield_value() +Notice that the promise is constructed first, then `get_return_object()` creates the return value, then `initial_suspend()` runs. Since `initial_suspend()` returns `suspend_never`, the coroutine body executes immediately. After `co_return`, `return_void()` is called, followed by `final_suspend()`. Since `final_suspend()` returns `suspend_always`, the coroutine suspends one last time, and the promise is not destroyed until the coroutine handle is explicitly destroyed. -Handles `co_yield` expressions. Returns an awaiter: - -[source,cpp] ----- -std::suspend_always yield_value(int value) -{ - current_value_ = value; - return {}; // Always suspend after yielding -} ----- - -`co_yield x` is equivalent to `co_await promise.yield_value(x)`. +[WARNING] +==== +If your coroutine can fall off the end of its body without executing `co_return`, and your promise type lacks a `return_void()` method, the behavior is undefined. Always ensure your promise type has `return_void()` if there is any code path that might reach the end of the coroutine body without an explicit `co_return`. +==== == Coroutine Handle -The `std::coroutine_handle
<promise_type>
` is a lightweight handle to a coroutine frame. -It's similar to a raw pointer—cheap to copy, but doesn't manage lifetime. +A `std::coroutine_handle<>` is a lightweight object that refers to a suspended coroutine. It is similar to a pointer: it does not own the memory it references, and copying it does not copy the coroutine. === Basic Operations -[source,cpp] ----- -std::coroutine_handle h; +* `handle()` or `handle.resume()` — Resume the coroutine +* `handle.done()` — Returns `true` if the coroutine has completed +* `handle.destroy()` — Destroy the coroutine frame (frees memory) +* `handle.promise()` — Returns a reference to the promise object (typed handles only) -// Resume the coroutine -h.resume(); -h(); // Same as resume() +=== Typed vs Untyped Handles -// Destroy the coroutine frame -h.destroy(); +`std::coroutine_handle<>`:: +The most basic form (equivalent to `std::coroutine_handle`). Can reference any coroutine but provides no access to the promise object. -// Check if at final suspension -bool done = h.done(); +`std::coroutine_handle`:: +A typed handle that knows about a particular promise type. Can be converted to the void handle. Provides a `promise()` method that returns a reference to the promise object. -// Access the promise -promise_type& p = h.promise(); +=== Creating Handles from Promises -// Get from promise -h = std::coroutine_handle::from_promise(promise); +Inside `get_return_object()`, you can obtain the coroutine handle using: -// Type-erased handle (loses promise access) -std::coroutine_handle<> erased = h; +[source,cpp] +---- +std::coroutine_handle::from_promise(*this) ---- -=== Handle vs Pointer - -Like a raw pointer: - -* Cheap to copy (typically one pointer) -* Doesn't own the resource -* Can be null (default-constructed) -* Can dangle if frame is destroyed +Since `get_return_object()` is called on the promise object (as `this`), this method returns a handle to the coroutine containing that promise. -Unlike a raw pointer: +== Putting It Together: Building a Generator -* `destroy()` instead of `delete` -* `resume()` instead of dereferencing -* Type parameter gives access to promise +A *generator* is a function that produces a sequence of values on demand. Instead of computing all values upfront, a generator computes each value when requested using `co_yield`. -=== coroutine_handle +=== How co_yield Works -The type-erased variant `std::coroutine_handle<>` (alias for -`coroutine_handle`) can refer to any coroutine: +The expression `co_yield value` is transformed by the compiler into: [source,cpp] ---- -void resume_any(std::coroutine_handle<> h) -{ - h.resume(); // Works for any coroutine - // h.promise() is not available—we don't know the promise type -} +co_await promise.yield_value(value) ---- -This is useful for schedulers that don't need promise access. +The `yield_value` method receives the yielded value, stores it somewhere accessible, and returns an awaiter (usually `std::suspend_always`) to suspend the coroutine. -=== Noop Coroutine - -`std::noop_coroutine()` returns a handle to a coroutine that does nothing -when resumed: +=== Complete Generator Implementation [source,cpp] ---- -std::coroutine_handle<> await_suspend(std::coroutine_handle<> h) -{ - save_continuation(h); - if (completed_synchronously_) - return h; // Resume immediately - return std::noop_coroutine(); // Don't resume anything -} ----- - -This is essential for symmetric transfer patterns. - -== Putting It Together - -Let's build a complete `task` type step by step. 
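+// A minimal generator of ints: each co_yield stores a value in the
+// promise and suspends; the caller pulls values with next() and value().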
+#include +#include -=== Step 1: Basic Structure - -[source,cpp] ----- -template -struct task +struct Generator { - struct promise_type; - - std::coroutine_handle handle_; - - explicit task(std::coroutine_handle h) : handle_(h) {} - - ~task() + struct promise_type { - if (handle_) - handle_.destroy(); - } - - // Move-only - task(task&& other) noexcept - : handle_(std::exchange(other.handle_, nullptr)) {} - task& operator=(task&&) = delete; - task(task const&) = delete; -}; ----- - -=== Step 2: Promise Type - -[source,cpp] ----- -template -struct task::promise_type -{ - std::optional result_; - std::exception_ptr exception_; - - task get_return_object() + int current_value; + + Generator get_return_object() + { + return Generator{ + std::coroutine_handle::from_promise(*this) + }; + } + + std::suspend_always initial_suspend() { return {}; } + std::suspend_always final_suspend() noexcept { return {}; } + + std::suspend_always yield_value(int value) + { + current_value = value; + return {}; + } + + void return_void() {} + void unhandled_exception() { std::terminate(); } + }; + + std::coroutine_handle handle; + + Generator(std::coroutine_handle h) : handle(h) {} + + ~Generator() { - return task{ - std::coroutine_handle::from_promise(*this) - }; + if (handle) + handle.destroy(); } - - std::suspend_always initial_suspend() { return {}; } // Lazy - std::suspend_always final_suspend() noexcept { return {}; } - - void return_value(T value) + + // Disable copying + Generator(Generator const&) = delete; + Generator& operator=(Generator const&) = delete; + + // Enable moving + Generator(Generator&& other) noexcept + : handle(other.handle) { - result_ = std::move(value); + other.handle = nullptr; } - - void unhandled_exception() + + Generator& operator=(Generator&& other) noexcept { - exception_ = std::current_exception(); + if (this != &other) + { + if (handle) + handle.destroy(); + handle = other.handle; + other.handle = nullptr; + } + return *this; } -}; ----- - -=== Step 3: Awaitable Interface - -[source,cpp] ----- -template -struct task -{ - // ... previous code ... - - bool await_ready() const { return false; } // Always suspend - - std::coroutine_handle<> await_suspend(std::coroutine_handle<> caller) + + bool next() { - // Store caller so we can resume them when we complete - handle_.promise().continuation_ = caller; - return handle_; // Start running this task + if (!handle || handle.done()) + return false; + handle.resume(); + return !handle.done(); } - - T await_resume() + + int value() const { - if (handle_.promise().exception_) - std::rethrow_exception(handle_.promise().exception_); - return std::move(*handle_.promise().result_); + return handle.promise().current_value; } }; ----- - -=== Step 4: Continuation Handling - -Update promise to resume the awaiting coroutine: -[source,cpp] ----- -template -struct task::promise_type +Generator count_to(int n) { - std::coroutine_handle<> continuation_; - // ... other members ... 
- - auto final_suspend() noexcept + for (int i = 1; i <= n; ++i) { - struct final_awaiter - { - bool await_ready() noexcept { return false; } - - std::coroutine_handle<> await_suspend( - std::coroutine_handle h) noexcept - { - // Resume whoever was waiting for us - return h.promise().continuation_; - } - - void await_resume() noexcept {} - }; - return final_awaiter{}; + co_yield i; } -}; ----- - -=== Step 5: Usage - -[source,cpp] ----- -task inner() -{ - co_return 42; } -task outer() +int main() { - int x = co_await inner(); // Suspends, runs inner, resumes with result - co_return x * 2; + auto gen = count_to(5); + + while (gen.next()) + { + std::cout << gen.value() << std::endl; + } } ---- -The flow: +*Output:* -1. `outer()` creates a suspended task -2. Someone resumes `outer` -3. `outer` hits `co_await inner()` -4. `outer` suspends, storing itself as continuation -5. `inner` runs, returns 42 -6. `inner` at final_suspend resumes `outer` -7. `outer` resumes with `x = 42` -8. `outer` returns 84 - -=== Building a Generator - -Generators use `co_yield` to produce sequences: - -[source,cpp] ---- -template -struct generator -{ - struct promise_type - { - T current_value_; - std::exception_ptr exception_; +1 +2 +3 +4 +5 +---- - generator get_return_object() - { - return generator{ - std::coroutine_handle::from_promise(*this) - }; - } +=== Key Design Decisions - std::suspend_always initial_suspend() { return {}; } - std::suspend_always final_suspend() noexcept { return {}; } - void return_void() {} +`initial_suspend()` returns `suspend_always`:: +The coroutine suspends before running any user code. This means the first call to `next()` starts the coroutine running. - std::suspend_always yield_value(T value) - { - current_value_ = std::move(value); - return {}; - } +`final_suspend()` returns `suspend_always`:: +The coroutine frame persists after completion. This is necessary because the iterator needs to check `handle.done()` after the last value. - void unhandled_exception() - { - exception_ = std::current_exception(); - } - }; +Generator owns the handle:: +The destructor calls `handle.destroy()` to free the coroutine frame. Copying is disabled to avoid double-free; moving transfers ownership. - std::coroutine_handle handle_; +`yield_value` stores and suspends:: +Stores the yielded value in `current_value` and returns `suspend_always` to pause the coroutine after each yield. - // Iterator interface for range-for - struct iterator { /* ... */ }; - iterator begin(); - iterator end(); -}; ----- +=== Fibonacci Generator -Usage: +Here is a more interesting generator that produces the Fibonacci sequence: [source,cpp] ---- -generator iota(int start) +Generator fibonacci() { + int a = 0, b = 1; while (true) - co_yield start++; + { + co_yield a; + int next = a + b; + a = b; + b = next; + } } -for (int x : iota(0)) +int main() { - if (x > 10) break; - std::cout << x << "\n"; + auto fib = fibonacci(); + + for (int i = 0; i < 10 && fib.next(); ++i) + { + std::cout << fib.value() << " "; + } + std::cout << std::endl; } ---- -== Summary - -[cols="1,3"] -|=== -| Component | Role - -| Promise type -| Controls coroutine behavior, stores results +*Output:* -| `get_return_object()` -| Creates the caller-visible return value - -| `initial_suspend()` -| Eager vs lazy execution - -| `final_suspend()` -| Cleanup and continuation - -| `coroutine_handle
<promise_type>
`
-| Lightweight reference to coroutine frame
-
-| `await_transform()`
-| Intercept and transform `co_await` expressions
-|===
+----
+0 1 1 2 3 5 8 13 21 34
+----
 
-== Next Steps
+The Fibonacci generator runs an infinite loop internally. It will produce values forever. But because it yields and suspends after each value, the caller controls when (and whether) to ask for more values. The generator only computes values on demand.
 
-The basics are in place. Now learn the advanced techniques:
+The variables `a` and `b` persist across yields because they live in the coroutine frame on the heap.
 
-* xref:advanced.adoc[Part IV: Advanced Topics] — Symmetric transfer, allocation, exceptions
+You have now learned how promise types and coroutine handles work together to create useful abstractions like generators. In the next section, you will explore advanced topics: symmetric transfer, allocation, and exception handling.
diff --git a/doc/modules/ROOT/pages/cpp20-coroutines/syntax.adoc b/doc/modules/ROOT/pages/cpp20-coroutines/syntax.adoc
index 66830923..fd73a3ca 100644
--- a/doc/modules/ROOT/pages/cpp20-coroutines/syntax.adoc
+++ b/doc/modules/ROOT/pages/cpp20-coroutines/syntax.adoc
@@ -1,409 +1,202 @@
-//
-// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com)
-//
-// Distributed under the Boost Software License, Version 1.0. (See accompanying
-// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
-//
-// Official repository: https://github.com/cppalliance/capy
-//
-
 = Part II: C++20 Syntax
 
-This section covers the three keywords that make a function into a coroutine
-and explains how awaitables control suspension.
+This section introduces the three C++20 keywords that create coroutines and walks you through building your first coroutine step by step.
 
-== The Three Keywords
+== Prerequisites
 
-C++20 introduces three keywords that transform a function into a coroutine:
+* Completed xref:foundations.adoc[Part I: Foundations]
+* Understanding of why coroutines exist and what problem they solve
 
-[cols="1,3"]
-|===
-| Keyword | Purpose
+== The Three Keywords
 
-| `co_await`
-| Suspend and wait for an operation to complete
+A function becomes a coroutine when its body contains any of three special keywords: `co_await`, `co_yield`, or `co_return`. The presence of any of these keywords signals to the compiler that the function requires coroutine machinery.
 
-| `co_yield`
-| Produce a value and suspend
-
-| `co_return`
-| Complete the coroutine with a final value
-|===
+=== co_await
 
-The mere presence of any of these keywords causes the compiler to treat the
-function as a coroutine, allocating a coroutine frame and generating the
-suspension machinery.
+The `co_await` keyword suspends the coroutine and waits for some operation to complete. When you write `co_await expr`, the coroutine saves its state, pauses execution, and potentially allows other code to run. When the awaited operation completes, the coroutine resumes from exactly where it left off.
 
-=== co_await — Suspend and Wait
-
-The `co_await` expression suspends the current coroutine until the awaited
-operation completes:
 
 [source,cpp]
 ----
-task fetch()
+task<std::string> fetch_page(std::string url)
 {
-    auto response = co_await http_get(url); // Suspend here
-    // Execution resumes when http_get completes
-    co_return parse(response);
+    auto response = co_await http_get(url); // suspends until HTTP completes
+    co_return response.body;                // continues after resumption
 }
 ----
 
-When `co_await` executes:
-
-1.
The awaited object is queried (can we skip suspension?) -2. If suspension is needed, coroutine state is saved -3. Control returns to whoever resumed this coroutine -4. Later, when the operation completes, the coroutine resumes -5. The result of `co_await` is the operation's result - -=== co_yield — Produce a Value +=== co_yield -The `co_yield` expression produces a value to the caller and suspends: +The `co_yield` keyword produces a value and suspends the coroutine. This pattern creates *generators*—functions that produce sequences of values one at a time. After yielding a value, the coroutine pauses until someone asks for the next value. [source,cpp] ---- -generator range(int start, int end) +generator count_to(int n) { - for (int i = start; i < end; ++i) - co_yield i; // Produce i, suspend + for (int i = 1; i <= n; ++i) + { + co_yield i; // produce value, suspend, resume when next value requested + } } ---- -`co_yield x` is shorthand for `co_await promise.yield_value(x)`. The -promise (which we'll cover later) decides what to do with the yielded value. +=== co_return -=== co_return — Complete the Coroutine - -The `co_return` statement completes the coroutine with an optional final value: +The `co_return` keyword completes the coroutine and optionally provides a final result. Unlike a regular `return` statement, `co_return` interacts with the coroutine machinery to properly finalize the coroutine's state. [source,cpp] ---- task compute() { int result = 42; - co_return result; // Complete with value 42 -} - -task work() -{ - do_something(); - co_return; // Complete (void coroutine) + co_return result; // completes the coroutine with value 42 } ---- -After `co_return`: - -1. The promise stores the returned value (if any) -2. The coroutine enters its final suspension point -3. The coroutine cannot be resumed again +For coroutines that do not return a value, use `co_return;` without an argument. == Your First Coroutine -Let's build a complete, minimal coroutine. We'll need a return type that -tells the compiler how to manage the coroutine. +The distinction between regular functions and coroutines matters because they behave fundamentally differently at runtime: + +* A regular function allocates its local variables on the stack. When it returns, those variables are gone. +* A coroutine allocates its local variables in a heap-allocated *coroutine frame*. When it suspends, those variables persist. When it resumes, they are still there. + +Here is the minimal structure needed to create a coroutine: [source,cpp] ---- #include -#include -struct MinimalTask +struct SimpleCoroutine { struct promise_type { - MinimalTask get_return_object() - { - return MinimalTask{ - std::coroutine_handle::from_promise(*this) - }; - } - + SimpleCoroutine get_return_object() { return {}; } std::suspend_never initial_suspend() { return {}; } std::suspend_never final_suspend() noexcept { return {}; } void return_void() {} - void unhandled_exception() { std::terminate(); } + void unhandled_exception() {} }; - - std::coroutine_handle handle_; }; -MinimalTask hello() -{ - std::cout << "Hello, "; - co_await std::suspend_never{}; // Makes this a coroutine - std::cout << "World!\n"; -} - -int main() -{ - hello(); // Prints "Hello, World!" -} ----- - -The presence of `co_await` transforms `hello()` into a coroutine. The -compiler looks for `MinimalTask::promise_type` to determine how to manage it. 
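+
+To see the scaffolding in action, call the coroutine like an ordinary function. A minimal usage sketch, assuming the `SimpleCoroutine` definition above:
+
+[source,cpp]
+----
+int main()
+{
+    // Because initial_suspend() returns suspend_never, the body runs
+    // immediately and finishes at co_return before control returns here.
+    my_first_coroutine();
+}
+----
+
+Nothing visible happens, but the compiler has allocated a coroutine frame, constructed the promise, run the body through `co_return`, and destroyed the frame, because `final_suspend()` also returns `suspend_never`.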
- -=== What the Compiler Transforms - -When you write a coroutine function, the compiler transforms it roughly like: - -[source,cpp] ----- -// What you write: -task compute() -{ - int x = 1; - co_await something(); - co_return x + 1; -} - -// What the compiler generates (conceptually): -task compute() +SimpleCoroutine my_first_coroutine() { - // Allocate coroutine frame - auto* frame = new __compute_frame(); - auto& promise = frame->promise; - - // Create return object immediately - auto return_object = promise.get_return_object(); - - try { - co_await promise.initial_suspend(); - - // Your code with suspension points - frame->x = 1; - co_await something(); - promise.return_value(frame->x + 1); - - } catch (...) { - promise.unhandled_exception(); - } - - co_await promise.final_suspend(); - // Frame may be destroyed here or by caller - return return_object; + co_return; // This makes it a coroutine } ---- -The frame persists across suspensions, holding local variables (`x`) and -the promise object. - -=== The Coroutine Frame +The `promise_type` nested structure provides the minimum scaffolding the compiler needs. You will learn what each method does in xref:machinery.adoc[Part III: Coroutine Machinery]. -The coroutine frame is heap-allocated (by default) and contains: - -* Local variables that span suspension points -* The promise object -* Bookkeeping for the current suspension point -* Parameters (copied or moved into the frame) - ----- -+------------------------+ -| __compute_frame | -+------------------------+ -| int x; | ← local variables -| promise_type promise; | ← promise object -| int __suspend_point; | ← which co_await are we at? -+------------------------+ ----- - -Heap Allocation eLision Optimization (HALO) can sometimes place the frame in -the caller's frame, avoiding the allocation entirely. +For now, observe that the presence of `co_return` transforms what looks like a regular function into a coroutine. If you try to compile a function with coroutine keywords but without proper infrastructure, the compiler will produce errors. == Awaitables and Awaiters -When you write `co_await expr`, the compiler needs to know: - -1. Should we actually suspend? -2. What happens when we suspend? -3. What value do we produce when we resume? - -These questions are answered by the _awaiter_ object. - -=== The Awaitable Concept - -An awaitable is anything that can be used with `co_await`. The compiler -converts it to an awaiter using these rules (in order): - -1. If the promise has `await_transform`, call it: `promise.await_transform(expr)` -2. Otherwise, if `expr` has `operator co_await`, call it -3. Otherwise, use `expr` directly as the awaiter +When you write `co_await expr`, the expression `expr` must be an *awaitable*—something that knows how to suspend and resume a coroutine. The awaitable produces an *awaiter* object that implements three methods: -=== The Awaiter Interface +* `await_ready()` — Returns `true` if the result is immediately available and no suspension is needed +* `await_suspend(handle)` — Called when the coroutine suspends; receives a handle to the coroutine for later resumption +* `await_resume()` — Called when the coroutine resumes; its return value becomes the value of the `co_await` expression -An awaiter must provide three methods: +=== Example: Understanding the Awaiter Protocol [source,cpp] ---- -struct Awaiter +#include +#include + +struct ReturnObject { - bool await_ready(); // Skip suspension? - ??? 
await_suspend(std::coroutine_handle<> h); // Do the suspension - T await_resume(); // Get the result + struct promise_type + { + ReturnObject get_return_object() { return {}; } + std::suspend_never initial_suspend() { return {}; } + std::suspend_never final_suspend() noexcept { return {}; } + void return_void() {} + void unhandled_exception() {} + }; }; ----- -==== await_ready() - -Returns `true` if the result is already available and suspension should be -skipped. This is an optimization—if the operation completed synchronously, -we can avoid the suspension overhead. - -[source,cpp] ----- -bool await_ready() +struct Awaiter { - return result_already_cached_; -} ----- - -==== await_suspend(handle) - -Called when we actually suspend. Receives the coroutine handle, which can be -stored and used later to resume the coroutine. - -The return type affects behavior: - -[cols="1,3"] -|=== -| Return type | Behavior - -| `void` -| Always suspend (caller continues) - -| `bool` -| `true` = suspend, `false` = don't suspend (resume immediately) - -| `coroutine_handle<>` -| Suspend and immediately resume the returned handle (symmetric transfer) -|=== + std::coroutine_handle<>* handle_out; + + bool await_ready() { return false; } // always suspend + + void await_suspend(std::coroutine_handle<> h) + { + *handle_out = h; // store handle for later resumption + } + + void await_resume() {} // nothing to return +}; -[source,cpp] ----- -void await_suspend(std::coroutine_handle<> h) +ReturnObject counter(std::coroutine_handle<>* handle) { - // Store handle, start async operation - async_start([h]() { - h.resume(); // Resume when done - }); + Awaiter awaiter{handle}; + + for (unsigned i = 0; ; ++i) + { + std::cout << "counter: " << i << std::endl; + co_await awaiter; + } } -// Or: symmetric transfer to another coroutine -std::coroutine_handle<> await_suspend(std::coroutine_handle<> h) +int main() { - store_continuation(h); - return next_coroutine_to_run(); + std::coroutine_handle<> h; + counter(&h); + + for (int i = 0; i < 3; ++i) + { + std::cout << "main: resuming" << std::endl; + h(); + } + + h.destroy(); } ---- -==== await_resume() +*Output:* -Called when the coroutine resumes. Returns the result of the `co_await` -expression. - -[source,cpp] ---- -int await_resume() -{ - if (error_) - throw std::runtime_error("operation failed"); - return result_; -} +counter: 0 +main: resuming +counter: 1 +main: resuming +counter: 2 +main: resuming +counter: 3 ---- -=== Built-in Awaiters +Study this execution flow: -The standard provides two simple awaiters: +1. `main` calls `counter`, passing the address of a coroutine handle +2. `counter` begins executing, prints "counter: 0", and reaches `co_await awaiter` +3. `await_ready()` returns `false`, so suspension proceeds +4. `await_suspend` receives a handle to the suspended coroutine and stores it in `main`'s variable `h` +5. Control returns to `main`, which now holds a handle to the suspended coroutine +6. `main` calls `h()`, which resumes the coroutine +7. The coroutine continues from where it left off, increments `i`, prints "counter: 1", and suspends again +8. This cycle repeats until `main` destroys the coroutine -[source,cpp] ----- -struct std::suspend_always -{ - bool await_ready() { return false; } // Always suspend - void await_suspend(std::coroutine_handle<>) {} - void await_resume() {} -}; +The variable `i` inside `counter` maintains its value across all these suspension and resumption cycles. 
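+
+The awaiter above returns nothing from `await_resume()`, but that return value is exactly what a `co_await` expression produces. A brief sketch of a value-producing awaiter (the `ready_value` type is illustrative, not part of the standard library):
+
+[source,cpp]
+----
+struct ready_value
+{
+    int value;
+
+    // The value is already available, so suspension is skipped entirely.
+    bool await_ready() { return true; }
+
+    // Never called, because await_ready() returned true.
+    void await_suspend(std::coroutine_handle<>) {}
+
+    // This return value becomes the result of the co_await expression.
+    int await_resume() { return value; }
+};
+
+// Inside a coroutine:
+//     int x = co_await ready_value{42}; // x == 42, no suspension occurs
+----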
-struct std::suspend_never -{ - bool await_ready() { return true; } // Never suspend - void await_suspend(std::coroutine_handle<>) {} - void await_resume() {} -}; ----- +=== Standard Awaiters -These are building blocks for promise types (e.g., `initial_suspend()`). +The C++ standard library provides two predefined awaiters: -=== Awaiter Example: Simple Timer +* `std::suspend_always` — `await_ready()` returns `false` (always suspend) +* `std::suspend_never` — `await_ready()` returns `true` (never suspend) -Here's a complete awaiter that suspends for a duration: +These are useful building blocks for promise types and custom awaitables. [source,cpp] ---- -#include -#include - -struct sleep_for -{ - std::chrono::milliseconds duration_; - - bool await_ready() const - { - return duration_.count() <= 0; // No sleep needed - } - - void await_suspend(std::coroutine_handle<> h) const - { - // In real code, use a proper async timer - std::thread([=]() { - std::this_thread::sleep_for(duration_); - h.resume(); - }).detach(); - } +// suspend_always causes suspension at this point +co_await std::suspend_always{}; - void await_resume() const {} -}; - -// Usage: -task delayed_work() -{ - std::cout << "Starting...\n"; - co_await sleep_for{std::chrono::seconds(1)}; - std::cout << "One second later\n"; -} +// suspend_never continues immediately without suspending +co_await std::suspend_never{}; ---- -WARNING: This example detaches a thread, which is rarely appropriate in -production code. Real async timers integrate with event loops. - -== Summary - -[cols="1,3"] -|=== -| Element | Purpose - -| `co_await` -| Suspend until operation completes - -| `co_yield` -| Produce value and suspend - -| `co_return` -| Complete coroutine with final value - -| `await_ready()` -| Can we skip suspension? - -| `await_suspend(h)` -| Perform suspension, maybe store handle - -| `await_resume()` -| Return result when resumed -|=== - -== Next Steps - -You've learned the syntax. Now understand the machinery that makes it work: - -* xref:machinery.adoc[Part III: Coroutine Machinery] — Promise types and handles +You have now learned the three coroutine keywords and how awaitables work. In the next section, you will learn about the promise type and coroutine handle—the machinery that makes coroutines function. diff --git a/doc/modules/ROOT/pages/examples/buffer-composition.adoc b/doc/modules/ROOT/pages/examples/buffer-composition.adoc new file mode 100644 index 00000000..d93afd82 --- /dev/null +++ b/doc/modules/ROOT/pages/examples/buffer-composition.adoc @@ -0,0 +1,171 @@ += Buffer Composition + +Composing buffer sequences without allocation for scatter/gather I/O. + +== What You Will Learn + +* Creating buffers from different sources +* Composing buffer sequences with `cat()` +* Zero-allocation scatter/gather patterns + +== Prerequisites + +* Completed xref:producer-consumer.adoc[Producer-Consumer] +* Understanding of buffer types from xref:../buffers/types.adoc[Buffer Types] + +== Source Code + +[source,cpp] +---- +#include +#include +#include +#include + +using namespace boost::capy; + +void demonstrate_buffers() +{ + // Individual buffers from various sources + std::string header = "Content-Type: text/plain\r\n\r\n"; + std::string body = "Hello, World!"; + char footer[] = "\r\n--END--"; + + // Create buffer views (no copies) + auto header_buf = make_buffer(header); + auto body_buf = make_buffer(body); + auto footer_buf = make_buffer(footer, sizeof(footer) - 1); + + // Compose into a single sequence (no allocation!) 
+    auto message = cat(header_buf, body_buf, footer_buf);
+
+    // Measure
+    std::cout << "Total message size: " << buffer_size(message) << " bytes\n";
+    std::cout << "Buffer count: " << buffer_length(message) << "\n";
+
+    // Iterate (for demonstration)
+    std::cout << "\nBuffer contents:\n";
+    for (auto it = begin(message); it != end(message); ++it)
+    {
+        const_buffer buf = *it;
+        std::cout << "  [" << buf.size() << " bytes]: ";
+        std::cout.write(static_cast<char const*>(buf.data()), buf.size());
+        std::cout << "\n";
+    }
+}
+
+// HTTP-style message assembly
+struct http_message
+{
+    std::string status_line = "HTTP/1.1 200 OK\r\n";
+    std::array<std::pair<std::string, std::string>, 2> headers = {{
+        {"Content-Type", "application/json"},
+        {"Server", "Capy/1.0"}
+    }};
+    std::string body = R"({"status":"ok"})";
+
+    // Returns a composed buffer sequence
+    auto buffers() const
+    {
+        // Format headers
+        static constexpr char crlf[] = "\r\n";
+        static constexpr char sep[] = ": ";
+        static constexpr char empty_line[] = "\r\n";
+
+        return cat(
+            make_buffer(status_line),
+            make_buffer(headers[0].first), make_buffer(sep, 2),
+            make_buffer(headers[0].second), make_buffer(crlf, 2),
+            make_buffer(headers[1].first), make_buffer(sep, 2),
+            make_buffer(headers[1].second), make_buffer(crlf, 2),
+            make_buffer(empty_line, 2),
+            make_buffer(body)
+        );
+    }
+};
+
+int main()
+{
+    demonstrate_buffers();
+
+    std::cout << "\n--- HTTP Message ---\n";
+    http_message msg;
+    auto bufs = msg.buffers();
+    std::cout << "Message size: " << buffer_size(bufs) << " bytes\n";
+
+    // In real code: co_await write(stream, msg.buffers());
+    // Single system call writes all buffers (scatter/gather)
+
+    return 0;
+}
+----
+
+== Build
+
+[source,cmake]
+----
+add_executable(buffer_composition buffer_composition.cpp)
+target_link_libraries(buffer_composition PRIVATE capy)
+----
+
+== Walkthrough
+
+=== Creating Buffers
+
+[source,cpp]
+----
+auto header_buf = make_buffer(header);
+auto body_buf = make_buffer(body);
+auto footer_buf = make_buffer(footer, sizeof(footer) - 1);
+----
+
+`make_buffer` creates buffer views from various sources. No data is copied—the buffers reference the original storage.
+
+=== Zero-Allocation Composition
+
+[source,cpp]
+----
+auto message = cat(header_buf, body_buf, footer_buf);
+----
+
+`cat()` composes buffer sequences without allocation. The returned object stores references and iterates through all buffers in sequence.
+
+=== Scatter/Gather I/O
+
+[source,cpp]
+----
+co_await write(stream, msg.buffers());
+----
+
+When you write a composed buffer sequence, the OS receives all buffers in a single system call. This is *scatter/gather I/O*:
+
+* No intermediate buffer allocation
+* No copying data together
+* Single syscall for multiple buffers
+
+== Output
+
+----
+Total message size: 50 bytes
+Buffer count: 3
+
+Buffer contents:
+  [28 bytes]: Content-Type: text/plain
+
+  [13 bytes]: Hello, World!
+  [9 bytes]: 
+--END--
+
+--- HTTP Message ---
+Message size: 84 bytes
+----
+
+== Exercises
+
+1. Create a function that takes any `ConstBufferSequence` and prints its contents
+2. Measure the difference between copying data into a single buffer vs. using `cat()`
+3.
Implement a simple message framing protocol using buffer composition + +== Next Steps + +* xref:mock-stream-testing.adoc[Mock Stream Testing] — Unit testing with mock streams diff --git a/doc/modules/ROOT/pages/examples/custom-dynamic-buffer.adoc b/doc/modules/ROOT/pages/examples/custom-dynamic-buffer.adoc new file mode 100644 index 00000000..cffe1c83 --- /dev/null +++ b/doc/modules/ROOT/pages/examples/custom-dynamic-buffer.adoc @@ -0,0 +1,298 @@ += Custom Dynamic Buffer + +Implementing the DynamicBuffer concept for a custom allocation strategy. + +== What You Will Learn + +* Implementing the DynamicBuffer concept +* Understanding `prepare`, `commit`, `consume` lifecycle +* Custom memory management for I/O + +== Prerequisites + +* Completed xref:parallel-fetch.adoc[Parallel Fetch] +* Understanding of dynamic buffers from xref:../buffers/dynamic.adoc[Dynamic Buffers] + +== Source Code + +[source,cpp] +---- +#include +#include +#include +#include +#include +#include + +using namespace boost::capy; + +// Custom dynamic buffer with statistics tracking +class tracked_buffer +{ + std::vector storage_; + std::size_t read_pos_ = 0; // Start of readable data + std::size_t write_pos_ = 0; // End of readable data + std::size_t max_size_; + + // Statistics + std::size_t total_prepared_ = 0; + std::size_t total_committed_ = 0; + std::size_t total_consumed_ = 0; + +public: + explicit tracked_buffer(std::size_t max_size = 65536) + : max_size_(max_size) + { + storage_.reserve(1024); + } + + // === DynamicBuffer interface === + + // Consumer: readable data + const_buffer data() const noexcept + { + return const_buffer( + storage_.data() + read_pos_, + write_pos_ - read_pos_); + } + + // Capacity queries + std::size_t size() const noexcept + { + return write_pos_ - read_pos_; + } + + std::size_t max_size() const noexcept + { + return max_size_; + } + + std::size_t capacity() const noexcept + { + return storage_.capacity() - read_pos_; + } + + // Producer: prepare space for writing + mutable_buffer prepare(std::size_t n) + { + total_prepared_ += n; + + // Compact if needed + if (storage_.size() + n > storage_.capacity() && read_pos_ > 0) + { + compact(); + } + + // Grow if needed + std::size_t required = write_pos_ + n; + if (required > max_size_) + throw std::length_error("tracked_buffer: max_size exceeded"); + + if (required > storage_.size()) + storage_.resize(required); + + return mutable_buffer( + storage_.data() + write_pos_, + n); + } + + // Producer: mark bytes as written + void commit(std::size_t n) + { + total_committed_ += n; + write_pos_ += n; + } + + // Consumer: mark bytes as processed + void consume(std::size_t n) + { + total_consumed_ += n; + read_pos_ += n; + + if (read_pos_ == write_pos_) + { + // Buffer empty, reset positions + read_pos_ = 0; + write_pos_ = 0; + } + } + + // === Statistics === + + void print_stats() const + { + std::cout << "Buffer statistics:\n" + << " Total prepared: " << total_prepared_ << " bytes\n" + << " Total committed: " << total_committed_ << " bytes\n" + << " Total consumed: " << total_consumed_ << " bytes\n" + << " Current size: " << size() << " bytes\n" + << " Capacity: " << capacity() << " bytes\n"; + } + +private: + void compact() + { + if (read_pos_ == 0) + return; + + std::size_t len = write_pos_ - read_pos_; + std::memmove(storage_.data(), storage_.data() + read_pos_, len); + read_pos_ = 0; + write_pos_ = len; + } +}; + +// Demonstrate using the custom buffer +task<> read_into_tracked_buffer(any_stream& stream, tracked_buffer& buffer) +{ + // Read data in 
chunks + while (true) + { + auto space = buffer.prepare(256); + auto [ec, n] = co_await stream.read_some(space); + + if (ec == cond::eof) + break; + + if (ec.failed()) + throw std::system_error(ec); + + buffer.commit(n); + + std::cout << "Read " << n << " bytes, buffer size now: " + << buffer.size() << "\n"; + } +} + +void demo_tracked_buffer() +{ + std::cout << "=== Tracked Buffer Demo ===\n\n"; + + // Setup mock stream with test data + test::stream mock; + mock.provide("Hello, "); + mock.provide("World! "); + mock.provide("This is a test of the custom buffer.\n"); + mock.provide_eof(); + + any_stream stream{mock}; + tracked_buffer buffer; + + test::run_blocking(read_into_tracked_buffer(stream, buffer)); + + std::cout << "\nFinal buffer contents: "; + auto data = buffer.data(); + std::cout.write(static_cast(data.data()), data.size()); + std::cout << "\n\n"; + + buffer.print_stats(); + + // Consume some data + std::cout << "\nConsuming 7 bytes...\n"; + buffer.consume(7); + buffer.print_stats(); +} + +int main() +{ + demo_tracked_buffer(); + return 0; +} +---- + +== Build + +[source,cmake] +---- +add_executable(custom_dynamic_buffer custom_dynamic_buffer.cpp) +target_link_libraries(custom_dynamic_buffer PRIVATE capy) +---- + +== Walkthrough + +=== DynamicBuffer Requirements + +A DynamicBuffer must provide: + +[source,cpp] +---- +// Consumer interface +const_buffer data() const; // Readable data +void consume(std::size_t n); // Mark bytes as processed + +// Producer interface +mutable_buffer prepare(std::size_t n); // Space for writing +void commit(std::size_t n); // Mark bytes as written + +// Capacity queries +std::size_t size() const; // Readable bytes +std::size_t max_size() const; // Maximum allowed +std::size_t capacity() const; // Currently allocated +---- + +=== The Producer/Consumer Flow + +[source,cpp] +---- +// 1. Producer prepares space +auto space = buffer.prepare(256); + +// 2. Data is written into space +auto [ec, n] = co_await stream.read_some(space); + +// 3. Producer commits written bytes +buffer.commit(n); + +// 4. Consumer reads data +auto data = buffer.data(); +process(data); + +// 5. Consumer marks bytes as processed +buffer.consume(processed_bytes); +---- + +=== Memory Management + +The `tracked_buffer` implementation: + +* Uses a single contiguous vector +* Tracks read and write positions +* Compacts when needed to reuse space +* Grows on demand up to `max_size` + +== Output + +---- +=== Tracked Buffer Demo === + +Read 7 bytes, buffer size now: 7 +Read 7 bytes, buffer size now: 14 +Read 37 bytes, buffer size now: 51 + +Final buffer contents: Hello, World! This is a test of the custom buffer. + +Buffer statistics: + Total prepared: 768 bytes + Total committed: 51 bytes + Total consumed: 0 bytes + Current size: 51 bytes + Capacity: 256 bytes + +Consuming 7 bytes... +Buffer statistics: + Total prepared: 768 bytes + Total committed: 51 bytes + Total consumed: 7 bytes + Current size: 44 bytes + Capacity: 249 bytes +---- + +== Exercises + +1. Add a "high water mark" statistic that tracks maximum buffer size reached +2. Implement a ring buffer version that never moves data +3. 
Add an allocator parameter for custom memory allocation + +== Next Steps + +* xref:echo-server-corosio.adoc[Echo Server with Corosio] — Real networking diff --git a/doc/modules/ROOT/pages/examples/echo-server-corosio.adoc b/doc/modules/ROOT/pages/examples/echo-server-corosio.adoc new file mode 100644 index 00000000..ca523123 --- /dev/null +++ b/doc/modules/ROOT/pages/examples/echo-server-corosio.adoc @@ -0,0 +1,242 @@ += Echo Server with Corosio + +A complete echo server using Corosio for real network I/O. + +== What You Will Learn + +* Integrating Capy with Corosio networking +* Accepting TCP connections +* Handling multiple clients concurrently + +== Prerequisites + +* Completed xref:custom-dynamic-buffer.adoc[Custom Dynamic Buffer] +* Corosio library installed +* Understanding of TCP networking basics + +== Source Code + +[source,cpp] +---- +#include +#include +#include + +using namespace boost::capy; +namespace tcp = boost::corosio::tcp; + +// Echo handler: receives data and sends it back +task<> echo_session(any_stream& stream, std::string client_info) +{ + std::cout << "[" << client_info << "] Session started\n"; + + char buffer[1024]; + std::size_t total_bytes = 0; + + try + { + for (;;) + { + // Read some data + auto [ec, n] = co_await stream.read_some(mutable_buffer(buffer)); + + if (ec == cond::eof) + { + std::cout << "[" << client_info << "] Client disconnected\n"; + break; + } + + if (ec.failed()) + { + std::cout << "[" << client_info << "] Read error: " + << ec.message() << "\n"; + break; + } + + total_bytes += n; + + // Echo it back + auto [wec, wn] = co_await write(stream, const_buffer(buffer, n)); + + if (wec.failed()) + { + std::cout << "[" << client_info << "] Write error: " + << wec.message() << "\n"; + break; + } + } + } + catch (std::exception const& e) + { + std::cout << "[" << client_info << "] Exception: " << e.what() << "\n"; + } + + std::cout << "[" << client_info << "] Session ended, " + << total_bytes << " bytes echoed\n"; +} + +// Accept loop: accepts connections and spawns handlers +task<> accept_loop(tcp::acceptor& acceptor, executor_ref ex) +{ + std::cout << "Server listening on port " + << acceptor.local_endpoint().port() << "\n"; + + int connection_id = 0; + + for (;;) + { + // Accept a connection + auto [ec, socket] = co_await acceptor.async_accept(); + + if (ec.failed()) + { + std::cout << "Accept error: " << ec.message() << "\n"; + continue; + } + + // Build client info string + auto remote = socket.remote_endpoint(); + std::string client_info = + std::to_string(++connection_id) + ":" + + remote.address().to_string() + ":" + + std::to_string(remote.port()); + + std::cout << "[" << client_info << "] Connection accepted\n"; + + // Wrap socket and spawn handler + // Note: socket ownership transfers to the lambda + run_async(ex)( + [](tcp::socket sock, std::string info) -> task<> { + any_stream stream{sock}; + co_await echo_session(stream, std::move(info)); + }(std::move(socket), std::move(client_info)) + ); + } +} + +int main(int argc, char* argv[]) +{ + try + { + // Parse port from command line + unsigned short port = 8080; + if (argc > 1) + port = static_cast(std::stoi(argv[1])); + + // Create I/O context and thread pool + boost::corosio::io_context ioc; + thread_pool pool(4); + + // Create acceptor + tcp::endpoint endpoint(tcp::v4(), port); + tcp::acceptor acceptor(ioc, endpoint); + acceptor.set_option(tcp::acceptor::reuse_address(true)); + + std::cout << "Starting echo server...\n"; + + // Run accept loop + run_async(pool.get_executor())( + 
accept_loop(acceptor, pool.get_executor()) + ); + + // Run the I/O context (this blocks) + ioc.run(); + } + catch (std::exception const& e) + { + std::cerr << "Error: " << e.what() << "\n"; + return 1; + } + + return 0; +} +---- + +== Build + +[source,cmake] +---- +find_package(Corosio REQUIRED) + +add_executable(echo_server echo_server.cpp) +target_link_libraries(echo_server PRIVATE capy Corosio::corosio) +---- + +== Walkthrough + +=== TCP Acceptor + +[source,cpp] +---- +tcp::endpoint endpoint(tcp::v4(), port); +tcp::acceptor acceptor(ioc, endpoint); +---- + +The acceptor listens for incoming connections on the specified port. + +=== Accept Loop + +[source,cpp] +---- +for (;;) +{ + auto [ec, socket] = co_await acceptor.async_accept(); + // ... handle connection ... +} +---- + +The accept loop runs forever, accepting connections and spawning handlers. Each connection runs in its own task. + +=== Type Erasure + +[source,cpp] +---- +any_stream stream{sock}; +co_await echo_session(stream, std::move(info)); +---- + +The `echo_session` function accepts `any_stream&`. The concrete `tcp::socket` is wrapped at the call site. This keeps the echo logic transport-independent. + +=== Concurrent Clients + +Each client connection spawns a new task via `run_async`. Multiple clients are handled concurrently on the thread pool. + +== Testing + +Start the server: + +---- +$ ./echo_server 8080 +Starting echo server... +Server listening on port 8080 +---- + +Connect with netcat: + +---- +$ nc localhost 8080 +Hello +Hello +World +World +^C +---- + +Server output: + +---- +[1:127.0.0.1:54321] Connection accepted +[1:127.0.0.1:54321] Session started +[1:127.0.0.1:54321] Client disconnected +[1:127.0.0.1:54321] Session ended, 12 bytes echoed +---- + +== Exercises + +1. Add a connection limit with graceful rejection +2. Implement a simple command protocol (e.g., ECHO, QUIT, STATS) +3. Add TLS support using Corosio's TLS streams + +== Next Steps + +* xref:stream-pipeline.adoc[Stream Pipeline] — Data transformation chains diff --git a/doc/modules/ROOT/pages/examples/hello-task.adoc b/doc/modules/ROOT/pages/examples/hello-task.adoc new file mode 100644 index 00000000..716cae8d --- /dev/null +++ b/doc/modules/ROOT/pages/examples/hello-task.adoc @@ -0,0 +1,103 @@ += Hello Task + +The minimal Capy program: a task that prints a message. + +== What You Will Learn + +* Creating a `task<>` coroutine +* Using `thread_pool` as an execution context +* Launching tasks with `run_async` + +== Prerequisites + +* C++20 compiler +* Capy library installed + +== Source Code + +[source,cpp] +---- +#include +#include + +using namespace boost::capy; + +task<> say_hello() +{ + std::cout << "Hello from Capy!\n"; + co_return; +} + +int main() +{ + thread_pool pool; + run_async(pool.get_executor())(say_hello()); + return 0; +} +---- + +== Build + +[source,cmake] +---- +add_executable(hello_task hello_task.cpp) +target_link_libraries(hello_task PRIVATE capy) +---- + +== Walkthrough + +=== The Task + +[source,cpp] +---- +task<> say_hello() +{ + std::cout << "Hello from Capy!\n"; + co_return; +} +---- + +`task<>` is equivalent to `task`—a coroutine that completes without returning a value. The `co_return` keyword marks this as a coroutine. + +Tasks are lazy: calling `say_hello()` creates a task object but does not execute the body. The `"Hello"` message is not printed until the task is launched. + +=== The Thread Pool + +[source,cpp] +---- +thread_pool pool; +---- + +`thread_pool` provides an execution context with worker threads. 
By default, it creates one thread per CPU core. + +The pool's destructor waits for all work to complete before returning. This ensures the program doesn't exit while tasks are running. + +=== Launching + +[source,cpp] +---- +run_async(pool.get_executor())(say_hello()); +---- + +`run_async` bridges non-coroutine code (like `main`) to coroutine code. The two-call syntax: + +1. `run_async(pool.get_executor())` — Creates a launcher with the executor +2. `(say_hello())` — Accepts the task and starts execution + +The task runs on one of the pool's worker threads. + +== Output + +---- +Hello from Capy! +---- + +== Exercises + +1. Modify `say_hello` to accept a `std::string_view` parameter and print it +2. Create multiple tasks and launch them all +3. Add a handler to `run_async` that prints when the task completes + +== Next Steps + +* xref:producer-consumer.adoc[Producer-Consumer] — Multiple tasks communicating diff --git a/doc/modules/ROOT/pages/examples/mock-stream-testing.adoc b/doc/modules/ROOT/pages/examples/mock-stream-testing.adoc new file mode 100644 index 00000000..61f30911 --- /dev/null +++ b/doc/modules/ROOT/pages/examples/mock-stream-testing.adoc @@ -0,0 +1,204 @@ += Mock Stream Testing + +Unit testing protocol code with mock streams and error injection. + +== What You Will Learn + +* Using `test::read_stream` and `test::write_stream` +* Error injection with `fuse` +* Synchronous testing with `run_blocking` + +== Prerequisites + +* Completed xref:buffer-composition.adoc[Buffer Composition] +* Understanding of streams from xref:../streams/streams.adoc[Streams] + +== Source Code + +[source,cpp] +---- +#include +#include +#include +#include +#include +#include +#include + +using namespace boost::capy; + +// A simple protocol: read until newline, echo back uppercase +task echo_line_uppercase(any_stream& stream) +{ + std::string line; + char c; + + // Read character by character until newline + while (true) + { + auto [ec, n] = co_await stream.read_some(mutable_buffer(&c, 1)); + + if (ec.failed()) + co_return false; + + if (c == '\n') + break; + + line += static_cast(std::toupper(static_cast(c))); + } + + line += '\n'; + + // Echo uppercase + auto [ec, n] = co_await write(stream, make_buffer(line)); + + co_return !ec.failed(); +} + +void test_happy_path() +{ + std::cout << "Test: happy path\n"; + + test::stream mock; + mock.provide("hello\n"); + + any_stream stream{mock}; + + bool result = test::run_blocking(echo_line_uppercase(stream)); + + assert(result == true); + assert(mock.output() == "HELLO\n"); + + std::cout << " PASSED\n"; +} + +void test_partial_reads() +{ + std::cout << "Test: partial reads (1 byte at a time)\n"; + + // Mock returns at most 1 byte per read_some + test::stream mock(1); // max_read_size = 1 + mock.provide("hi\n"); + + any_stream stream{mock}; + + bool result = test::run_blocking(echo_line_uppercase(stream)); + + assert(result == true); + assert(mock.output() == "HI\n"); + + std::cout << " PASSED\n"; +} + +void test_with_error_injection() +{ + std::cout << "Test: error injection\n"; + + // fuse::armed runs the test repeatedly, failing at each + // operation point until all paths are covered + test::fuse::armed([](test::fuse& f) { + test::stream mock; + mock.provide("test\n"); + + // Associate fuse with mock for error injection + mock.set_fuse(&f); + + any_stream stream{mock}; + + // Run the protocol - fuse will inject errors at each step + auto result = test::run_blocking(echo_line_uppercase(stream)); + + // Either succeeds with correct output, or fails cleanly + 
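        // Note: on iterations where the fuse injected a failure, result is
        // false and the mock's output is unspecified, so it is checked
        // only on success.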
if (result) + { + f.expect(mock.output() == "TEST\n"); + } + }); + + std::cout << " PASSED (all error paths tested)\n"; +} + +int main() +{ + test_happy_path(); + test_partial_reads(); + test_with_error_injection(); + + std::cout << "\nAll tests passed!\n"; + return 0; +} +---- + +== Build + +[source,cmake] +---- +add_executable(mock_stream_testing mock_stream_testing.cpp) +target_link_libraries(mock_stream_testing PRIVATE capy) +---- + +== Walkthrough + +=== Mock Streams + +[source,cpp] +---- +test::stream mock; +mock.provide("hello\n"); +---- + +`test::stream` is a bidirectional mock that satisfies both `ReadStream` and `WriteStream`: + +* `provide(data)` — Supplies data for reads +* `output()` — Returns data written to the mock +* Constructor parameter controls max bytes per operation + +=== Synchronous Testing + +[source,cpp] +---- +bool result = test::run_blocking(echo_line_uppercase(stream)); +---- + +`run_blocking` executes a coroutine synchronously, blocking until complete. This enables traditional unit test patterns with coroutines. + +=== Error Injection + +[source,cpp] +---- +test::fuse::armed([](test::fuse& f) { + mock.set_fuse(&f); + // ... run test ... +}); +---- + +`fuse::armed` runs the test function repeatedly, injecting errors at each operation point: + +1. First run: error at operation 1 +2. Second run: error at operation 2 +3. ...and so on until all operations succeed + +This systematically tests all error handling paths. + +== Output + +---- +Test: happy path + PASSED +Test: partial reads (1 byte at a time) + PASSED +Test: error injection + PASSED (all error paths tested) + +All tests passed! +---- + +== Exercises + +1. Add a test for EOF handling (what if input doesn't end with newline?) +2. Test with different max_read_size values +3. Add a test for write errors using `test::write_stream` + +== Next Steps + +* xref:type-erased-echo.adoc[Type-Erased Echo] — Compilation firewall pattern diff --git a/doc/modules/ROOT/pages/examples/parallel-fetch.adoc b/doc/modules/ROOT/pages/examples/parallel-fetch.adoc new file mode 100644 index 00000000..38fef0ba --- /dev/null +++ b/doc/modules/ROOT/pages/examples/parallel-fetch.adoc @@ -0,0 +1,248 @@ += Parallel Fetch + +Running multiple operations concurrently with `when_all`. 
+ +== What You Will Learn + +* Using `when_all` to run tasks in parallel +* Structured bindings for results +* Error propagation in concurrent tasks + +== Prerequisites + +* Completed xref:timeout-cancellation.adoc[Timeout with Cancellation] +* Understanding of `when_all` from xref:../coroutines/composition.adoc[Composition] + +== Source Code + +[source,cpp] +---- +#include +#include +#include + +using namespace boost::capy; + +// Simulated async operations +task fetch_user_id(std::string username) +{ + std::cout << "Fetching user ID for: " << username << "\n"; + // In real code: co_await http_get("/users/" + username); + co_return username.length() * 100; // Fake ID +} + +task fetch_user_name(int id) +{ + std::cout << "Fetching name for user ID: " << id << "\n"; + co_return "User" + std::to_string(id); +} + +task fetch_order_count(int user_id) +{ + std::cout << "Fetching order count for user: " << user_id << "\n"; + co_return user_id / 10; // Fake count +} + +task fetch_account_balance(int user_id) +{ + std::cout << "Fetching balance for user: " << user_id << "\n"; + co_return user_id * 1.5; // Fake balance +} + +// Fetch all user data in parallel +task<> fetch_user_dashboard(std::string username) +{ + std::cout << "\n=== Fetching dashboard for: " << username << " ===\n"; + + // First, get the user ID (needed for other queries) + int user_id = co_await fetch_user_id(username); + std::cout << "Got user ID: " << user_id << "\n\n"; + + // Now fetch all user data in parallel + std::cout << "Starting parallel fetches...\n"; + auto [name, orders, balance] = co_await when_all( + fetch_user_name(user_id), + fetch_order_count(user_id), + fetch_account_balance(user_id) + ); + + std::cout << "\nDashboard results:\n"; + std::cout << " Name: " << name << "\n"; + std::cout << " Orders: " << orders << "\n"; + std::cout << " Balance: $" << balance << "\n"; +} + +// Example with void tasks +task<> log_access(std::string resource) +{ + std::cout << "Logging access to: " << resource << "\n"; + co_return; +} + +task<> update_metrics(std::string metric) +{ + std::cout << "Updating metric: " << metric << "\n"; + co_return; +} + +task fetch_with_side_effects() +{ + std::cout << "\n=== Fetch with side effects ===\n"; + + // void tasks don't contribute to result tuple + auto [data] = co_await when_all( + log_access("api/data"), // void - no result + update_metrics("api_calls"), // void - no result + fetch_user_name(42) // returns string + ); + + std::cout << "Data: " << data << "\n"; + co_return data; +} + +// Error handling example +task might_fail(bool should_fail, std::string name) +{ + std::cout << "Task " << name << " starting\n"; + + if (should_fail) + { + throw std::runtime_error(name + " failed!"); + } + + std::cout << "Task " << name << " completed\n"; + co_return 42; +} + +task<> demonstrate_error_handling() +{ + std::cout << "\n=== Error handling ===\n"; + + try + { + auto [a, b, c] = co_await when_all( + might_fail(false, "A"), + might_fail(true, "B"), // This one fails + might_fail(false, "C") + ); + std::cout << "All succeeded: " << a << ", " << b << ", " << c << "\n"; + } + catch (std::runtime_error const& e) + { + std::cout << "Caught error: " << e.what() << "\n"; + // Note: when_all waits for all tasks to complete (or respond to stop) + // before propagating the first exception + } +} + +int main() +{ + thread_pool pool; + + run_async(pool.get_executor())(fetch_user_dashboard("alice")); + run_async(pool.get_executor())(fetch_with_side_effects()); + 
run_async(pool.get_executor())(demonstrate_error_handling()); + + return 0; +} +---- + +== Build + +[source,cmake] +---- +add_executable(parallel_fetch parallel_fetch.cpp) +target_link_libraries(parallel_fetch PRIVATE capy) +---- + +== Walkthrough + +=== Basic when_all + +[source,cpp] +---- +auto [name, orders, balance] = co_await when_all( + fetch_user_name(user_id), + fetch_order_count(user_id), + fetch_account_balance(user_id) +); +---- + +All three tasks run concurrently. `when_all` completes when all tasks finish. Results are returned in a tuple matching input order. + +=== Void Filtering + +[source,cpp] +---- +auto [data] = co_await when_all( + log_access("api/data"), // void - filtered out + update_metrics("api_calls"), // void - filtered out + fetch_user_name(42) // string - in tuple +); +---- + +Tasks returning `void` don't contribute to the result tuple. Only non-void results appear. + +=== Error Propagation + +[source,cpp] +---- +try +{ + auto results = co_await when_all(task_a(), task_b(), task_c()); +} +catch (...) +{ + // First exception is rethrown + // All tasks complete before exception propagates +} +---- + +When a task throws: + +1. The exception is captured +2. Stop is requested for siblings +3. All tasks complete (or respond to stop) +4. First exception is rethrown + +== Output + +---- +=== Fetching dashboard for: alice === +Fetching user ID for: alice +Got user ID: 500 + +Starting parallel fetches... +Fetching name for user ID: 500 +Fetching order count for user: 500 +Fetching balance for user: 500 + +Dashboard results: + Name: User500 + Orders: 50 + Balance: $750 + +=== Fetch with side effects === +Logging access to: api/data +Updating metric: api_calls +Fetching name for user ID: 42 +Data: User42 + +=== Error handling === +Task A starting +Task B starting +Task C starting +Task A completed +Task C completed +Caught error: B failed! +---- + +== Exercises + +1. Add timing to see the parallel speedup vs sequential execution +2. Implement a "fan-out/fan-in" pattern that processes a list of items in parallel +3. Add cancellation support so remaining tasks can exit early on error + +== Next Steps + +* xref:custom-dynamic-buffer.adoc[Custom Dynamic Buffer] — Implementing your own buffer diff --git a/doc/modules/ROOT/pages/examples/producer-consumer.adoc b/doc/modules/ROOT/pages/examples/producer-consumer.adoc new file mode 100644 index 00000000..8b0dcfd4 --- /dev/null +++ b/doc/modules/ROOT/pages/examples/producer-consumer.adoc @@ -0,0 +1,145 @@ += Producer-Consumer + +Two tasks communicating via an async event. 
+ +== What You Will Learn + +* Using `async_event` for coroutine synchronization +* Running multiple concurrent tasks with `when_all` +* Task-to-task communication patterns + +== Prerequisites + +* Completed xref:hello-task.adoc[Hello Task] +* Understanding of basic task creation and launching + +== Source Code + +[source,cpp] +---- +#include +#include + +using namespace boost::capy; + +async_event data_ready; +int shared_value = 0; + +task<> producer() +{ + std::cout << "Producer: preparing data...\n"; + + // Simulate work + shared_value = 42; + + std::cout << "Producer: data ready, signaling\n"; + data_ready.set(); + + co_return; +} + +task<> consumer() +{ + std::cout << "Consumer: waiting for data...\n"; + + co_await data_ready.wait(); + + std::cout << "Consumer: received value " << shared_value << "\n"; + + co_return; +} + +task<> run_both() +{ + co_await when_all(producer(), consumer()); +} + +int main() +{ + thread_pool pool; + run_async(pool.get_executor())(run_both()); + return 0; +} +---- + +== Build + +[source,cmake] +---- +add_executable(producer_consumer producer_consumer.cpp) +target_link_libraries(producer_consumer PRIVATE capy) +---- + +== Walkthrough + +=== The Event + +[source,cpp] +---- +async_event data_ready; +---- + +`async_event` is a one-shot signaling mechanism. One task can `set()` it; other tasks can `wait()` for it. When set, all waiting tasks resume. + +=== Producer + +[source,cpp] +---- +task<> producer() +{ + shared_value = 42; + data_ready.set(); + co_return; +} +---- + +The producer prepares data and signals completion by calling `set()`. + +=== Consumer + +[source,cpp] +---- +task<> consumer() +{ + co_await data_ready.wait(); + std::cout << "Consumer: received value " << shared_value << "\n"; + co_return; +} +---- + +The consumer waits until the event is set. The `co_await data_ready.wait()` suspends until `set()` is called. + +=== Running Both + +[source,cpp] +---- +task<> run_both() +{ + co_await when_all(producer(), consumer()); +} +---- + +`when_all` runs both tasks concurrently. It completes when both tasks have finished. + +The order of execution depends on scheduling, but synchronization ensures the consumer sees the producer's data. + +== Output + +---- +Producer: preparing data... +Consumer: waiting for data... +Producer: data ready, signaling +Consumer: received value 42 +---- + +(Output order may vary due to concurrent execution) + +== Exercises + +1. Add multiple consumers that all wait for the same event +2. Create a producer that sets the event multiple times (use a loop with a new event each iteration) +3. Add error handling—what happens if the producer throws? + +== Next Steps + +* xref:buffer-composition.adoc[Buffer Composition] — Zero-allocation buffer composition diff --git a/doc/modules/ROOT/pages/examples/stream-pipeline.adoc b/doc/modules/ROOT/pages/examples/stream-pipeline.adoc new file mode 100644 index 00000000..7aa2ba9c --- /dev/null +++ b/doc/modules/ROOT/pages/examples/stream-pipeline.adoc @@ -0,0 +1,316 @@ += Stream Pipeline + +Data transformation through a pipeline of sources and sinks. 
+ +== What You Will Learn + +* Building processing pipelines +* Using `BufferSource` and `BufferSink` concepts +* Chaining transformations + +== Prerequisites + +* Completed xref:echo-server-corosio.adoc[Echo Server with Corosio] +* Understanding of buffer sources/sinks from xref:../streams/buffer-concepts.adoc[Buffer Concepts] + +== Source Code + +[source,cpp] +---- +#include +#include +#include +#include +#include +#include +#include + +using namespace boost::capy; + +// A transform stage that converts to uppercase +class uppercase_transform +{ + any_buffer_source* source_; + std::vector buffer_; + std::size_t offset_ = 0; + bool exhausted_ = false; + +public: + explicit uppercase_transform(any_buffer_source& source) + : source_(&source) + { + } + + // BufferSource interface + io_result pull(const_buffer* arr, std::size_t max_count) + { + if (exhausted_ && offset_ >= buffer_.size()) + co_return {error_code{}, 0}; // Exhausted + + // Need more data? + if (offset_ >= buffer_.size()) + { + buffer_.clear(); + offset_ = 0; + + // Pull from upstream + const_buffer upstream[8]; + auto [ec, count] = co_await source_->pull(upstream, 8); + + if (ec.failed()) + co_return {ec, 0}; + + if (count == 0) + { + exhausted_ = true; + co_return {error_code{}, 0}; + } + + // Transform: copy and uppercase + for (std::size_t i = 0; i < count; ++i) + { + auto data = static_cast(upstream[i].data()); + auto size = upstream[i].size(); + + for (std::size_t j = 0; j < size; ++j) + { + buffer_.push_back(static_cast( + std::toupper(static_cast(data[j])))); + } + } + } + + // Return our buffer + arr[0] = const_buffer(buffer_.data() + offset_, buffer_.size() - offset_); + offset_ = buffer_.size(); // Mark as consumed + + co_return {error_code{}, 1}; + } +}; + +// A transform that adds line numbers +class line_numbering_transform +{ + any_buffer_source* source_; + std::string buffer_; + std::size_t line_num_ = 1; + bool exhausted_ = false; + bool at_line_start_ = true; + +public: + explicit line_numbering_transform(any_buffer_source& source) + : source_(&source) + { + } + + io_result pull(const_buffer* arr, std::size_t max_count) + { + if (exhausted_ && buffer_.empty()) + co_return {error_code{}, 0}; + + // Pull more data if needed + if (buffer_.empty()) + { + const_buffer upstream[8]; + auto [ec, count] = co_await source_->pull(upstream, 8); + + if (ec.failed()) + co_return {ec, 0}; + + if (count == 0) + { + exhausted_ = true; + co_return {error_code{}, 0}; + } + + // Transform: add line numbers + for (std::size_t i = 0; i < count; ++i) + { + auto data = static_cast(upstream[i].data()); + auto size = upstream[i].size(); + + for (std::size_t j = 0; j < size; ++j) + { + if (at_line_start_) + { + buffer_ += std::to_string(line_num_++) + ": "; + at_line_start_ = false; + } + + buffer_ += data[j]; + + if (data[j] == '\n') + at_line_start_ = true; + } + } + } + + arr[0] = const_buffer(buffer_.data(), buffer_.size()); + buffer_.clear(); + + co_return {error_code{}, 1}; + } +}; + +// Transfer from source to sink +task transfer(any_buffer_source& source, any_write_sink& sink) +{ + std::size_t total = 0; + const_buffer bufs[8]; + + for (;;) + { + auto [ec, count] = co_await source.pull(bufs, 8); + + if (ec.failed()) + throw std::system_error(ec); + + if (count == 0) + break; + + for (std::size_t i = 0; i < count; ++i) + { + auto [wec, n] = co_await sink.write(bufs[i]); + if (wec.failed()) + throw std::system_error(wec); + total += n; + } + } + + co_await sink.write_eof(); + co_return total; +} + +void demo_pipeline() +{ + 
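    // Wire up the pipeline described above:
    // string source -> uppercase_transform -> line_numbering_transform -> sink.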
+    std::cout << "=== Stream Pipeline Demo ===\n\n";
+
+    // Source data
+    std::string input = "hello world\nthis is a test\nof the pipeline\n";
+    std::cout << "Input:\n" << input << "\n";
+
+    // Create source from string
+    test::buffer_source source;
+    source.provide(input);
+    source.provide_eof();
+
+    // Wrap as any_buffer_source
+    any_buffer_source src{source};
+
+    // Create transform stages
+    uppercase_transform upper{src};
+    any_buffer_source upper_src{upper};
+
+    line_numbering_transform numbered{upper_src};
+    any_buffer_source numbered_src{numbered};
+
+    // Create sink
+    test::write_sink sink;
+    any_write_sink dst{sink};
+
+    // Run pipeline
+    auto bytes = test::run_blocking(transfer(numbered_src, dst));
+
+    std::cout << "Output (" << bytes << " bytes):\n";
+    std::cout << sink.data() << "\n";
+}
+
+int main()
+{
+    demo_pipeline();
+    return 0;
+}
+----
+
+== Build
+
+[source,cmake]
+----
+add_executable(stream_pipeline stream_pipeline.cpp)
+target_link_libraries(stream_pipeline PRIVATE capy)
+----
+
+== Walkthrough
+
+=== Pipeline Structure
+
+----
+Source → Uppercase → LineNumbering → Sink
+----
+
+Data flows through the pipeline:
+
+1. Source provides raw input
+2. Uppercase transforms to uppercase
+3. LineNumbering adds line numbers
+4. Sink collects output
+
+=== BufferSource Implementation
+
+[source,cpp]
+----
+io_result pull(const_buffer* arr, std::size_t max_count)
+{
+    // Pull from upstream
+    auto [ec, count] = co_await source_->pull(upstream, 8);
+
+    // Transform data
+    // ...
+
+    // Return transformed buffer
+    arr[0] = const_buffer(buffer_.data(), buffer_.size());
+    co_return {error_code{}, 1};
+}
+----
+
+Each stage pulls from upstream, transforms, and provides output buffers.
+
+=== Type Erasure
+
+[source,cpp]
+----
+any_buffer_source src{source};
+uppercase_transform upper{src};
+any_buffer_source upper_src{upper};
+----
+
+`any_buffer_source` wraps each stage, allowing uniform composition.
+
+== Output
+
+----
+=== Stream Pipeline Demo ===
+
+Input:
+hello world
+this is a test
+of the pipeline
+
+Output (52 bytes):
+1: HELLO WORLD
+2: THIS IS A TEST
+3: OF THE PIPELINE
+----
+
+== Exercises
+
+1. Add a compression/decompression stage
+2. Implement a ROT13 transform
+3. Create a filtering stage that drops lines matching a pattern
+
+== Summary
+
+This example catalog demonstrated:
+
+* Basic task creation and launching
+* Coroutine synchronization with events
+* Buffer composition for scatter/gather I/O
+* Unit testing with mock streams
+* Compilation firewalls with type erasure
+* Cooperative cancellation with stop tokens
+* Concurrent execution with `when_all`
+* Custom buffer implementations
+* Real network I/O with Corosio
+* Data transformation pipelines
+
+These patterns form the foundation for building robust, efficient I/O applications with Capy.
diff --git a/doc/modules/ROOT/pages/examples/timeout-cancellation.adoc b/doc/modules/ROOT/pages/examples/timeout-cancellation.adoc
new file mode 100644
index 00000000..b3875718
--- /dev/null
+++ b/doc/modules/ROOT/pages/examples/timeout-cancellation.adoc
@@ -0,0 +1,224 @@
+= Timeout with Cancellation
+
+Using stop tokens to implement operation timeouts.
+ +== What You Will Learn + +* Creating and using `std::stop_source` +* Checking `stop_requested()` in coroutines +* Cancellation patterns + +== Prerequisites + +* Completed xref:type-erased-echo.adoc[Type-Erased Echo] +* Understanding of stop tokens from xref:../coroutines/cancellation.adoc[Cancellation] + +== Source Code + +[source,cpp] +---- +#include +#include +#include +#include +#include +#include + +using namespace boost::capy; + +// A slow operation that respects cancellation +task slow_fetch(int steps) +{ + auto token = co_await get_stop_token(); + std::string result; + + for (int i = 0; i < steps; ++i) + { + // Check cancellation before each step + if (token.stop_requested()) + { + std::cout << " Cancelled at step " << i << "\n"; + throw std::system_error( + make_error_code(std::errc::operation_canceled)); + } + + result += "step" + std::to_string(i) + " "; + + // Simulate work (in real code, this would be I/O) + std::cout << " Completed step " << i << "\n"; + } + + co_return result; +} + +// Run with timeout (conceptual - real implementation needs timer) +task> fetch_with_timeout() +{ + auto token = co_await get_stop_token(); + + try + { + auto result = co_await slow_fetch(5); + co_return result; + } + catch (std::system_error const& e) + { + if (e.code() == std::errc::operation_canceled) + co_return std::nullopt; + throw; + } +} + +void demo_normal_completion() +{ + std::cout << "Demo: Normal completion\n"; + + thread_pool pool; + std::stop_source source; + + run_async(pool.get_executor(), source.get_token(), + [](std::optional result) { + if (result) + std::cout << "Result: " << *result << "\n"; + else + std::cout << "Cancelled\n"; + } + )(fetch_with_timeout()); +} + +void demo_cancellation() +{ + std::cout << "\nDemo: Cancellation after 2 steps\n"; + + thread_pool pool; + std::stop_source source; + + // Launch the task + run_async(pool.get_executor(), source.get_token(), + [](std::optional result) { + if (result) + std::cout << "Result: " << *result << "\n"; + else + std::cout << "Cancelled (returned nullopt)\n"; + } + )(fetch_with_timeout()); + + // Simulate timeout: cancel after brief delay + std::this_thread::sleep_for(std::chrono::milliseconds(10)); + std::cout << " Requesting stop...\n"; + source.request_stop(); +} + +// Example: Manual stop token checking +task process_items(std::vector const& items) +{ + auto token = co_await get_stop_token(); + int sum = 0; + + for (auto item : items) + { + if (token.stop_requested()) + { + std::cout << "Processing cancelled, partial sum: " << sum << "\n"; + co_return sum; // Return partial result + } + + sum += item; + } + + co_return sum; +} + +int main() +{ + demo_normal_completion(); + demo_cancellation(); + + return 0; +} +---- + +== Build + +[source,cmake] +---- +add_executable(timeout_cancellation timeout_cancellation.cpp) +target_link_libraries(timeout_cancellation PRIVATE capy) +---- + +== Walkthrough + +=== Getting the Stop Token + +[source,cpp] +---- +auto token = co_await get_stop_token(); +---- + +Inside a task, `get_stop_token()` retrieves the stop token propagated from the caller. + +=== Checking for Cancellation + +[source,cpp] +---- +if (token.stop_requested()) +{ + throw std::system_error(make_error_code(std::errc::operation_canceled)); +} +---- + +Check `stop_requested()` at appropriate points—typically before expensive operations or at loop iterations. 
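In tight loops, you can throttle the check so polling does not dominate the work. A minimal sketch (the batch size of 1024 is an arbitrary value chosen for illustration, not a library requirement):

[source,cpp]
----
task<int> sum_all(std::vector<int> const& items)
{
    auto token = co_await get_stop_token();
    int sum = 0;

    for (std::size_t i = 0; i < items.size(); ++i)
    {
        // Poll the token once per batch instead of on every element
        if ((i % 1024) == 0 && token.stop_requested())
            co_return sum;   // partial result, as in process_items from the source listing
        sum += items[i];
    }

    co_return sum;
}
----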
+ +=== Triggering Cancellation + +[source,cpp] +---- +std::stop_source source; +run_async(ex, source.get_token())(my_task()); + +// Later: +source.request_stop(); +---- + +The stop source controls the stop token. Calling `request_stop()` signals all holders of tokens from this source. + +=== Partial Results + +[source,cpp] +---- +if (token.stop_requested()) +{ + co_return partial_result; // Return what we have +} +---- + +Cancellation doesn't have to throw. You can return partial results or a sentinel value. + +== Output + +---- +Demo: Normal completion + Completed step 0 + Completed step 1 + Completed step 2 + Completed step 3 + Completed step 4 +Result: step0 step1 step2 step3 step4 + +Demo: Cancellation after 2 steps + Completed step 0 + Completed step 1 + Requesting stop... + Cancelled at step 2 +Cancelled (returned nullopt) +---- + +== Exercises + +1. Implement a retry-with-timeout pattern +2. Add cancellation support to the echo session from the previous example +3. Create a task that cancels itself after processing N items + +== Next Steps + +* xref:parallel-fetch.adoc[Parallel Fetch] — Concurrent operations with when_all diff --git a/doc/modules/ROOT/pages/examples/type-erased-echo.adoc b/doc/modules/ROOT/pages/examples/type-erased-echo.adoc new file mode 100644 index 00000000..53f7c87b --- /dev/null +++ b/doc/modules/ROOT/pages/examples/type-erased-echo.adoc @@ -0,0 +1,185 @@ += Type-Erased Echo + +Echo server demonstrating the compilation firewall pattern. + +== What You Will Learn + +* Using `any_stream` for transport-independent code +* Physical isolation through separate compilation +* Build time benefits of type erasure + +== Prerequisites + +* Completed xref:mock-stream-testing.adoc[Mock Stream Testing] +* Understanding of type erasure from xref:../streams/isolation.adoc[Physical Isolation] + +== Source Code + +=== echo.hpp + +[source,cpp] +---- +#ifndef ECHO_HPP +#define ECHO_HPP + +#include +#include + +namespace myapp { + +// Type-erased interface: no template dependencies +boost::capy::task<> echo_session(boost::capy::any_stream& stream); + +} // namespace myapp + +#endif +---- + +=== echo.cpp + +[source,cpp] +---- +#include "echo.hpp" +#include +#include + +namespace myapp { + +using namespace boost::capy; + +task<> echo_session(any_stream& stream) +{ + char buffer[1024]; + + for (;;) + { + // Read some data + auto [ec, n] = co_await stream.read_some(mutable_buffer(buffer)); + + if (ec == cond::eof) + co_return; // Client closed connection + + if (ec.failed()) + throw std::system_error(ec); + + // Echo it back + auto [wec, wn] = co_await write(stream, const_buffer(buffer, n)); + + if (wec.failed()) + throw std::system_error(wec); + } +} + +} // namespace myapp +---- + +=== main.cpp + +[source,cpp] +---- +#include "echo.hpp" +#include +#include +#include +#include + +using namespace boost::capy; + +void test_with_mock() +{ + test::stream mock; + mock.provide("Hello, "); + mock.provide("World!\n"); + mock.provide_eof(); + + any_stream stream{mock}; + test::run_blocking(myapp::echo_session(stream)); + + std::cout << "Echo output: " << mock.output() << "\n"; +} + +int main() +{ + test_with_mock(); + + // With real sockets (using Corosio): + // tcp::socket socket; + // ... accept connection ... 
+ // any_stream stream{socket}; + // co_await myapp::echo_session(stream); + + return 0; +} +---- + +== Build + +[source,cmake] +---- +add_library(echo_lib echo.cpp) +target_link_libraries(echo_lib PUBLIC capy) + +add_executable(echo_demo main.cpp) +target_link_libraries(echo_demo PRIVATE echo_lib) +---- + +== Walkthrough + +=== The Interface + +[source,cpp] +---- +// echo.hpp +task<> echo_session(any_stream& stream); +---- + +The header declares only the signature. It includes `any_stream` and `task`, but no concrete transport types. + +Clients of this header: + +* Can call `echo_session` with any stream +* Do not depend on implementation details +* Do not recompile when implementation changes + +=== The Implementation + +[source,cpp] +---- +// echo.cpp +task<> echo_session(any_stream& stream) +{ + // Full implementation here +} +---- + +The implementation: + +* Lives in a separate `.cpp` file +* Compiles once +* Can include any headers it needs internally + +=== Build Isolation + +When you change `echo.cpp`: + +* Only `echo.cpp` recompiles +* `main.cpp` and other clients do not recompile +* Link step updates the binary + +This scales: in large projects, changes to implementation files don't cascade through dependencies. + +== Output + +---- +Echo output: Hello, World! +---- + +== Exercises + +1. Add logging to `echo_session` and observe that clients don't recompile +2. Create a second implementation file with different behavior (e.g., uppercase echo) +3. Measure compile times with and without type erasure in a larger project + +== Next Steps + +* xref:timeout-cancellation.adoc[Timeout with Cancellation] — Stop tokens for timeout diff --git a/doc/modules/ROOT/pages/index.adoc b/doc/modules/ROOT/pages/index.adoc index 45a72508..3e885bec 100644 --- a/doc/modules/ROOT/pages/index.adoc +++ b/doc/modules/ROOT/pages/index.adoc @@ -1,99 +1,68 @@ -// -// Copyright (c) 2025 Vinnie Falco (vinnie.falco@gmail.com) -// -// Distributed under the Boost Software License, Version 1.0. (See accompanying -// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) -// -// Official repository: https://github.com/cppalliance/capy -// - -= Boost.Capy - -Boost.Capy is the foundation that C++20 coroutines need for buffer-oriented I/O. -It solves the completion-context problem: ensuring your coroutine always resumes -on your designated executor in a single-threaded or multi-threaded environment, -while providing buffer sequences, stream concepts, synchronization, and test -mocks that I/O libraries require. += Capy + +Capy abstracts away sockets, files, and asynchrony with type-erased streams and buffer sequences—code compiles fast because the implementation is hidden. It provides the framework for concurrent algorithms that transact in buffers of memory: networking, serial ports, console, timers, and any platform I/O. This is only possible because Capy is coroutine-only, enabling optimizations and ergonomics that hybrid approaches must sacrifice. 
== What This Library Does -* **IoAwaitable protocol** — automatic executor affinity through every `co_await` -* **Lazy coroutine tasks** with forward-propagating stop tokens and cancellation -* **Buffer sequences** — `const_buffer`, `mutable_buffer`, dynamic buffers, and algorithms for scatter/gather I/O -* **Stream concepts** — `ReadStream`, `WriteStream`, `ReadSource`, `WriteSink` for generic buffer-oriented operations -* **Concurrent composition** via `when_all`, `when_any` with structured error propagation -* **Execution contexts** — thread pool with service management -* **Strand** for safe concurrent access without mutexes -* **Async synchronization** — `coro_lock`, `async_event` -* **Frame allocation recycling** for zero steady-state allocations -* **Test utilities** — mock streams, mock sources/sinks, and error injection +* *Lazy coroutine tasks* — `task` with forward-propagating stop tokens and automatic cancellation +* *Buffer sequences* — taken straight from Asio and improved +* *Stream concepts* — `ReadStream`, `WriteStream`, `ReadSource`, `WriteSink`, `BufferSource`, `BufferSink` +* *Type-erased streams* — `any_stream`, `any_read_stream`, `any_write_stream` for fast compilation +* *Concurrency facilities* — executors, strands, thread pools, `when_all`, `when_any` +* *Test utilities* — mock streams, mock sources/sinks, error injection == What This Library Does Not Do -* **Networking primitives** — no sockets, HTTP, or protocol implementations -* **Platform-specific event loops** — integrate with io_uring, IOCP, or your platform's I/O framework -* **The sender/receiver model** — Capy uses the _IoAwaitable_ protocol, not `std::execution` +* *Networking* — no sockets, acceptors, or DNS; that's what Corosio provides +* *Protocols* — no HTTP, WebSocket, or TLS; see the Http and Beast2 libraries +* *Platform event loops* — no io_uring, IOCP, epoll, or kqueue; Capy is the layer above +* *Callbacks or futures* — coroutine-only means no other continuation styles +* *Sender/receiver* — Capy uses the IoAwaitable protocol, not `std::execution` == Target Audience -**Library authors** — Use stream concepts (`ReadStream`, `WriteStream`, -`ReadSource`, `WriteSink`), algorithms (`read`, `write`), buffer sequences, -and the _IoAwaitable_ protocol to build composable I/O frameworks without -being tied to a particular implementation. - -**Application developers** — Program against `task`, `when_all`, stream -concepts, and buffer sequences. Test async logic with mock streams. +* Users of *Corosio* — portable coroutine networking +* Users of *Http* — sans-I/O HTTP/1.1 clients and servers +* Users of *Websocket* — sans-I/O WebSocket +* Users of *Beast2* — high-level HTTP/WebSocket servers +* Users of *Burl* — high-level HTTP client -**Migration from callbacks** — Coroutine-native model with explicit executor -propagation. No thread-local state or intermediate adapters. - -**High-performance systems** — Frame allocation recycling (zero steady-state -allocations), scatter/gather buffer sequences, type erasure only at boundaries. +All of these are built on Capy. Understanding its concepts—tasks, buffer sequences, streams, executors—unlocks the full power of the stack. == Design Philosophy -**Lazy by default.** Tasks suspend immediately on creation. This enables -structured composition where parent coroutines naturally await their children. -Eager execution is available through `run_async`. 
- -**Affinity through the protocol.** The executor propagates through -`await_suspend` parameters, not through thread-local storage or global state. -This makes the data flow explicit and testable. - -**Type erasure at boundaries.** Tasks use type-erased executors (`executor_ref`) -internally, paying the indirection cost once rather than templating everything. -For I/O-bound code, this cost is negligible. - -**Composition over inheritance.** Buffer types, stream concepts, and awaitables -are designed to compose cleanly rather than requiring deep class hierarchies. +* *Use case first.* Buffer sequences, stream concepts, executor affinity—these exist because I/O code needs them, not because they're theoretically elegant. +* *Coroutines-only.* No callbacks, futures, or sender/receiver. Hybrid support forces compromises; full commitment unlocks optimizations that adapted models cannot achieve. +* *Address the complaints of C++.* Type erasure at boundaries, minimal dependencies, and hidden implementations keep builds fast and templates manageable. == Requirements -**Assumed Knowledge:** +=== Assumed Knowledge -* C++20 language features (concepts, ranges, coroutines syntax) -* Basic understanding of concurrent programming -* Familiarity with `system::error_code` error handling +* C++20 coroutines, concepts, and ranges +* Basic concurrent programming -**Compiler Support:** +=== Compiler Support -* GCC 13+ +* GCC 12+ * Clang 17+ -* MSVC 14.34+ (Visual Studio 2022 17+) +* Apple-Clang (macOS 14+) +* MSVC 14.34+ +* MinGW + +=== Dependencies -**Dependencies:** +None. Capy is self-contained and does not require Boost. -* C++20 standards-compliant library with `` support. -* Boost: Assert, Compat, Config, Core, Mp11, Predef, System, Throw_exception, Variant2, Winapi +=== Linking -**Linking:** +Capy is a compiled library. Link against `capy`. -Capy is a compiled library and needs to be linked against its other Boost dependencies, such as Boost.System. +== Code Convention [NOTE] ==== -**Code Convention:** Examples in this documentation assume these declarations -are in effect unless otherwise noted: +Unless otherwise specified, all code examples in this documentation assume the following: [source,cpp] ---- @@ -104,42 +73,46 @@ using namespace boost::capy; == Quick Example +This example demonstrates a minimal coroutine that reads from a stream and echoes the data back: + [source,cpp] ---- -#include -#include -#include -#include - -using boost::capy::task; -using boost::capy::run_async; -using boost::capy::thread_pool; +#include -task compute() -{ - co_return 42; -} +using namespace boost::capy; -task run() +task<> echo(any_stream& stream) { - int result = co_await compute(); - std::cout << "Result: " << result << "\n"; + char buf[1024]; + for(;;) + { + auto [ec, n] = co_await stream.read_some(mutable_buffer(buf)); + if(ec.failed()) + co_return; + auto [wec, wn] = co_await write(stream, const_buffer(buf, n)); + if(wec.failed()) + co_return; + } } int main() { - thread_pool pool(1); - run_async(pool.get_executor())(run()); - // Pool destructor waits for completion + thread_pool pool; + // In a real application, you would obtain a stream from Corosio + // and call: run_async(pool.get_executor())(echo(stream)); + return 0; } ---- -The key insight: both `run()` and `compute()` execute on the same executor -because affinity propagated automatically through `co_await`. +The `echo` function accepts an `any_stream&`—a type-erased wrapper that works with any concrete stream implementation. 
The function reads data into a buffer, then writes it back. Both operations use `co_await` to suspend until the I/O completes. + +The `task<>` return type (equivalent to `task`) creates a lazy coroutine that does not start executing until awaited or launched with `run_async`. == Next Steps -* xref:quick-start.adoc[Quick Start] — Get a working program in 5 minutes -* xref:cpp20-coroutines/foundations.adoc[Introduction to C++20 Coroutines] — Understand the machinery -* xref:io-awaitables/concepts.adoc[I/O Awaitables] — Learn the affinity protocol -* xref:library/task.adoc[The task Type] — Start using Capy +* xref:quick-start.adoc[Quick Start] — Set up your first Capy project +* xref:cpp20-coroutines/foundations.adoc[C++20 Coroutines Tutorial] — Learn coroutines from the ground up +* xref:concurrency/foundations.adoc[Concurrency Tutorial] — Understand threads, mutexes, and synchronization +* xref:coroutines/tasks.adoc[Coroutines in Capy] — Deep dive into `task` and the IoAwaitable protocol +* xref:buffers/overview.adoc[Buffer Sequences] — Master the concept-driven buffer model +* xref:streams/overview.adoc[Stream Concepts] — Understand the six stream concepts diff --git a/doc/modules/ROOT/pages/streams/algorithms.adoc b/doc/modules/ROOT/pages/streams/algorithms.adoc new file mode 100644 index 00000000..11b0d30b --- /dev/null +++ b/doc/modules/ROOT/pages/streams/algorithms.adoc @@ -0,0 +1,253 @@ += Transfer Algorithms + +This section explains the composed read/write operations and transfer algorithms. + +== Prerequisites + +* Completed xref:buffer-concepts.adoc[Buffer Sources and Sinks] +* Understanding of all six stream concepts + +== Composed Read/Write + +The partial operations (`read_some`, `write_some`) often require looping. Capy provides composed operations that handle the loops for you. + +=== read + +Fills a buffer completely by looping `read_some`: + +[source,cpp] +---- +#include + +template +task> +read(Stream& stream, Buffers const& buffers); +---- + +Keeps reading until: + +* Buffer is full (`n == buffer_size(buffers)`) +* EOF is reached (returns `cond::eof` with partial count) +* Error occurs (returns error with partial count) + +Example: + +[source,cpp] +---- +char buf[1024]; +auto [ec, n] = co_await read(stream, mutable_buffer(buf)); +// n == 1024, or ec indicates why not +---- + +=== read with DynamicBuffer + +Reads until EOF into a growable buffer: + +[source,cpp] +---- +template +task> +read(Stream& stream, Buffer&& buffer); +---- + +Example: + +[source,cpp] +---- +flat_dynamic_buffer buf; +auto [ec, n] = co_await read(stream, buf); +// buf now contains all data until EOF +---- + +=== write + +Writes all data by looping `write_some`: + +[source,cpp] +---- +#include + +template +task> +write(Stream& stream, Buffers const& buffers); +---- + +Keeps writing until: + +* All data is written (`n == buffer_size(buffers)`) +* Error occurs (returns error with partial count) + +Example: + +[source,cpp] +---- +co_await write(stream, make_buffer("Hello, World!")); +---- + +== Transfer Algorithms + +Transfer algorithms move data between sources/sinks and streams. + +=== push_to + +Transfers data from a `BufferSource` to a destination: + +[source,cpp] +---- +#include + +// To WriteSink (with EOF propagation) +template +task> +push_to(Source& source, Sink& sink); + +// To WriteStream (streaming, no EOF) +template +task> +push_to(Source& source, Stream& stream); +---- + +The source provides buffers via `pull()`. Data is pushed to the destination. 
Buffer ownership stays with the source—no intermediate copying when possible.

Example:

[source,cpp]
----
// Transfer file to network
mmap_source file("large_file.bin");
co_await push_to(file, socket);
----

=== pull_from

Transfers data from a source to a `BufferSink`:

[source,cpp]
----
#include <boost/capy/pull_from.hpp>

// From ReadSource
template<class Source, class Sink>
task<io_result<std::size_t>>
pull_from(Source& source, Sink& sink);

// From ReadStream (streaming)
template<class Stream, class Sink>
task<io_result<std::size_t>>
pull_from(Stream& stream, Sink& sink);
----

The sink provides writable buffers via `prepare()`. Data is pulled from the source directly into the sink's buffers.

Example:

[source,cpp]
----
// Receive network data into compression buffer
compression_sink compressor;
co_await pull_from(socket, compressor);
----

=== Why No Buffer-to-Buffer?

There is no `push_to(BufferSource, BufferSink)` because it would require redundant copying. The source owns read-only buffers; the sink owns writable buffers. Transferring between them would need an intermediate copy, defeating the zero-copy purpose.

Instead, compose with an intermediate stage:

[source,cpp]
----
// Transform: BufferSource → processing → BufferSink
task<> process_pipeline(any_buffer_source& source, any_buffer_sink& sink)
{
    const_buffer src_bufs[8];

    while (true)
    {
        auto [ec, count] = co_await source.pull(src_bufs, 8);
        if (ec.failed())
            throw std::system_error(ec);
        if (count == 0)
            break;

        for (std::size_t i = 0; i < count; ++i)
        {
            auto processed = transform(src_bufs[i]);

            // Write processed data to sink
            mutable_buffer dst_bufs[8];
            std::size_t dst_count = sink.prepare(dst_bufs, 8);

            std::size_t copied = buffer_copy(
                std::span(dst_bufs, dst_count),
                make_buffer(processed));

            co_await sink.commit(copied);
        }
    }

    co_await sink.commit_eof();
}
----

== Naming Convention

The algorithm names reflect buffer ownership:

[cols="1,2"]
|===
| Name | Meaning

| `push_to`
| Source provides buffers → push data to destination

| `pull_from`
| Sink provides buffers → pull data from source
|===

The verb indicates which side provides the buffers: `push` when the source does, `pull` when the sink does. The preposition names the other endpoint, not the direction of data flow.

== Error Handling

All transfer algorithms return `(error_code, std::size_t)`:

* `error_code` — Success, EOF, or error condition
* `std::size_t` — Total bytes transferred before return

On error, partial transfer may have occurred. The returned count indicates how much was transferred.

Example:

[source,cpp]
----
auto [ec, total] = co_await push_to(source, sink);

if (ec == cond::eof)
{
    // Normal completion
    std::cout << "Transferred " << total << " bytes\n";
}
else if (ec.failed())
{
    // Error occurred
    std::cerr << "Error after " << total << " bytes: " << ec.message() << "\n";
}
----

== Reference

[cols="1,3"]
|===
| Header | Description

| `<boost/capy/read.hpp>`
| Composed read operations

| `<boost/capy/write.hpp>`
| Composed write operations

| `<boost/capy/push_to.hpp>`
| BufferSource → WriteSink/WriteStream transfer

| `<boost/capy/pull_from.hpp>`
| ReadSource/ReadStream → BufferSink transfer
|===

You have now learned about transfer algorithms. Continue to xref:isolation.adoc[Physical Isolation] to learn how type erasure enables compilation firewalls.
diff --git a/doc/modules/ROOT/pages/streams/buffer-concepts.adoc b/doc/modules/ROOT/pages/streams/buffer-concepts.adoc
new file mode 100644
index 00000000..3acdaebc
--- /dev/null
+++ b/doc/modules/ROOT/pages/streams/buffer-concepts.adoc
@@ -0,0 +1,266 @@
= Buffer Sources and Sinks

This section explains the `BufferSource` and `BufferSink` concepts for zero-copy I/O where the callee owns the buffers.

== Prerequisites

* Completed xref:sources-sinks.adoc[Sources and Sinks]
* Understanding of caller-owns-buffers patterns

== Callee-Owns-Buffers Pattern

With streams and sources/sinks, the *caller* provides buffers:

[source,cpp]
----
// Caller owns the buffer
char my_buffer[1024];
co_await stream.read_some(mutable_buffer(my_buffer));
----

Data flows: source → caller's buffer → processing

With buffer sources/sinks, the *callee* provides buffers:

[source,cpp]
----
// Callee owns the buffers
const_buffer bufs[8];
auto [ec, count] = co_await source.pull(bufs, 8);
// bufs now point into source's internal storage
----

Data flows: source's internal buffer → processing (no copy!)

== BufferSource

A `BufferSource` provides read-only buffers from its internal storage:

[source,cpp]
----
template<class T>
concept BufferSource =
    requires(T& source, const_buffer* arr, std::size_t max_count) {
        { source.pull(arr, max_count) } -> IoAwaitable;
    };
----

=== pull Semantics

[source,cpp]
----
IoAwaitable auto pull(const_buffer* arr, std::size_t max_count);
----

Returns an awaitable yielding `(error_code, std::size_t)`:

* On success: `!ec.failed()`, fills `arr[0..count-1]` with buffer descriptors
* On exhaustion: `count == 0` indicates no more data
* On error: `ec.failed()`

The buffers point into the source's internal storage. You must consume all returned data before calling `pull()` again—the previous buffers become invalid.

=== Example

[source,cpp]
----
template<class Source>
task<> process_source(Source& source)
{
    const_buffer bufs[8];

    for (;;)
    {
        auto [ec, count] = co_await source.pull(bufs, 8);

        if (ec.failed())
            throw std::system_error(ec);

        if (count == 0)
            break; // Source exhausted

        // Process buffers (zero-copy!)
        for (std::size_t i = 0; i < count; ++i)
            process_data(bufs[i].data(), bufs[i].size());
    }
}
----

== BufferSink

A `BufferSink` provides writable buffers for direct write access:

[source,cpp]
----
template<class T>
concept BufferSink =
    requires(T& sink, mutable_buffer* arr, std::size_t max_count, std::size_t n) {
        { sink.prepare(arr, max_count) } -> std::same_as<std::size_t>;
        { sink.commit(n) } -> IoAwaitable;
        { sink.commit(n, bool{}) } -> IoAwaitable;
        { sink.commit_eof() } -> IoAwaitable;
    };
----

=== prepare Semantics

[source,cpp]
----
std::size_t prepare(mutable_buffer* arr, std::size_t max_count);
----

Synchronous operation. Returns the number of buffers prepared (may be less than `max_count`). Fills `arr[0..count-1]` with writable buffer descriptors.

=== commit Semantics

[source,cpp]
----
IoAwaitable auto commit(std::size_t n);
IoAwaitable auto commit(std::size_t n, bool eof);
IoAwaitable auto commit_eof();
----

Finalizes `n` bytes of prepared data. The `eof` flag or `commit_eof()` signals end-of-stream.
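
Before looking at a complete function, it helps to see one iteration of the cycle in isolation. The following sketch assumes the code conventions stated earlier, runs inside a coroutine, and uses `my_sink` as a placeholder for any type satisfying `BufferSink`:

[source,cpp]
----
mutable_buffer bufs[4];
std::size_t count = my_sink.prepare(bufs, 4); // synchronous: obtain writable space

// Fill some prefix of the prepared buffers with application data.
// Here we zero-fill as a stand-in for real serialization logic.
std::size_t filled = 0;
for (std::size_t i = 0; i < count; ++i)
{
    std::memset(bufs[i].data(), 0, bufs[i].size());
    filled += bufs[i].size();
}

co_await my_sink.commit(filled); // finalize the bytes; stream stays open
co_await my_sink.commit_eof();   // then signal end-of-stream when done
----

The complete example below applies the same prepare/fill/commit cycle in a loop while tracking how much of the caller's data has been copied.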

=== Example

[source,cpp]
----
template<class Sink>
task<> write_to_sink(Sink& sink, std::span<char const> data)
{
    std::size_t written = 0;

    while (written < data.size())
    {
        mutable_buffer bufs[8];
        std::size_t count = sink.prepare(bufs, 8);

        if (count == 0)
            throw std::runtime_error("sink full");

        // Copy into sink's buffers
        std::size_t copied = 0;
        for (std::size_t i = 0; i < count && written < data.size(); ++i)
        {
            std::size_t chunk = (std::min)(
                bufs[i].size(),
                data.size() - written);
            std::memcpy(bufs[i].data(), data.data() + written, chunk);
            written += chunk;
            copied += chunk;
        }

        bool eof = (written == data.size());
        co_await sink.commit(copied, eof);
    }
}
----

== Zero-Copy Benefits

Buffer sources/sinks enable true zero-copy I/O:

=== Memory-Mapped Files

[source,cpp]
----
// Models the BufferSource concept
class mmap_source
{
    void* mapped_region_;
    std::size_t size_;
    std::size_t offset_ = 0;

public:
    io_result<std::size_t> pull(const_buffer* arr, std::size_t max_count)
    {
        if (offset_ >= size_)
            co_return {error_code{}, 0}; // Exhausted

        // Return pointer into mapped memory—no copy!
        arr[0] = const_buffer(
            static_cast<char const*>(mapped_region_) + offset_,
            size_ - offset_);
        offset_ = size_;

        co_return {error_code{}, 1};
    }
};
----

=== Hardware Buffers

DMA buffers, GPU memory, network card ring buffers—all can be exposed through `BufferSource`/`BufferSink` without intermediate copying.

== Type-Erasing Wrappers

=== any_buffer_source

[source,cpp]
----
#include <boost/capy/any_buffer_source.hpp>

template<class S>
any_buffer_source(S& source);
----

=== any_buffer_sink

[source,cpp]
----
#include <boost/capy/any_buffer_sink.hpp>

template<class S>
any_buffer_sink(S& sink);
----

== Example: Compression Pipeline

[source,cpp]
----
// Compressor provides compressed data via BufferSource
// Decompressor consumes compressed data via BufferSink

task<> decompress_stream(any_buffer_source& compressed, any_write_sink& output)
{
    const_buffer bufs[8];

    for (;;)
    {
        auto [ec, count] = co_await compressed.pull(bufs, 8);
        if (ec.failed())
            throw std::system_error(ec);
        if (count == 0)
            break;

        for (std::size_t i = 0; i < count; ++i)
        {
            auto decompressed = decompress_block(bufs[i]);
            co_await output.write(make_buffer(decompressed));
        }
    }

    co_await output.write_eof();
}
----

== Reference

[cols="1,3"]
|===
| Header | Description

| `<boost/capy/buffer_source.hpp>`
| BufferSource concept definition

| `<boost/capy/buffer_sink.hpp>`
| BufferSink concept definition

| `<boost/capy/any_buffer_source.hpp>`
| Type-erased buffer source wrapper

| `<boost/capy/any_buffer_sink.hpp>`
| Type-erased buffer sink wrapper
|===

You have now learned about buffer sources and sinks for zero-copy I/O. Continue to xref:algorithms.adoc[Transfer Algorithms] to learn about composed read/write operations.
diff --git a/doc/modules/ROOT/pages/streams/isolation.adoc b/doc/modules/ROOT/pages/streams/isolation.adoc
new file mode 100644
index 00000000..590a04ca
--- /dev/null
+++ b/doc/modules/ROOT/pages/streams/isolation.adoc
@@ -0,0 +1,240 @@
= Physical Isolation

This section explains how type-erased wrappers enable compilation firewalls and transport-independent APIs.

== Prerequisites

* Completed xref:algorithms.adoc[Transfer Algorithms]
* Understanding of type-erased wrappers

== The Compilation Firewall Pattern

C++ templates are powerful but have a cost: every instantiation compiles in every translation unit that uses it. Change a template, and everything that includes it recompiles.
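
To see the cost concretely, consider what the `handle_protocol` coroutine used throughout this page looks like when written as a template. This is an illustrative sketch with the loop body elided:

[source,cpp]
----
// protocol.hpp - template version: the whole body must live in the header
template<class Stream>
task<> handle_protocol(Stream& stream)
{
    // ...entire protocol implementation visible to every includer...
    co_return;
}
----

Every translation unit that calls `handle_protocol` re-instantiates the full coroutine body for its particular stream type, so editing the body forces all of those translation units to rebuild.
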
+ +Type-erased wrappers break this dependency: + +[source,cpp] +---- +// protocol.hpp - No template dependencies +#pragma once +#include +#include + +// Declaration only - no implementation details +task<> handle_protocol(any_stream& stream); +---- + +[source,cpp] +---- +// protocol.cpp - Implementation isolated here +#include "protocol.hpp" +#include +#include + +task<> handle_protocol(any_stream& stream) +{ + char buf[1024]; + + for (;;) + { + auto [ec, n] = co_await stream.read_some(mutable_buffer(buf)); + if (ec.failed()) + co_return; + + // Process and respond... + co_await write(stream, make_buffer(response)); + } +} +---- + +Changes to `protocol.cpp` only recompile that file. The header is stable. + +== Build Time Benefits + +=== Before (Templates Everywhere) + +[source,cpp] +---- +// Old approach: template propagates everywhere +template +task<> handle_protocol(Stream& stream); + +// Every caller instantiates for their stream type +// Changes force recompilation of all callers +---- + +=== After (Type Erasure at Boundary) + +[source,cpp] +---- +// New approach: concrete signature +task<> handle_protocol(any_stream& stream); + +// Implementation compiles once +// Callers only depend on the signature +---- + +=== Measured Impact + +For a typical project: + +* Template-heavy design: 10+ seconds incremental rebuild +* Type-erased boundaries: < 1 second incremental rebuild + +The difference grows with project size. + +== Transport Independence + +Type erasure decouples your code from specific transport implementations: + +[source,cpp] +---- +// Your library code +task<> send_message(any_write_sink& sink, message const& msg) +{ + co_await sink.write(make_buffer(msg.header)); + co_await sink.write(make_buffer(msg.body), true); +} +---- + +Callers provide any conforming implementation: + +[source,cpp] +---- +// TCP socket +tcp::socket socket; +any_write_sink sink{socket}; +send_message(sink, msg); + +// TLS stream +tls::stream stream; +any_write_sink sink{stream}; +send_message(sink, msg); + +// HTTP chunked encoding +chunked_sink chunked{underlying}; +any_write_sink sink{chunked}; +send_message(sink, msg); + +// Test mock +test::write_sink mock; +any_write_sink sink{mock}; +send_message(sink, msg); +---- + +Same `send_message` function, different transports—compile once, use everywhere. + +== API Design Guidelines + +=== Accept Type-Erased References + +[source,cpp] +---- +// Good: accepts any stream +task<> process(any_stream& stream); + +// Avoid: forces specific type +task<> process(tcp::socket& socket); +---- + +=== Wrap at Call Site + +[source,cpp] +---- +void caller(tcp::socket& socket) +{ + any_stream stream{socket}; // Wrap here + process(stream); // Call with erased type +} +---- + +The wrapper creation is explicit and localized. + +=== Return Concrete Types (Usually) + +[source,cpp] +---- +// OK: factory returns concrete type +tcp::socket create_socket(); + +// Then caller wraps if needed +auto socket = create_socket(); +any_stream stream{socket}; +---- + +Returning type-erased values forces heap allocation. Return concrete types when the caller knows what they need. 
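
These guidelines compose naturally: a factory returns a concrete type, the caller wraps it once, and library code sees only the erased reference. The sketch below reuses the `create_socket` factory from above; the `session` coroutine is hypothetical:

[source,cpp]
----
task<> process(any_stream& stream);       // library API: erased reference only

task<> session()
{
    tcp::socket socket = create_socket(); // factory returns the concrete type
    any_stream stream{socket};            // wrap once, at the call site
    co_await process(stream);             // the socket type never leaks further
}                                         // socket outlives the wrapper
----

Because the wrapper holds a reference, the concrete `socket` must remain alive for as long as `stream` is in use.
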
+ +== Example: Library API + +[source,cpp] +---- +// http_client.hpp +#pragma once +#include +#include + +struct http_request +{ + std::string method; + std::string url; + std::map headers; +}; + +struct http_response +{ + int status_code; + std::map headers; + any_read_source body; // Body is a source, not a buffer +}; + +// Send request, receive response +// Works with any transport that provides any_stream +task send_request(any_stream& conn, http_request const& req); +---- + +Users don't need to know how HTTP is implemented: + +[source,cpp] +---- +// User code +tcp::socket socket; +// ... connect ... + +any_stream conn{socket}; +auto response = co_await send_request(conn, { + .method = "GET", + .url = "/api/data" +}); + +// Read body through type-erased source +flat_dynamic_buffer buf; +co_await read(response.body, buf); +---- + +The HTTP library is isolated from transport details. It compiles once. Users bring their own transport. + +== Wrapper Overhead + +Type erasure has runtime cost: + +* Virtual dispatch for each operation +* Extra indirection through wrapper + +But the cost is typically negligible compared to I/O latency. A nanosecond of dispatch overhead is invisible next to microsecond network operations. + +When profiling shows wrapper overhead matters: + +1. Consider batching operations +2. Use concrete types in hot paths +3. Accept the template cost for that code path + +== Reference + +Type-erased wrappers are in ``: + +* `any_stream` +* `any_read_stream`, `any_write_stream` +* `any_read_source`, `any_write_sink` +* `any_buffer_source`, `any_buffer_sink` + +You have now completed the Stream Concepts section. These abstractions—streams, sources, sinks, and their type-erased wrappers—form the foundation for Capy's I/O model. Continue to xref:../examples/hello-task.adoc[Example Programs] to see complete working examples. diff --git a/doc/modules/ROOT/pages/streams/overview.adoc b/doc/modules/ROOT/pages/streams/overview.adoc new file mode 100644 index 00000000..a5f3f758 --- /dev/null +++ b/doc/modules/ROOT/pages/streams/overview.adoc @@ -0,0 +1,199 @@ += Stream Concepts Overview + +This section introduces Capy's stream concepts—the abstractions that enable data to flow through your programs. + +== Prerequisites + +* Completed xref:../buffers/dynamic.adoc[Buffer Sequences] +* Understanding of buffer sequences and the DynamicBuffer concept + +== Six Concepts for Data Flow + +Capy defines six concepts for I/O operations, organized in three pairs: + +[cols="1,1,2"] +|=== +| Concept | Direction | Description + +| `ReadStream` +| Read +| Partial reads—returns whatever is available + +| `WriteStream` +| Write +| Partial writes—writes as much as possible + +| `ReadSource` +| Read +| Complete reads—fills buffer or signals EOF + +| `WriteSink` +| Write +| Complete writes with explicit EOF signaling + +| `BufferSource` +| Read +| Callee-owns-buffers read pattern + +| `BufferSink` +| Write +| Callee-owns-buffers write pattern +|=== + +== Streams vs Sources/Sinks + +=== Streams: Partial I/O + +Stream operations transfer *some* data and return. 
They do not guarantee a specific amount: + +[source,cpp] +---- +// ReadStream: may return fewer bytes than buffer can hold +auto [ec, n] = co_await stream.read_some(buffer); +// n might be 1, might be 1000, might be buffer_size(buffer) + +// WriteStream: may write fewer bytes than provided +auto [ec, n] = co_await stream.write_some(buffers); +// n might be less than buffer_size(buffers) +---- + +This matches raw OS behavior—syscalls return when data is available, not when buffers are full. + +=== Sources/Sinks: Complete I/O + +Source/sink operations complete fully or signal completion: + +[source,cpp] +---- +// ReadSource: fills buffer completely, or returns EOF/error with partial +auto [ec, n] = co_await source.read(buffer); +// n == buffer_size(buffer), or ec indicates why not + +// WriteSink: writes all data, with explicit EOF +co_await sink.write(buffers, true); // true = EOF after this +---- + +These are higher-level abstractions built on streams. + +== Buffer Sources/Sinks: Callee-Owns-Buffers + +The third pair inverts buffer ownership: + +* With streams/sources/sinks, the caller provides buffers +* With buffer sources/sinks, the callee provides buffers + +[source,cpp] +---- +// BufferSource: callee provides read-only buffers +const_buffer bufs[8]; +auto [ec, count] = co_await source.pull(bufs, 8); +// bufs[0..count-1] now point to source's internal data + +// BufferSink: callee provides writable buffers +mutable_buffer bufs[8]; +std::size_t count = sink.prepare(bufs, 8); +// Write into bufs[0..count-1], then commit +co_await sink.commit(bytes_written); +---- + +This pattern enables zero-copy I/O—data never moves through intermediate buffers. + +== Type-Erasing Wrappers + +Each concept has a corresponding type-erasing wrapper: + +[cols="1,1"] +|=== +| Concept | Wrapper + +| `ReadStream` +| `any_read_stream` + +| `WriteStream` +| `any_write_stream` + +| (Both) +| `any_stream` + +| `ReadSource` +| `any_read_source` + +| `WriteSink` +| `any_write_sink` + +| `BufferSource` +| `any_buffer_source` + +| `BufferSink` +| `any_buffer_sink` +|=== + +These wrappers enable: + +* APIs independent of concrete transport +* Compilation firewalls (fast incremental builds) +* Runtime polymorphism without virtual inheritance in user code + +== Choosing the Right Abstraction + +=== Use Streams When: + +* You need raw, unbuffered I/O +* You're implementing a protocol that processes data incrementally +* Performance is critical and you want minimal abstraction + +=== Use Sources/Sinks When: + +* You need complete data units (messages, records, frames) +* EOF signaling is part of your protocol +* You're composing transformations (compression, encryption) + +=== Use Buffer Sources/Sinks When: + +* Zero-copy is essential +* The source/sink owns the memory (memory-mapped files, hardware buffers) +* You're implementing a processing pipeline + +== The Value Proposition + +Type-erased wrappers let you write transport-agnostic code: + +[source,cpp] +---- +// This function works with any stream implementation +task<> echo(any_stream& stream) +{ + char buf[1024]; + for (;;) + { + auto [ec, n] = co_await stream.read_some(mutable_buffer(buf)); + if (ec.failed()) + co_return; + + auto [wec, wn] = co_await write(stream, const_buffer(buf, n)); + if (wec.failed()) + co_return; + } +} +---- + +The caller decides the concrete implementation: + +[source,cpp] +---- +// Works with Corosio TCP sockets +any_stream s1{tcp_socket}; +echo(s1); + +// Works with TLS streams +any_stream s2{tls_stream}; +echo(s2); + +// Works with test 
// mocks
any_stream s3{test::stream{}};
echo(s3);
----

Same code, different transports—compile once, link anywhere.

Continue to xref:streams.adoc[Streams (Partial I/O)] to learn the `ReadStream` and `WriteStream` concepts in detail.
diff --git a/doc/modules/ROOT/pages/streams/sources-sinks.adoc b/doc/modules/ROOT/pages/streams/sources-sinks.adoc
new file mode 100644
index 00000000..6b8a9f0a
--- /dev/null
+++ b/doc/modules/ROOT/pages/streams/sources-sinks.adoc
@@ -0,0 +1,231 @@
= Sources and Sinks (Complete I/O)

This section explains the `ReadSource` and `WriteSink` concepts for complete I/O operations with EOF signaling.

== Prerequisites

* Completed xref:streams.adoc[Streams (Partial I/O)]
* Understanding of partial I/O with `ReadStream` and `WriteStream`

== ReadSource

A `ReadSource` provides complete read operations that fill buffers entirely or signal EOF:

[source,cpp]
----
template<class T>
concept ReadSource =
    requires(T& source, mutable_buffer_archetype buffers) {
        { source.read(buffers) } -> IoAwaitable;
    };
----

=== read Semantics

[source,cpp]
----
template<class MB>
IoAwaitable auto read(MB const& buffers);
----

Returns an awaitable yielding `(error_code, std::size_t)`:

* On success: `!ec.failed()`, and `n == buffer_size(buffers)` (buffer completely filled)
* On EOF: `ec == cond::eof`, and `n` is bytes read before EOF (partial read)
* On error: `ec.failed()`, and `n` is bytes read before error

The key difference from `ReadStream`: a successful read fills the buffer completely.

=== Use Cases

* Reading fixed-size records
* Reading message frames with known sizes
* Filling buffers for batch processing

=== Example

[source,cpp]
----
template<class Source>
task<std::optional<message>> read_message(Source& source)
{
    // Read fixed-size header
    message_header header;
    auto [ec, n] = co_await source.read(
        mutable_buffer(&header, sizeof(header)));

    if (ec == cond::eof && n == 0)
        co_return std::nullopt; // Clean EOF

    if (ec.failed())
        throw std::system_error(ec);

    // Read variable-size body
    std::vector<char> body(header.body_size);
    auto [ec2, n2] = co_await source.read(make_buffer(body));

    if (ec2.failed())
        throw std::system_error(ec2);

    co_return message{header, std::move(body)};
}
----

== WriteSink

A `WriteSink` provides complete write operations with explicit EOF signaling:

[source,cpp]
----
template<class T>
concept WriteSink =
    requires(T& sink, const_buffer_archetype buffers) {
        { sink.write(buffers) } -> IoAwaitable;
        { sink.write(buffers, bool{}) } -> IoAwaitable;
        { sink.write_eof() } -> IoAwaitable;
    };
----

=== write Semantics

[source,cpp]
----
// Write data
template<class CB>
IoAwaitable auto write(CB const& buffers);

// Write data with optional EOF
template<class CB>
IoAwaitable auto write(CB const& buffers, bool eof);

// Signal EOF without data
IoAwaitable auto write_eof();
----

The `eof` parameter signals end-of-stream after the data is written.

After calling `write_eof()` or `write(buffers, true)`, no further writes are permitted.
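
The following sketch illustrates that ordering rule. It assumes a coroutine context where `sink` satisfies `WriteSink`, and `header_bytes` and `body_bytes` are placeholder byte containers:

[source,cpp]
----
co_await sink.write(make_buffer(header_bytes));     // ordinary write
co_await sink.write(make_buffer(body_bytes), true); // final write: eof == true

// The sink is now closed for writing. Either of the following
// would violate the contract:
//
//   co_await sink.write(make_buffer(more_bytes));  // write after EOF
//   co_await sink.write_eof();                     // EOF signaled twice
----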
+ +=== Use Cases + +* Writing complete messages +* HTTP body transmission (content-length or chunked) +* Protocol framing with explicit termination + +=== Example + +[source,cpp] +---- +template +task<> send_response(Sink& sink, response const& resp) +{ + // Write headers + auto headers = format_headers(resp); + co_await sink.write(make_buffer(headers)); + + // Write body with EOF + co_await sink.write(make_buffer(resp.body), true); // EOF after body +} +---- + +== Type-Erasing Wrappers + +=== any_read_source + +[source,cpp] +---- +#include + +template +any_read_source(S& source); +---- + +=== any_write_sink + +[source,cpp] +---- +#include + +template +any_write_sink(S& sink); +---- + +== Example: HTTP Body Handler + +The HTTP library uses `any_write_sink` for body transmission: + +[source,cpp] +---- +// HTTP response handler doesn't know the underlying transport +task<> send_body(any_write_sink& body, std::string_view data) +{ + // Works whether body is: + // - Direct socket write (content-length) + // - Chunked encoding wrapper + // - Compressed stream + // - Test mock + + co_await body.write(make_buffer(data), true); +} +---- + +The caller decides the concrete implementation: + +[source,cpp] +---- +// Content-length mode +content_length_sink cl_sink(socket, data.size()); +any_write_sink body{cl_sink}; +send_body(body, data); + +// Chunked mode +chunked_sink ch_sink(socket); +any_write_sink body{ch_sink}; +send_body(body, data); +---- + +Same `send_body` function, different transfer encodings—the library handles the difference. + +== Streams vs Sources/Sinks + +[cols="1,1,1"] +|=== +| Aspect | Streams | Sources/Sinks + +| Transfer amount +| Partial (whatever is available) +| Complete (fill buffer or EOF) + +| EOF handling +| Implicit (read returns 0) +| Explicit (`write_eof()`, EOF flag) + +| Use case +| Raw I/O, incremental processing +| Message-oriented protocols + +| Abstraction level +| Lower (closer to OS) +| Higher (application-friendly) +|=== + +== Reference + +[cols="1,3"] +|=== +| Header | Description + +| `` +| ReadSource concept definition + +| `` +| WriteSink concept definition + +| `` +| Type-erased read source wrapper + +| `` +| Type-erased write sink wrapper +|=== + +You have now learned about sources and sinks for complete I/O. Continue to xref:buffer-concepts.adoc[Buffer Sources and Sinks] to learn about the callee-owns-buffers pattern. diff --git a/doc/modules/ROOT/pages/streams/streams.adoc b/doc/modules/ROOT/pages/streams/streams.adoc new file mode 100644 index 00000000..ff2bc545 --- /dev/null +++ b/doc/modules/ROOT/pages/streams/streams.adoc @@ -0,0 +1,236 @@ += Streams (Partial I/O) + +This section explains the `ReadStream` and `WriteStream` concepts for partial I/O operations. 

== Prerequisites

* Completed xref:overview.adoc[Stream Concepts Overview]
* Understanding of the six stream concept categories

== ReadStream

A type satisfies `ReadStream` if it provides partial read operations via `read_some`:

[source,cpp]
----
template<class T>
concept ReadStream =
    requires(T& stream, mutable_buffer_archetype buffers) {
        { stream.read_some(buffers) } -> IoAwaitable;
    };
----

=== read_some Semantics

[source,cpp]
----
template<class MB>
IoAwaitable auto read_some(MB const& buffers);
----

Returns an awaitable yielding `(error_code, std::size_t)`:

* On success: `!ec.failed()`, and `n >= 1` bytes were read
* On error: `ec.failed()`, and `n == 0`
* On EOF: `ec == cond::eof`, and `n == 0`

If `buffer_empty(buffers)` is true, completes immediately with `n == 0` and no error.

=== Partial Transfer

`read_some` may return fewer bytes than the buffer can hold:

[source,cpp]
----
char buf[1024];
auto [ec, n] = co_await stream.read_some(mutable_buffer(buf));
// n might be 1, might be 500, might be 1024
// The only guarantee: if !ec.failed() && n > 0
----

This matches underlying OS behavior—reads return when *some* data is available.

=== Example

[source,cpp]
----
template<class Stream>
task<> dump_stream(Stream& stream)
{
    char buf[256];

    for (;;)
    {
        auto [ec, n] = co_await stream.read_some(mutable_buffer(buf));

        if (ec == cond::eof)
            break; // End of stream

        if (ec.failed())
            throw std::system_error(ec);

        std::cout.write(buf, n);
    }
}
----

== WriteStream

A type satisfies `WriteStream` if it provides partial write operations via `write_some`:

[source,cpp]
----
template<class T>
concept WriteStream =
    requires(T& stream, const_buffer_archetype buffers) {
        { stream.write_some(buffers) } -> IoAwaitable;
    };
----

=== write_some Semantics

[source,cpp]
----
template<class CB>
IoAwaitable auto write_some(CB const& buffers);
----

Returns an awaitable yielding `(error_code, std::size_t)`:

* On success: `!ec.failed()`, and `n >= 1` bytes were written
* On error: `ec.failed()`, and `n` indicates bytes written before error (may be 0)

If `buffer_empty(buffers)` is true, completes immediately with `n == 0` and no error.

=== Partial Transfer

`write_some` may write fewer bytes than provided:

[source,cpp]
----
auto [ec, n] = co_await stream.write_some(make_buffer(large_data));
// n might be less than large_data.size()
----

To write all data, loop until complete (or use the `write()` composed operation).

== Type-Erasing Wrappers

=== any_read_stream

Wraps any `ReadStream` in a type-erased container:

[source,cpp]
----
#include <boost/capy/any_read_stream.hpp>

template<class S>
any_read_stream(S& stream);
----

The wrapped stream is referenced—the original must outlive the wrapper.

=== any_write_stream

Wraps any `WriteStream`:

[source,cpp]
----
#include <boost/capy/any_write_stream.hpp>

template<class S>
any_write_stream(S& stream);
----

=== any_stream

Wraps bidirectional streams (both `ReadStream` and `WriteStream`):

[source,cpp]
----
#include <boost/capy/any_stream.hpp>

template<ReadStream S>
    requires WriteStream<S>
any_stream(S& stream);
----

=== Wrapper Characteristics

All wrappers share these properties:

* *Reference semantics* — Wrap existing objects without ownership
* *Preallocated coroutine frame* — Zero steady-state allocation
* *Move-only* — Non-copyable; moving transfers the cached frame
* *Lifetime requirement* — Wrapped object must outlive wrapper

Example usage:

[source,cpp]
----
void process_stream(any_stream& stream);

tcp::socket socket;
// ... connect socket ...

any_stream wrapped{socket}; // Type erasure here
process_stream(wrapped);    // process_stream doesn't know about tcp::socket
----

== Example: Echo Server with any_stream

[source,cpp]
----
// echo.hpp - Header only declares the signature
task<> handle_connection(any_stream& stream);

// echo.cpp - Implementation in separate translation unit
task<> handle_connection(any_stream& stream)
{
    char buf[1024];

    for (;;)
    {
        // Read some data
        auto [ec, n] = co_await stream.read_some(mutable_buffer(buf));

        if (ec == cond::eof)
            co_return; // Client closed connection

        if (ec.failed())
            throw std::system_error(ec);

        // Echo it back
        auto [wec, wn] = co_await write(stream, const_buffer(buf, n));

        if (wec.failed())
            throw std::system_error(wec);
    }
}
----

The implementation doesn't know the concrete stream type. It compiles once and works with any transport.

== Reference

[cols="1,3"]
|===
| Header | Description

| `<boost/capy/read_stream.hpp>`
| ReadStream concept definition

| `<boost/capy/write_stream.hpp>`
| WriteStream concept definition

| `<boost/capy/any_read_stream.hpp>`
| Type-erased read stream wrapper

| `<boost/capy/any_write_stream.hpp>`
| Type-erased write stream wrapper

| `<boost/capy/any_stream.hpp>`
| Type-erased bidirectional stream wrapper
|===

You have now learned the stream concepts for partial I/O. Continue to xref:sources-sinks.adoc[Sources and Sinks] to learn about complete I/O with EOF signaling.