fix(threadpool): per-task nursery scope for sync tasks — snapshot UAF (no scope-pointer swap) by EdmondDantes · Pull Request #160 · true-async/php-async

EdmondDantes · 2026-06-07T10:15:20Z

Problem

A synchronous ThreadPool task (new ThreadPool(N), i.e. coroutine_mode = false) ran its body inline in the worker's main coroutine and freed the per-task snapshot arena the instant the body returned. If the body had Async\spawn()ed a coroutine it never awaited, that coroutine was still pending in the scheduler — and its op_array lives in the just-freed arena. Running it later dereferenced freed memory: hard crash on the Windows debug heap, latent + ASAN-caught on Linux.

Approach (supersedes #159)

#159 fixed this by temporarily swapping the worker coroutine's scope pointer to redirect Async\spawn() into a per-task nursery, plus a hand-rolled cancel/await/pin dance. This PR removes that swap entirely.

The task body now runs as a real coroutine in its own per-task nursery Scope. Because CURRENT SCOPE follows the running coroutine, Async\spawn() inside the body lands in that scope on its own — no coro->scope hijacking. On task exit the worker cancels the scope and drains it — awaiting the physical disposal of every spawned coroutine — before freeing the snapshot.

Invariant: no coroutine spawned by the task outlives the snapshot. (Whether such a coroutine got to run at all is timing-dependent and intentionally not asserted — the only guarantee is no use-after-free.)

The drain reuses the canonical zombie-aware logic of Scope::awaitAfterCancellation instead of re-implementing it: its body is factored into a C function exposed on the engine API.

Changes

php-src (ABI v0.20.0, already on true-async-stable): new zend_async_scope_await_after_cancellation_fn + ZEND_ASYNC_SCOPE_AWAIT_AFTER_CANCELLATION macro. Scope::awaitAfterCancellation is now a thin wrapper over the shared C core.
ext/async: sync task path in thread_pool.c rewritten (NEW_SCOPE nursery → SPAWN_WITH → await body → cancel → drain → free snapshot); scope.c/.h factor + register the C core; async_API.c registration. pool_task_dispose/coroutine-mode path untouched.

Tests

tests/thread_pool/065-task_scope_nursery_no_uaf.phpt (spawned-not-awaited coroutine still pending at task return). Full tests/thread_pool/ 66/66 (+1 pre-existing XFAIL) and scope/task_group/task_set/coroutine/spawn/await/future/pool/channel (≈480) green under ASAN+ZTS.

Requires php-src true-async-stable at ABI v0.20.0 (pushed first so this PR's CI clones it).

…shot UAF) A synchronous ThreadPool task ran its body inline in the worker and freed the per-task snapshot arena the moment the body returned — while a coroutine the body spawned but never awaited was still pending. That coroutine's op_array lived in the freed arena (Windows debug-heap crash; ASAN-caught on Linux). Run the body as a coroutine in its own per-task nursery Scope instead: Async\spawn() inside the body lands in that scope on its own (no coro->scope hijacking). On task exit the worker cancels the scope and drains it via the new ZEND_ASYNC_SCOPE_AWAIT_AFTER_CANCELLATION — awaiting physical disposal of every spawned coroutine — before freeing the snapshot. Invariant: no spawned coroutine outlives the snapshot. Requires php-src ABI v0.20.0 (zend_async_scope_await_after_cancellation_fn). Regression test tests/thread_pool/065-task_scope_nursery_no_uaf.phpt.

codecov · 2026-06-07T10:22:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Running the body as a coroutine meant a fatal (e.g. OOM) re-raised zend_bailout() out of the task coroutine and longjmped to the worker's bailout handler, past the future resolution — and that handler only drains the channel, not the already-dequeued in-flight task, so the awaiter hung. Track the in-flight task's future in a volatile and reject it from the bailout handler (ThreadTransferException), restoring the prior behaviour. Also trim the over-verbose comments from the previous commit. Test tests/thread_pool/066-task_fatal_rejects_future.phpt.

… task Two correctness fixes on top of the per-task nursery scope: 1. Fatal in a task body no longer UAFs the snapshot. The entry op_array's name strings (function_name, filename) are materialized into refcounted heap strings (thread_materialize_op_array_names), so holders that outlive the snapshot arena — the closure freed at request shutdown, and PG(last_error_file) — are freed by refcount instead of dangling into the freed arena. The sync body's bailout is caught and the future rejected. 2. A bailout through a parked SUSPEND no longer leaks the libuv loop. The ThreadChannel send/recv triggers and the worker's slot_event are uv_async handles; a fatal re-raising zend_bailout() through SUSPEND skipped their dispose, leaving an open uv_async that blocked uv_loop_close (EBUSY) and leaked the loop internals. Both paths now catch the bailout, dispose the trigger on the worker's reactor, and re-raise. Debug builds dump any handle that survives reactor shutdown. Tests: thread_pool/066 (sync fatal rejects future), 067 (coroutine fatal disposes channel trigger), 068 (concurrency fatal disposes slot_event). All green under ASAN+LSan.

…longer null) Two related improvements to how a ThreadPool task's fatal reaches the awaiter: 1. Coroutine-mode tasks no longer resolve to a silent null on a fatal. The pool_task_dispose completion path only checked coroutine->exception, but a bailout (OOM/fatal/exit) is not a thrown exception, so the future completed with UNDEF -> null. It now treats "no exception + UNDEF result" as a bailout and rejects the future with the cause, matching synchronous mode. 2. The cause crosses to the parent as a plain string, not a PHP object built in the dying worker. New async_future_shared_state_reject_bailout stores a pestrdup'd message (system malloc, not bound by memory_limit) on the shared state; shared_state_trigger_cb builds the ThreadTransferException on the healthy parent side. So the real fatal message survives even when the worker can no longer allocate a PHP object under the exhausted allocator. Tests thread_pool/067, 068 now assert the cause is delivered (was %ANULL).

…n_thread A transferred closure's use-variable names were copied with zend_string_dup(), which returns interned strings as-is. Those names are interned in the parent thread, so the worker's snapshot stored pointers into the parent's interned- string table. If the parent ended/aborted its request (zend_interned_strings_ deactivate frees the table) while a worker was still constructing the closure, the worker read freed memory in async_thread_create_closure (ZSTR_VAL(key)) — heap-use-after-free, reproducible under WSL2+ASAN, deterministic 6/6. Copy each key into a private persistent string (zend_string_init persistent=1) owned by the snapshot, independent of the parent's interned-string table. Verified: tests/thread/050,052 now pass 5/5 under --asan + detect_leaks=1 (were failing 6/6); full thread suite 178 passed / 0 failed.

…bug assert) The previous commit copied the captured-variable name into a persistent string with zend_string_init(persistent=1) but did not flag it GC_PERSISTENT_LOCAL. On the parent thread (where zend_rc_debug is enabled in --enable-rc-debug ASAN builds) the GC_ADDREF inside zend_hash_add then asserts "GC_PERSISTENT without GC_PERSISTENT_LOCAL" — failing every cross-thread test in the CI OpCache ASAN job (remote_future/*, thread/030,031, task_group/040). Mark the key PERSISTENT_LOCAL right after init, exactly as the dst->bound_vars hash itself is marked one line above. Verified under a full -DZEND_RC_DEBUG=1 + opcache.enable_cli=1 + ASAN/LSan rebuild: remote_future/thread/task_group/thread_pool/thread_channel = 236 passed, 0 failed.

The cross-thread bailout cause does not belong on the generic zend_future_shared_state_t (it is shared by RemoteFuture and every cross-thread future, not just ThreadPool). Drop the bailout_cause field, reject_bailout, the destination-side build branch, and the async_thread_create_transfer_exception helper. A task that bailed out now resolves its future the normal way: build a ThreadTransferException in the worker and reject via async_future_shared_state_ reject. Coroutine-mode still reports the cause instead of a silent null (pool_task_dispose detects "no exception + UNDEF result" and rejects). Also trims the verbose comments added during this work to 1-2 lines. Verified under -DZEND_RC_DEBUG=1 + opcache.enable_cli=1 + ASAN/LSan: remote_future/thread/task_group/thread_pool/thread_channel = 236 passed, 0 failed.

…in CI 063-bootloader_exception: a bootloader exception reaches the awaiter either as the real RuntimeException (submit landed before the worker closed the channel) or as Async\ThreadTransferException carrying the same message (submit raced the close). Both deliver the true cause, so assert on the message instead of the timing-dependent class. 034-stdio_fopen_fwrite: concurrent fopen('a') appends to a single shared file from two workers are not atomic on Windows and can lose a write (got lines=3). Each task now writes its own file ('w'), keeping "worker I/O hits disk" coverage without the cross-thread append contention.

EdmondDantes added 6 commits June 7, 2026 11:13

EdmondDantes merged commit 9af3e2e into main Jun 8, 2026
9 checks passed

EdmondDantes deleted the windows-threadpool-snapshot-uaf-clean branch June 8, 2026 17:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(threadpool): per-task nursery scope for sync tasks — snapshot UAF (no scope-pointer swap)#160

fix(threadpool): per-task nursery scope for sync tasks — snapshot UAF (no scope-pointer swap)#160
EdmondDantes merged 7 commits into
mainfrom
windows-threadpool-snapshot-uaf-clean

EdmondDantes commented Jun 7, 2026

Uh oh!

codecov Bot commented Jun 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

EdmondDantes commented Jun 7, 2026

Problem

Approach (supersedes #159)

Changes

Tests

Uh oh!

codecov Bot commented Jun 7, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant