fix(threadpool): per-task nursery scope for sync tasks — snapshot UAF (no scope-pointer swap)#160
Merged
Merged
Conversation
…shot UAF) A synchronous ThreadPool task ran its body inline in the worker and freed the per-task snapshot arena the moment the body returned — while a coroutine the body spawned but never awaited was still pending. That coroutine's op_array lived in the freed arena (Windows debug-heap crash; ASAN-caught on Linux). Run the body as a coroutine in its own per-task nursery Scope instead: Async\spawn() inside the body lands in that scope on its own (no coro->scope hijacking). On task exit the worker cancels the scope and drains it via the new ZEND_ASYNC_SCOPE_AWAIT_AFTER_CANCELLATION — awaiting physical disposal of every spawned coroutine — before freeing the snapshot. Invariant: no spawned coroutine outlives the snapshot. Requires php-src ABI v0.20.0 (zend_async_scope_await_after_cancellation_fn). Regression test tests/thread_pool/065-task_scope_nursery_no_uaf.phpt.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Running the body as a coroutine meant a fatal (e.g. OOM) re-raised zend_bailout() out of the task coroutine and longjmped to the worker's bailout handler, past the future resolution — and that handler only drains the channel, not the already-dequeued in-flight task, so the awaiter hung. Track the in-flight task's future in a volatile and reject it from the bailout handler (ThreadTransferException), restoring the prior behaviour. Also trim the over-verbose comments from the previous commit. Test tests/thread_pool/066-task_fatal_rejects_future.phpt.
… task Two correctness fixes on top of the per-task nursery scope: 1. Fatal in a task body no longer UAFs the snapshot. The entry op_array's name strings (function_name, filename) are materialized into refcounted heap strings (thread_materialize_op_array_names), so holders that outlive the snapshot arena — the closure freed at request shutdown, and PG(last_error_file) — are freed by refcount instead of dangling into the freed arena. The sync body's bailout is caught and the future rejected. 2. A bailout through a parked SUSPEND no longer leaks the libuv loop. The ThreadChannel send/recv triggers and the worker's slot_event are uv_async handles; a fatal re-raising zend_bailout() through SUSPEND skipped their dispose, leaving an open uv_async that blocked uv_loop_close (EBUSY) and leaked the loop internals. Both paths now catch the bailout, dispose the trigger on the worker's reactor, and re-raise. Debug builds dump any handle that survives reactor shutdown. Tests: thread_pool/066 (sync fatal rejects future), 067 (coroutine fatal disposes channel trigger), 068 (concurrency fatal disposes slot_event). All green under ASAN+LSan.
…longer null) Two related improvements to how a ThreadPool task's fatal reaches the awaiter: 1. Coroutine-mode tasks no longer resolve to a silent null on a fatal. The pool_task_dispose completion path only checked coroutine->exception, but a bailout (OOM/fatal/exit) is not a thrown exception, so the future completed with UNDEF -> null. It now treats "no exception + UNDEF result" as a bailout and rejects the future with the cause, matching synchronous mode. 2. The cause crosses to the parent as a plain string, not a PHP object built in the dying worker. New async_future_shared_state_reject_bailout stores a pestrdup'd message (system malloc, not bound by memory_limit) on the shared state; shared_state_trigger_cb builds the ThreadTransferException on the healthy parent side. So the real fatal message survives even when the worker can no longer allocate a PHP object under the exhausted allocator. Tests thread_pool/067, 068 now assert the cause is delivered (was %ANULL).
…n_thread A transferred closure's use-variable names were copied with zend_string_dup(), which returns interned strings as-is. Those names are interned in the parent thread, so the worker's snapshot stored pointers into the parent's interned- string table. If the parent ended/aborted its request (zend_interned_strings_ deactivate frees the table) while a worker was still constructing the closure, the worker read freed memory in async_thread_create_closure (ZSTR_VAL(key)) — heap-use-after-free, reproducible under WSL2+ASAN, deterministic 6/6. Copy each key into a private persistent string (zend_string_init persistent=1) owned by the snapshot, independent of the parent's interned-string table. Verified: tests/thread/050,052 now pass 5/5 under --asan + detect_leaks=1 (were failing 6/6); full thread suite 178 passed / 0 failed.
…bug assert) The previous commit copied the captured-variable name into a persistent string with zend_string_init(persistent=1) but did not flag it GC_PERSISTENT_LOCAL. On the parent thread (where zend_rc_debug is enabled in --enable-rc-debug ASAN builds) the GC_ADDREF inside zend_hash_add then asserts "GC_PERSISTENT without GC_PERSISTENT_LOCAL" — failing every cross-thread test in the CI OpCache ASAN job (remote_future/*, thread/030,031, task_group/040). Mark the key PERSISTENT_LOCAL right after init, exactly as the dst->bound_vars hash itself is marked one line above. Verified under a full -DZEND_RC_DEBUG=1 + opcache.enable_cli=1 + ASAN/LSan rebuild: remote_future/thread/task_group/thread_pool/thread_channel = 236 passed, 0 failed.
The cross-thread bailout cause does not belong on the generic zend_future_shared_state_t (it is shared by RemoteFuture and every cross-thread future, not just ThreadPool). Drop the bailout_cause field, reject_bailout, the destination-side build branch, and the async_thread_create_transfer_exception helper. A task that bailed out now resolves its future the normal way: build a ThreadTransferException in the worker and reject via async_future_shared_state_ reject. Coroutine-mode still reports the cause instead of a silent null (pool_task_dispose detects "no exception + UNDEF result" and rejects). Also trims the verbose comments added during this work to 1-2 lines. Verified under -DZEND_RC_DEBUG=1 + opcache.enable_cli=1 + ASAN/LSan: remote_future/thread/task_group/thread_pool/thread_channel = 236 passed, 0 failed.
EdmondDantes
added a commit
that referenced
this pull request
Jun 8, 2026
…in CI
063-bootloader_exception: a bootloader exception reaches the awaiter either as
the real RuntimeException (submit landed before the worker closed the channel)
or as Async\ThreadTransferException carrying the same message (submit raced the
close). Both deliver the true cause, so assert on the message instead of the
timing-dependent class.
034-stdio_fopen_fwrite: concurrent fopen('a') appends to a single shared file
from two workers are not atomic on Windows and can lose a write (got lines=3).
Each task now writes its own file ('w'), keeping "worker I/O hits disk"
coverage without the cross-thread append contention.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
A synchronous
ThreadPooltask (new ThreadPool(N), i.e.coroutine_mode = false) ran its body inline in the worker's main coroutine and freed the per-task snapshot arena the instant the body returned. If the body hadAsync\spawn()ed a coroutine it never awaited, that coroutine was still pending in the scheduler — and itsop_arraylives in the just-freed arena. Running it later dereferenced freed memory: hard crash on the Windows debug heap, latent + ASAN-caught on Linux.Approach (supersedes #159)
#159 fixed this by temporarily swapping the worker coroutine's
scopepointer to redirectAsync\spawn()into a per-task nursery, plus a hand-rolled cancel/await/pin dance. This PR removes that swap entirely.The task body now runs as a real coroutine in its own per-task nursery
Scope. BecauseCURRENT SCOPEfollows the running coroutine,Async\spawn()inside the body lands in that scope on its own — nocoro->scopehijacking. On task exit the worker cancels the scope and drains it — awaiting the physical disposal of every spawned coroutine — before freeing the snapshot.Invariant: no coroutine spawned by the task outlives the snapshot. (Whether such a coroutine got to run at all is timing-dependent and intentionally not asserted — the only guarantee is no use-after-free.)
The drain reuses the canonical zombie-aware logic of
Scope::awaitAfterCancellationinstead of re-implementing it: its body is factored into a C function exposed on the engine API.Changes
true-async-stable): newzend_async_scope_await_after_cancellation_fn+ZEND_ASYNC_SCOPE_AWAIT_AFTER_CANCELLATIONmacro.Scope::awaitAfterCancellationis now a thin wrapper over the shared C core.thread_pool.crewritten (NEW_SCOPE nursery → SPAWN_WITH → await body → cancel → drain → free snapshot);scope.c/.hfactor + register the C core;async_API.cregistration.pool_task_dispose/coroutine-mode path untouched.Tests
tests/thread_pool/065-task_scope_nursery_no_uaf.phpt(spawned-not-awaited coroutine still pending at task return). Fulltests/thread_pool/66/66 (+1 pre-existing XFAIL) andscope/task_group/task_set/coroutine/spawn/await/future/pool/channel(≈480) green under ASAN+ZTS.Requires php-src
true-async-stableat ABI v0.20.0 (pushed first so this PR's CI clones it).