
fix(sys): use THREAD_LOCAL TLS for mimalloc v3 on Apple #71

Draft
shulaoda wants to merge 1 commit into napi-rs:main from shulaoda:fix/apple-thread-local-recurse-guard

Summary

Replaces the Apple-branch TLS model in mimalloc v3's prim.h selector with MI_TLS_MODEL_THREAD_LOCAL + MI_TLS_RECURSE_GUARD, solving both the multi-instance heap-corruption issue (Mode A below) and the rayon-worker SIGABRT issue introduced by #67 (Mode B below). The patch is applied by build.rs at build time, is idempotent, and leaves the submodule pristine in git.

Problem

Two distinct failure modes have been observed in production:

Mode A — multi-instance heap corruption (FIXED_SLOT, upstream default)

mimalloc v3 on Apple stores the per-thread theap pointer at fixed TCB slots 108/109, accessed directly via tpidrro_el0 (arm64) or %gs: (x86_64). When multiple statically-linked mimalloc-safe instances coexist in one process — the canonical case being multiple Node.js napi addons — every instance writes its own heap pointer to the same slot on every thread it touches.

On any thread called into by more than one addon — the main JS thread above all, plus libuv worker pool and ThreadsafeFunction dispatcher threads — the second instance reads back the first's heap pointer, treats it as its own, and corrupts both.
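
The collision can be illustrated with a toy model. This is only a sketch: ToyTcb, ToyInstance, and ToyPerImageCell are hypothetical names, plain integers stand in for heap pointers, and the "bind lazily if the slot is empty" rule is an assumption made for illustration, not a transcription of mimalloc's code.

```rust
// Toy model only: integers stand in for heap pointers and a small array
// stands in for one thread's TCB. None of these types exist in mimalloc.

/// Slot index taken from the description above (108 holds the theap pointer).
const FIXED_SLOT: usize = 108;

/// One thread's control block, reduced to an array of pointer-sized slots.
struct ToyTcb {
    slots: [usize; 128],
}

/// A per-image, per-thread cell, as a stand-in for dyld TLV storage.
struct ToyPerImageCell(Option<usize>);

/// One statically linked allocator copy, identified by its own heap tag.
struct ToyInstance {
    heap_tag: usize,
}

impl ToyInstance {
    /// Fixed-slot lookup: bind lazily if the slot looks empty (assumed rule),
    /// then trust whatever the shared slot holds.
    fn heap_fixed(&self, tcb: &mut ToyTcb) -> usize {
        if tcb.slots[FIXED_SLOT] == 0 {
            tcb.slots[FIXED_SLOT] = self.heap_tag;
        }
        tcb.slots[FIXED_SLOT]
    }

    /// Per-image lookup: each instance consults a cell that only it owns.
    fn heap_per_image(&self, cell: &mut ToyPerImageCell) -> usize {
        *cell.0.get_or_insert(self.heap_tag)
    }
}
```

With two instances on one thread, the second call to heap_fixed returns the first instance's tag, while heap_per_image returns each instance's own tag because the cells are separate.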

mimalloc upstream acknowledges this in prim.h:

"This goes wrong though if the OS or a library uses the same fixed slot."

Mode B — rayon-worker SIGABRT (DYNAMIC_PTHREADS, #67's workaround)

PR #67 addressed Mode A by passing -DMI_HAS_TLS_SLOT=0, routing Apple to MI_TLS_MODEL_DYNAMIC_PTHREADS (per-image pthread_key_create + pthread_setspecific). That code path is primarily designed and tested for OpenBSD/Android; its interaction with long-lived non-tokio threads on macOS is fragile.

Observed in rolldown:

  • An unnamed pthread (identified via FP-chain walking + dyld image-list resolution as a rayon worker spawned through oxc_cfg → oxc_index → rayon) hits a pthread_setspecific timing inconsistency on its first or post-main allocation.
  • mimalloc returns NULL.
  • Rust's handle_alloc_error fires and the process aborts with SIGABRT, even though the bundle has otherwise completed successfully (Finished in ~15ms is printed to stdout immediately before).
  • Reproducible at ~5–15% per CI run with it.repeats(100) on rolldown's cli-e2e.test.ts.
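
For readers unfamiliar with why a NULL from the allocator becomes a process abort, the sketch below shows the usual Rust contract using only std::alloc. The helper names alloc_u64_or_abort and free_u64 are hypothetical; the pattern mirrors what the standard collections are documented to do on allocation failure, not rolldown's actual code.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Allocates space for one u64 through the global allocator.
/// If the allocator returns NULL, handle_alloc_error is called; it never
/// returns, and its default behaviour is to abort the process.
fn alloc_u64_or_abort() -> *mut u64 {
    let layout = Layout::new::<u64>();
    // SAFETY: the layout has a non-zero size.
    let ptr = unsafe { alloc(layout) } as *mut u64;
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    ptr
}

/// Releases a pointer obtained from alloc_u64_or_abort.
fn free_u64(ptr: *mut u64) {
    // SAFETY: the caller passes a pointer allocated with the same layout.
    unsafe { dealloc(ptr as *mut u8, Layout::new::<u64>()) }
}
```

Because the NULL check sits on the allocation path itself, a transient failure on a single worker thread is enough to take down the whole process, regardless of how much work has already completed.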

Solution

Switch the Apple branch in mimalloc v3's prim.h selector to use:

MI_TLS_MODEL_THREAD_LOCAL = 1
MI_TLS_RECURSE_GUARD       = 1

Both are mature upstream code paths — THREAD_LOCAL is the default on Linux/FreeBSD/NetBSD/etc.; RECURSE_GUARD has been in the source since v2 specifically to handle dyld-TLV first-touch on macOS.

| Concern | How this addresses it |
| --- | --- |
| multi-instance per-image isolation | mi_decl_hidden mi_decl_thread produces per-image __thread symbols → dyld TLV allocates per-image per-thread storage. No shared TCB slot. |
| dyld TLV first-touch malloc recursion | MI_TLS_RECURSE_GUARD short-circuits the fast path via a plain non-TLS _mi_process_is_initialized bool until process init completes. |
| thread-id fast path | Left untouched. MI_HAS_TLS_SLOT stays at the upstream default 1, so _mi_prim_thread_id() keeps using mi_prim_tls_slot(0) (Apple's system-defined thread-id TSD slot: semantically shared across consumers, no conflict, no allocation). |
| long-lived non-tokio threads (rayon, etc.) | No pthread_setspecific timing surface: every thread just reads/writes its own TLV slot via a normal __thread access. |
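
The guard-then-TLS shape can be sketched in Rust as an analogy. This is not mimalloc's C code: PROCESS_IS_INITIALIZED, THREAD_HEAP, and the three functions are hypothetical stand-ins, and Rust's thread_local! is used only to model a per-thread variable.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Plain (non-TLS) process-wide flag, modelling _mi_process_is_initialized.
static PROCESS_IS_INITIALIZED: AtomicBool = AtomicBool::new(false);

thread_local! {
    /// Per-thread heap tag, modelling a __thread variable.
    static THREAD_HEAP: Cell<Option<usize>> = Cell::new(None);
}

/// Called once when process initialization has finished.
fn mark_process_initialized() {
    PROCESS_IS_INITIALIZED.store(true, Ordering::Release);
}

/// Records the calling thread's heap tag.
fn set_thread_heap(tag: usize) {
    THREAD_HEAP.with(|h| h.set(Some(tag)));
}

/// Fast path: one load and one branch on the plain flag, and only then a
/// thread-local read. Returns None while the guard is still closed.
fn fast_path_heap() -> Option<usize> {
    if !PROCESS_IS_INITIALIZED.load(Ordering::Acquire) {
        return None; // the caller would take the slow path here
    }
    THREAD_HEAP.with(|h| h.get())
}
```

Until the flag flips, every caller is diverted before the thread-local is read, which is the property that matters for first-touch recursion. Afterwards each thread sees only its own value.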

Why a build.rs patch (not a fork or vendoring)

mimalloc3 is a git submodule pinned to upstream microsoft/mimalloc at v3.3.2 (commit 30b2d9d8). Tradeoffs of the alternatives:

  • Forking upstream is high-touch and creates a perpetual sync burden.
  • Vendoring the whole tree (~150 files) makes upgrades painful and review noisy.
  • build.rs patch is what's done here — minimal, surgical, and:
    • Idempotent: detects an already-applied patch by a marker comment, skips quietly.
    • Loud on drift: assert!s if the upstream selector block can't be located. If a future mimalloc release rewrites prim.h's selector, the next submodule bump fails the build with a clear error message pointing back at the patcher.
    • Submodule stays pristine in git: the patch is applied to the working tree only. git submodule update --init on a fresh clone works unchanged. CI re-applies per build.
    • Cargo invariants: the patched file is registered with cargo:rerun-if-changed=<prim.h> for change detection, and the patch application is reported via cargo:warning so it is visible in build logs.
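
A minimal sketch of such a patcher, assuming a marker comment and a literal anchor block. MARKER, PatchOutcome, patch_selector, and apply_to_file are hypothetical names, and the real build.rs in this PR may be structured differently. Only the cargo:rerun-if-changed and cargo:warning directives are taken from the description above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical marker comment used to detect an already-applied patch.
const MARKER: &str = "/* mimalloc-safe: Apple THREAD_LOCAL patch */";

#[derive(Debug, PartialEq)]
enum PatchOutcome {
    Applied(String),
    AlreadyApplied,
}

/// Pure patch step: idempotent via the marker, loud if the anchor is missing.
fn patch_selector(source: &str, upstream_block: &str, replacement: &str) -> PatchOutcome {
    if source.contains(MARKER) {
        return PatchOutcome::AlreadyApplied;
    }
    assert!(
        source.contains(upstream_block),
        "upstream selector block not found; update the patcher for this mimalloc version"
    );
    let patched = source.replacen(upstream_block, &format!("{}\n{}", MARKER, replacement), 1);
    PatchOutcome::Applied(patched)
}

/// I/O wrapper as it might appear in build.rs (paths and blocks are supplied by the caller).
#[allow(dead_code)]
fn apply_to_file(path: &Path, upstream_block: &str, replacement: &str) -> io::Result<()> {
    println!("cargo:rerun-if-changed={}", path.display());
    let source = fs::read_to_string(path)?;
    if let PatchOutcome::Applied(patched) = patch_selector(&source, upstream_block, replacement) {
        fs::write(path, patched)?;
        println!("cargo:warning=patched {}", path.display());
    }
    Ok(())
}
```

Keeping the string transformation separate from the file I/O makes the idempotence and drift checks easy to exercise without touching the submodule.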

Migration

  • Supersedes #67 (fix(sys): disable mimalloc v3 FIXED_SLOT TLS on macOS). The -DMI_HAS_TLS_SLOT=0 cflag is removed.
  • No source changes required for downstream users of the v3 feature when bumping from 0.1.61.
  • Single-instance use cases (typical napi addon, e.g. rolldown) gain robustness: the rayon-worker SIGABRT is fixed.
  • Multi-instance use cases (rare; multiple napi addons coexisting) also gain isolation: each addon gets its own dyld TLV storage, no TCB slot collision.

Performance

Fast-path overhead from RECURSE_GUARD is one BSS load plus one branch on _mi_process_is_initialized. Once process init completes, the flag no longer changes, so the branch is reliably predicted. Upstream documentation puts the impact at sub-microsecond in microbenchmarks, and it is not measurable in real workloads.

shulaoda added a commit to rolldown/rolldown that referenced this pull request May 22, 2026

See napi-rs/mimalloc-safe#71 & https://github.com/rolldown/rolldown/actions/runs/26264825278/job/77305893292

This is temporarily merged to allow upgrade and integration, and will be validated in subsequent CI runs. It has already been confirmed to work in the current PR. It will be reverted before the next release, and mimalloc-safe will switch back to the official release version for production use.