A handful of optimizations for the DRC collector #12974

Open
fitzgen wants to merge 7 commits into bytecodealliance:main from fitzgen:drc-runtime-improvements

@fitzgen fitzgen commented Apr 6, 2026

Depends on #12969

See each commit message for details.

More coming soon after this.

@fitzgen fitzgen requested review from a team as code owners April 6, 2026 19:47
@fitzgen fitzgen requested review from alexcrichton and removed request for a team April 6, 2026 19:47
fitzgen added 7 commits April 6, 2026 13:24
Also add fast-path entry points that directly take a `u32` size that has
already been rounded to the free list's alignment.

Altogether, this shaves ~309B retired instructions (48%) off the benchmark
in bytecodealliance#11141
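
The idea of a pre-rounded fast path can be sketched as follows. This is a toy bump allocator rather than Wasmtime's actual free list: `ToyFreeList`, `ALIGN`, `alloc`, and `alloc_rounded` are all hypothetical names used only for illustration. The general entry point validates and rounds the request, while the fast path trusts its caller and skips that work.

```rust
use std::alloc::Layout;

/// Hypothetical alignment honored by every block in this toy free list.
const ALIGN: u32 = 16;

/// Toy allocator with a single free range `[next, capacity)`.
/// (Assumption: the real free list is more elaborate; only the
/// entry-point split is illustrated here.)
pub struct ToyFreeList {
    next: u32,
    capacity: u32,
}

impl ToyFreeList {
    pub fn new(capacity: u32) -> Self {
        Self { next: 0, capacity }
    }

    /// Round `size` up to a multiple of `ALIGN`, or `None` on overflow.
    pub fn round_up(size: u32) -> Option<u32> {
        size.checked_add(ALIGN - 1).map(|s| s & !(ALIGN - 1))
    }

    /// General entry point: validate the layout, round the size, delegate.
    pub fn alloc(&mut self, layout: Layout) -> Option<u32> {
        if layout.align() > ALIGN as usize || layout.size() > u32::MAX as usize {
            return None;
        }
        let rounded = Self::round_up(layout.size() as u32)?;
        self.alloc_rounded(rounded)
    }

    /// Fast path: the caller promises `rounded_size` is already a
    /// multiple of `ALIGN`, so no layout checks or rounding happen here.
    pub fn alloc_rounded(&mut self, rounded_size: u32) -> Option<u32> {
        debug_assert_eq!(rounded_size % ALIGN, 0);
        let end = self.next.checked_add(rounded_size)?;
        if end > self.capacity {
            return None;
        }
        let start = self.next;
        self.next = end;
        Some(start)
    }
}
```

A caller that knows an object's size statically (for example, a struct type whose size was rounded once up front) can call the fast path directly and pay for rounding only once per type rather than once per allocation.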
Ideally we would just use a `SecondaryMap<VMSharedTypeIndex, TraceInfo>` here,
but allocating `O(num engine types)` space inside a store that uses only a
couple of types seems wasteful. So instead, we use a fixed-size cache that
is probably big enough for most workloads in practice.
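
A minimal sketch of such a fixed-size cache, assuming a direct-mapped layout where the type index picks the slot. The commit message does not say how the real cache is organized, so `TraceInfoCache`, `CACHE_SLOTS`, the stand-in `TypeIndex`, and the shape of `TraceInfo` are all hypothetical.

```rust
/// Hypothetical stand-in for an engine-wide type index.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct TypeIndex(pub u32);

/// Hypothetical per-type trace info: byte offsets of GC-reference fields.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TraceInfo {
    pub gc_ref_offsets: Vec<u32>,
}

/// Assumed slot count; space is O(CACHE_SLOTS), not O(engine types).
const CACHE_SLOTS: usize = 8;

/// Direct-mapped cache: each type index maps to exactly one slot, and a
/// colliding type simply evicts the previous occupant.
pub struct TraceInfoCache {
    slots: [Option<(TypeIndex, TraceInfo)>; CACHE_SLOTS],
}

impl TraceInfoCache {
    pub fn new() -> Self {
        Self { slots: std::array::from_fn(|_| None) }
    }

    fn slot_of(ty: TypeIndex) -> usize {
        ty.0 as usize % CACHE_SLOTS
    }

    /// Return the cached info for `ty`, computing and storing it on a miss.
    pub fn get_or_insert_with(
        &mut self,
        ty: TypeIndex,
        compute: impl FnOnce() -> TraceInfo,
    ) -> &TraceInfo {
        let slot = &mut self.slots[Self::slot_of(ty)];
        let hit = matches!(&*slot, Some((cached, _)) if *cached == ty);
        if !hit {
            // Miss (empty slot or a different type): recompute and overwrite.
            *slot = Some((ty, compute()));
        }
        &slot.as_ref().unwrap().1
    }
}
```

The trade-off is that two hot types mapping to the same slot will keep evicting each other; a store that uses only a few types should rarely hit that case.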
Inline `dec_ref`, `trace_gc_ref`, and `dealloc` into
`dec_ref_and_maybe_dealloc`'s main loop so that we read the `VMDrcHeader` once
per object to get `ref_count`, type index, and `object_size`, avoiding 3
separate GC heap accesses and bounds checks per freed object.

For struct tracing, read `gc_ref` fields directly from the heap slice at known
offsets instead of going through `gc_object_data` → `object_range` → `object_size`,
which would re-read the `object_size` from the header.

301,333,979,721 -> 291,038,676,119 instructions (~3.4% improvement)
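
To illustrate the single-read idea, here is a sketch that decodes a header with one bounds-checked slice access and then reads reference fields from the object's slice at known offsets. The 12-byte layout of three little-endian `u32`s is an assumption made for this example and is not the actual `VMDrcHeader` layout; `Header`, `read_header`, and `read_gc_ref_fields` are hypothetical names.

```rust
/// Hypothetical header for this sketch only: three little-endian `u32`s.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Header {
    pub ref_count: u32,
    pub type_index: u32,
    pub object_size: u32,
}

const HEADER_SIZE: usize = 12;

/// Decode all three header fields with a single bounds check on the heap.
pub fn read_header(heap: &[u8], offset: usize) -> Option<Header> {
    let end = offset.checked_add(HEADER_SIZE)?;
    let bytes = heap.get(offset..end)?;
    let word = |i: usize| {
        u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]])
    };
    Some(Header {
        ref_count: word(0),
        type_index: word(4),
        object_size: word(8),
    })
}

/// Read 4-byte reference fields at known offsets within the object,
/// reusing `header.object_size` instead of re-reading it from the heap.
pub fn read_gc_ref_fields(
    heap: &[u8],
    object: usize,
    header: &Header,
    field_offsets: &[u32],
) -> Option<Vec<u32>> {
    let end = object.checked_add(header.object_size as usize)?;
    let obj = heap.get(object..end)?;
    field_offsets
        .iter()
        .map(|&off| {
            let off = off as usize;
            let b = obj.get(off..off.checked_add(4)?)?;
            Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        })
        .collect()
}
```

Each freed object then costs one header read plus one object-slice lookup, rather than separate accesses for the reference count, the type index, and the size.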
…exists

When the GC store is already initialized and the allocation succeeds, avoid
async machinery entirely. This avoids the overhead of taking/restoring fiber
async state pointers on every allocation.

291,038,676,119 -> 230,503,364,489 instructions (~20.8% improvement)
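
The shape of this fast path can be sketched in synchronous Rust. The real slow path involves fiber and async state that is not modeled here; `ToyStore`, `ToyHeap`, `alloc_slow`, and the 64-byte capacity are hypothetical stand-ins. The point is only that the common case (heap already exists, allocation succeeds) returns before any of the expensive setup runs.

```rust
/// Toy GC heap with a single free range `[next, capacity)`.
pub struct ToyHeap {
    next: u32,
    capacity: u32,
}

impl ToyHeap {
    fn try_alloc(&mut self, size: u32) -> Option<u32> {
        let end = self.next.checked_add(size)?;
        if end > self.capacity {
            return None;
        }
        let start = self.next;
        self.next = end;
        Some(start)
    }
}

/// Toy store whose heap is created lazily. `slow_path_calls` exists only
/// so the example can show how often the slow path runs.
pub struct ToyStore {
    heap: Option<ToyHeap>,
    pub slow_path_calls: u32,
}

impl ToyStore {
    pub fn new() -> Self {
        Self { heap: None, slow_path_calls: 0 }
    }

    /// Fast path first: if the heap exists and the allocation succeeds,
    /// return immediately without touching the slow-path machinery.
    pub fn alloc(&mut self, size: u32) -> Option<u32> {
        if let Some(heap) = self.heap.as_mut() {
            if let Some(offset) = heap.try_alloc(size) {
                return Some(offset);
            }
        }
        self.alloc_slow(size)
    }

    /// Stand-in for the expensive path (assumption: in the real code this
    /// is where heap initialization and the async machinery would live).
    #[cold]
    fn alloc_slow(&mut self, size: u32) -> Option<u32> {
        self.slow_path_calls += 1;
        let heap = self
            .heap
            .get_or_insert_with(|| ToyHeap { next: 0, capacity: 64 });
        heap.try_alloc(size)
    }
}
```

In this toy, the first allocation takes the slow path to create the heap, and later allocations that fit never reach it again.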
Avoids converting `ModuleInternedTypeIndex` to `VMSharedTypeIndex` in host code,
which requires lookups in the instance's module's `TypeCollection`. We already
have helpers to do this conversion inline in JIT code.

230,503,364,489 -> 216,937,168,529 instructions (~5.9% improvement)
Moves the `externref` host data cleanup inside the `ty.is_none()` branch of
`dec_ref_and_maybe_dealloc`, since only `externref`s have host
data. Additionally, the type check is fairly expensive, since it involves
additional bounds-checked reads from the GC heap.
@fitzgen fitzgen force-pushed the drc-runtime-improvements branch from 79013cf to 56a5b5a Compare April 6, 2026 20:24
@github-actions github-actions bot added the wasmtime:api (Related to the API of the `wasmtime` crate itself) and wasmtime:ref-types (Issues related to reference types and GC in Wasmtime) labels Apr 6, 2026

github-actions bot commented Apr 6, 2026

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "wasmtime:api", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: wasmtime:ref-types

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

