Skip to content

Inline the copying collector's bump allocation in Wasm code#13323

Merged
fitzgen merged 4 commits into
bytecodealliance:mainfrom
fitzgen:wasmtime-inline-copying-collector-alloc
May 12, 2026
Merged

Inline the copying collector's bump allocation in Wasm code#13323
fitzgen merged 4 commits into
bytecodealliance:mainfrom
fitzgen:wasmtime-inline-copying-collector-alloc

Conversation

@fitzgen
Copy link
Copy Markdown
Member

@fitzgen fitzgen commented May 7, 2026

No description provided.

@fitzgen fitzgen requested review from a team as code owners May 7, 2026 23:32
@fitzgen fitzgen requested review from a team, alexcrichton, dicej and uweigand and removed request for a team and uweigand May 7, 2026 23:32
@github-actions github-actions Bot added wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:ref-types Issues related to reference types and GC in Wasmtime labels May 8, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

Subscribe to Label Action

cc @fitzgen

Details This issue or pull request has been labeled: "wasmtime:api", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: wasmtime:ref-types

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Comment on lines 885 to 888
unsafe fn vmctx_gc_heap_data(&self) -> NonNull<u8> {
// The copying collector doesn't currently have vmctx GC heap
// data. Return a dangling pointer.
NonNull::dangling()
let ptr: *const VMCopyingHeapData = &self.vmctx_data;
NonNull::new(ptr as *mut u8).unwrap()
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is preexisting I think for other collectors, but there's 2 primary ways that this method is UB according to Miri:

  1. This is deriving a pointer from &self which is later mutated through in compiled code. That's UB in Rust because by deriving the pointer from &self you're saying that the contents can't be mutated. This is a playground example showing the UB (top right, Tools -> Miri). The fix for this specific issue is to change this method to &mut self and then have the implementation be NonNull::from(&mut self.vmctx_data).cast()
  2. The second problem is that this struture is stored inline with the CopyingHeap, which means that re-acquiring a mutable pointer to CopyingHeap will actually invalidate the previous pointer returned by this method. This is a playground example of the UB here.

Fixing the latter one is not going to be super easy. I can think a few possible ways:

  • Every time we return back to wasm re-acquire the pointer and put it back in the VMContext. Probably too slow to work.
  • Place the VM data behind an indirection (e.g. Box<VMCopyingHeapData> here). The goal is that by using &mut CopyingHeap it doesn't auto-invaildate the vmctx data structures. This alone is not enough, however. Another added fix necessary would be to create the raw pointer at creation of CopyingHeap, and then CopyingHeap itself only accesses the data through that pointer. The end result is sort of like this and models what we do with vm::Instance and its sibling VMContext storage data that comes after it.

I haven't double-checked, but I believe this would all plague the drc/null collectors as well. If you'd like I think it'd be reasonable to fix them all in a subsequent PR.

In terms of detecting all of this, in theory Miri should be able to capture everything here. I think that may mean we're not running much GC-compiled wasms through Miri, which might be a good addition to have? For miri coverage copying this line is probably the easiest. Along those lines rustup run nightly ./ci/miri-wast.sh ./tests/spec_testsuite/struct.wast locally shows a Miri violation which may be related to this, but it's always sort of hard to tell where exactly the miri error comes from due to the mix of pulley and lack of strict provenance...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will handle this in a follow up.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, another possible fix is like Store::data_mut where provenance of some Miri-safe pointer is used but the address is used like usual, meaning that the actual machine code doesn't change. This will still require an indirection, however, I think.

Comment thread crates/environ/src/gc/copying.rs Outdated
Comment thread crates/cranelift/src/func_environ/gc/enabled/copying.rs
Comment thread crates/cranelift/src/func_environ/gc/enabled/copying.rs Outdated
Comment thread crates/cranelift/src/func_environ/gc/enabled/copying.rs Outdated
Comment thread crates/cranelift/src/func_environ/gc/enabled/copying.rs Outdated
Comment thread crates/cranelift/src/func_environ/gc/enabled/copying.rs Outdated
@alexcrichton alexcrichton requested review from alexcrichton and removed request for dicej May 8, 2026 15:31
@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented May 8, 2026

FWIW this is a 2% reduction in instructions retired on splay.wasm, but I haven't been able to measure a statistically significant difference in cycles yet. Will try bumping up the number of iterations.

@alexcrichton alexcrichton removed their request for review May 8, 2026 18:47
@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented May 11, 2026

FWIW this is a 2% reduction in instructions retired on splay.wasm, but I haven't been able to measure a statistically significant difference in cycles yet. Will try bumping up the number of iterations.

With 200 iterations:

execution :: cycles :: benchmarks/splay/splay.wasm

  Δ = 84634629.66 ± 43938351.82 (confidence = 99%)

  inline-alloc.so (-Wgc,function-references -Ccollector=copying) is 1.00x to 1.01x faster than main.so (-Wgc,function-references -Ccollector=copying)!

  [13062747333 13278335414.74 13771371538] inline-alloc.so (-Wgc,function-references -Ccollector=copying)
  [13062358191 13362970044.40 13846668642] main.so (-Wgc,function-references -Ccollector=copying)

@fitzgen fitzgen requested a review from alexcrichton May 12, 2026 21:37
Copy link
Copy Markdown
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea what's going on with the splay benchmark? Is it not actually allocating that much? Or is something else so massively dominating that allocation isn't factoring in?

@alexcrichton alexcrichton added this pull request to the merge queue May 12, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 12, 2026
@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented May 12, 2026

Any idea what's going on with the splay benchmark? Is it not actually allocating that much? Or is something else so massively dominating that allocation isn't factoring in?

It has a few constants that control the size of the splay tree to create, the number of splay tree mutations to perform once its created, and one other thing that I forget. When porting it to Wasm, I replaced those constants with configurable parameters read from default.input, and then I also adjusted the default inputs to make executing it take a little longer and match some of our other benchmarks. That last bit involved mostly doing 10x as many (IIRC) mutations to the tree, while keeping it the same size, so this change could have made it less allocation-heavy in an overall relative sense, and exercising GC object accesses more. (IIRC also, trying to make the tree bigger was starting to hit GC OOM).

Also, it is written in an "idiomatic" style with lots of little functions, and we don't do any inlining by default. That could probably change things drastically for it.

@fitzgen fitzgen force-pushed the wasmtime-inline-copying-collector-alloc branch from e862afd to 61bf89c Compare May 12, 2026 23:04
@fitzgen fitzgen enabled auto-merge May 12, 2026 23:04
@fitzgen fitzgen added this pull request to the merge queue May 12, 2026
Merged via the queue into bytecodealliance:main with commit fd79477 May 12, 2026
177 checks passed
@fitzgen fitzgen deleted the wasmtime-inline-copying-collector-alloc branch May 12, 2026 23:53
@fitzgen
Copy link
Copy Markdown
Member Author

fitzgen commented May 13, 2026

Any idea what's going on with the splay benchmark? Is it not actually allocating that much? Or is something else so massively dominating that allocation isn't factoring in?

FWIW, here is a quick perf profile of splay.wasm:

image

Nearly all time is spent inside the collector itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:ref-types Issues related to reference types and GC in Wasmtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants