@alexcrichton
Member

@alexcrichton alexcrichton commented Oct 8, 2025

This is a history-preserving version of #11805 without any Wasmtime-specific changes. The goal here is to merge everything into this repo while only worrying about preserving the history and then a follow-up PR would actually integrate Wizer with Wasmtime.

This history was procedurally created by running these commands in a Wizer clone:

python3 ~/git-filter-repo --to-subdirectory-filter crates/wizer
python3 ~/git-filter-repo --invert-paths --path crates/wizer/benches/uap_bench.wizer.wasm
python3 ~/git-filter-repo --invert-paths --path crates/wizer/benches/regex_bench.wizer.wasm
python3 ~/git-filter-repo --invert-paths --path crates/wizer/benches/uap_bench.control.wasm
python3 ~/git-filter-repo --invert-paths --path crates/wizer/benches/regex_bench.control.wasm
python3 ~/git-filter-repo --invert-paths --path crates/wizer/tests/regex_test.wasm
python3 ~/git-filter-repo --invert-paths --path crates/wizer/.github/actions/github-release
git push git@github.com:alexcrichton/wizer main:wasmtime-merge -f

and then running these commands in Wasmtime:

git fetch https://github.com/alexcrichton/wizer wasmtime-merge
git merge FETCH_HEAD --allow-unrelated-histories

Notably the big binary blobs are removed from the history to avoid pulling in too much new size here (uap_bench.wizer.wasm was 35M+).
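
For anyone reproducing this, the filter steps above are one subdirectory move followed by one `--invert-paths` call per removed path. A minimal sketch that only builds the command lines (it does not run them), assuming the same `~/git-filter-repo` location as above; the helper name `filter_commands` is hypothetical:

```python
# Sketch only: builds the git-filter-repo invocations listed above as argument
# lists. Nothing is executed; `filter_commands` is a hypothetical helper name.
FILTER_REPO = ["python3", "~/git-filter-repo"]

REMOVED_PATHS = [
    "crates/wizer/benches/uap_bench.wizer.wasm",
    "crates/wizer/benches/regex_bench.wizer.wasm",
    "crates/wizer/benches/uap_bench.control.wasm",
    "crates/wizer/benches/regex_bench.control.wasm",
    "crates/wizer/tests/regex_test.wasm",
    "crates/wizer/.github/actions/github-release",
]


def filter_commands(subdir: str, removed: list[str]) -> list[list[str]]:
    """Return the subdirectory move followed by one removal per path."""
    commands = [FILTER_REPO + ["--to-subdirectory-filter", subdir]]
    for path in removed:
        commands.append(FILTER_REPO + ["--invert-paths", "--path", path])
    return commands


commands = filter_commands("crates/wizer", REMOVED_PATHS)
for command in commands:
    print(" ".join(command))
```

The removals come after the subdirectory move, which is why every removed path already carries the `crates/wizer/` prefix.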

A final, handwritten commit then excludes crates/wizer from the overall workspace. That commit is intended to be the starting point to rebase #11805 onto after merging.

Procedurally I do not plan to use the merge queue for this PR. Instead I plan on waiting for CI to fully pass on this PR and then I'll push this directly to main after toggling a switch in the settings for "allow admins to bypass restrictions". I'll then re-enable that switch after I do the push.

Radu M and others added 30 commits January 13, 2021 19:19
This commit updates the wasmtime and wasmtime-wasi crates to 0.22 and
handles the API changes.

Signed-off-by: Radu M <root@radu.sh>
When initializing a module, it is sometimes useful to rename a function
after the module has been modified. For example, we may wish to
substitute an alternate entry point (for WASI, this is `_start()`)
since we have already performed some initialization that the normal
entry point would do.

This PR adds a `-r` option to allow such renaming. For example,

    $ wizer -r _start=my_new_start -o out.wasm in.wasm

will rename the `my_new_start` export in `in.wasm` to `_start` in
`out.wasm`, replacing the old `_start` export. Multiple renamings are
supported at once.
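
To illustrate the intended semantics of `dst=src`, here is a rough model in Python. It is not Wizer's implementation: the export map (name to function index) and the helper names `parse_rename` and `apply_renames` are assumptions made for the sketch.

```python
# Rough model of `-r dst=src` semantics; not Wizer's actual code.
# The export map and both helper names are hypothetical.

def parse_rename(spec: str) -> tuple[str, str]:
    """Split a `dst=src` spec into (dst, src)."""
    dst, sep, src = spec.partition("=")
    if not sep or not dst or not src:
        raise ValueError(f"expected `dst=src`, got {spec!r}")
    return dst, src


def apply_renames(exports: dict[str, int], specs: list[str]) -> dict[str, int]:
    """Return a new export map (name -> function index) with renames applied."""
    pairs = [parse_rename(spec) for spec in specs]
    for _, src in pairs:
        if src not in exports:
            raise KeyError(f"no export named {src!r}")
    sources = {src for _, src in pairs}
    # Drop every renamed export under its old name first, then install the
    # new names, so that each `dst` replaces any existing export of that name.
    renamed = {name: idx for name, idx in exports.items() if name not in sources}
    for dst, src in pairs:
        renamed[dst] = exports[src]
    return renamed


exports = {"_start": 0, "my_new_start": 1, "helper": 2}
result = apply_renames(exports, ["_start=my_new_start"])
print(result)
```

In this model the old `_start` (index 0) is no longer exported, and `_start` now points at what used to be `my_new_start`.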
Previously, it was difficult to properly use Wizer with C++ programs
that have static constructors. These constructors are normally run
before `main()` is invoked by some libc/linker plumbing at the usual
entry point. This causes a set of undesirable problems:

- In the Wizer initialization entry point, our globals' constructors
  have not yet run.

- If we manually invoke the constructors to resolve the first issue,
  then they will be *re*-invoked when the program's main entry point is
  run on the initialized module. This may cause surprising or
  inconsistent results.

In order to properly handle this, we add a `wizer.h` header with a
convenient macro that allows specifying a user initialization function,
and adds an exported top-level Wizer initialization entry point and an
exported replacement main entry point.

The generated initialization entry point invokes any global constructors
(using the function `__wasm_call_ctors()` which is generated by
`wasm-ld`) then invokes the user's initialization function.

The generated main entry point is meant to replace `_start` and performs
the same work, *except* that it does not re-invoke constructors.
Instead, it immediately invokes `main()` (because everything has already
been initialized) and then calls destructors and exits.

Taken together, the two entry points match the code that is present in
`_start` in an ordinary Wasm binary compiled by wasi-sdk. This approach
should be relatively stable, as long as the symbols
`__wasm_call_ctors()`, `__wasm_call_dtors()`, and `__original_main()`
are not renamed in the WASI SDK (this seems unlikely).
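
The call ordering described above can be modeled with a small trace. This is a Python stand-in, not the contents of `wizer.h`: the names `wizer_initialize` and `wizer_main` are hypothetical, and the final process exit is omitted.

```python
# Model of the call ordering only; not the actual wizer.h macro expansion.
# `wizer_initialize` and `wizer_main` are hypothetical names for the two
# generated entry points. The final exit call is left out.
trace: list[str] = []


def wasm_call_ctors() -> None:  # stands in for __wasm_call_ctors()
    trace.append("ctors")


def wasm_call_dtors() -> None:  # stands in for __wasm_call_dtors()
    trace.append("dtors")


def user_init() -> None:  # the user's initialization function
    trace.append("user_init")


def original_main() -> None:  # stands in for __original_main()
    trace.append("main")


def wizer_initialize() -> None:
    """Generated initialization entry point: constructors, then user init."""
    wasm_call_ctors()
    user_init()


def wizer_main() -> None:
    """Generated replacement for `_start`: no constructors, just main + dtors."""
    original_main()
    wasm_call_dtors()


wizer_initialize()  # runs once, at wizening time
wizer_main()        # runs later, on the initialized module
wizer_trace = list(trace)
print(wizer_trace)
```

Constructors appear exactly once in the trace, which is the property the two-entry-point design is meant to guarantee.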
Add option to rename functions, and add C++ example using function renaming to handle ctors properly.
Keeps an alias around so the old flag will keep working.

Also provides a "value name" so that the generated help text shows "<dst=src>"
instead of "<func_renames>".
This makes it so that repeated wizenings of the same Wasm module (e.g. with
different WASI inputs) don't require re-compiling the Wasm to native code
every time.

Fixes #9
Enable Wasmtime's code cache
Add knobs for WASI inheriting stdio and env vars
Although we are implicitly recording only the state that is different from the
default (e.g. non-zero regions of memory) we aren't actually *computing* the
difference between anything. The word "diff" makes me think of things like `git
diff` that do edit distance-style computations, which we definitely aren't
doing. I think "snapshot" more correctly reflects the implementation.
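
As an illustration of "recording only the state that is different from the default", the sketch below collects the non-zero runs of a memory image. It is a simplified model under the assumption that the default is all-zero memory; Wizer's real snapshot code may choose or merge regions differently, and `nonzero_regions` is a hypothetical name.

```python
# Simplified model of snapshotting memory: record only the non-zero runs.
# `nonzero_regions` is a hypothetical helper, not a Wizer function.

def nonzero_regions(memory: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, data) pairs for each maximal run of non-zero bytes."""
    regions: list[tuple[int, bytes]] = []
    start = None
    for i, byte in enumerate(memory):
        if byte != 0 and start is None:
            start = i  # a non-zero run begins here
        elif byte == 0 and start is not None:
            regions.append((start, memory[start:i]))
            start = None
    if start is not None:  # the run extends to the end of memory
        regions.append((start, memory[start:]))
    return regions


memory = bytes([0, 0, 1, 2, 0, 0, 0, 3])
segments = nonzero_regions(memory)
print(segments)
```

Note that nothing is compared against a previous state here: the regions are read directly off the final memory, which is why "snapshot" fits better than "diff".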
Add the ability to preopen dirs during WASI initialization
Add C++ example to Readme
This commit adds support for 99% of module linking to Wizer. This allows modules
within the "bundle" to import/export memories, globals, tables, and instances
from/to each other, but the root module still cannot import any external
state. The one big feature from module linking that isn't supported yet is
importing and exporting modules, rather than instances. More on this restriction
later.

Module linking breaks Wizer's previous assumption that a module is instantiated
and initialized exactly once (or, at least, that if it were instantiated and
initialized multiple times you would get the same result every time due to the
restrictions on imports and what can be used during initialization). Module
linking allows the same module to be instantiated multiple times and have a
different set of functions called on each instance during initialization. This
can lead to distinct initialized states for each instantiation. Here is a simple
example:

```wat
(module
  (module $A
    (global $g (mut i32) (i32.const 0))
    (func (export "f")
      global.get $g
      i32.const 1
      i32.add
      global.set $g))
  (instance $a1 (instantiate $A))
  (instance $a2 (instantiate $A))
  (func (export "wizer.initialize")
    call (func $a1 "f")
    call (func $a2 "f")
    call (func $a2 "f")))
```

After initialization, the `$a1` instance has had its `f` function export called
once and therefore its global's value is 1, while `$a2` has had its `f` function
export called twice and therefore its global's value is 2. Because each
instantiation has its own initialized state, we need to emit a pre-initialized
module for _each_ instantiation in the bundle.

At the same time, however, we do _not_ want to duplicate the `f` function across
each pre-initialized module! The code section is generally the largest section
in a Wasm module, and if we naively duplicated the code section for each
instantiation, we could get exponential code size blow ups. (For example,
consider a module linking bundle that contains a leaf module named `$Module1`
that is instantiated twice in another module called `$Module2`, which is in turn
instantiated twice in `$Module4`, which is in turn instantiated twice in
`$Module8`, etc... If there are `n` such doubling steps in the chain,
then `$Module1` is instantiated `2^n` times, and if we duplicate the code
section for each instantiation we've got `2^n` copies of `$Module1`'s code
section.)
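
To put numbers on the blow-up, here is a small arithmetic sketch. The 100,000-byte code section is an arbitrary assumption chosen for illustration, and `naive_code_bytes` is a hypothetical helper.

```python
# Arithmetic sketch of the naive duplication cost. The code section size is an
# arbitrary illustrative figure, not a measurement.

def naive_code_bytes(doublings: int, code_bytes: int) -> int:
    """Total bytes if the leaf's code section is copied per instantiation."""
    copies = 2 ** doublings  # each step instantiates the previous module twice
    return copies * code_bytes


for n in (1, 2, 3, 10):
    print(n, 2 ** n, naive_code_bytes(n, 100_000))
```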

The solution is to split out a "code module" containing only the static sections
from the original module. All the state in the module (memories, globals, and
nested instantiations) is imported via a `__wizer_state` instance import. Then,
each instantiation defines its own "state module" which contains the specific
initialized state for that instance, and which is instantiated exactly once. To
create the initialized version of the original instantiation, we join the state
with the code by instantiating the code module and passing in this
instantiation's state instance.

Let's return to our original example from above. Here is the input Wasm again:

```wat
(module
  (module $A
    (global $g (mut i32) (i32.const 0))
    (func (export "f")
      global.get $g
      i32.const 1
      i32.add
      global.set $g))
  (instance $a1 (instantiate $A))
  (instance $a2 (instantiate $A))
  (func (export "wizer.initialize")
    call (func $a1 "f")
    call (func $a2 "f")
    call (func $a2 "f")))
```

And this is roughly what it looks like after wizening:

```wat
(module
  ;; The code module for $A.
  (module $A_code
    ;; Note that the code module imports state, rather than defining it.
    (import "__wizer_state"
      (instance (export "__wizer_global_0" (global $g (mut i32)))))
    (func (export "f")
      global.get $g
      i32.const 1
      i32.add
      global.set $g))

  ;; State module for $a1. Global is initialized to 1 since $a1's `f`
  ;; function was called once in the initialization.
  (module $a1_state_module
    (global (export "__wizer_global_0" (mut i32) (i32.const 1))))

  ;; The state instance for $a1.
  (instance $a1_state_instance (instantiate $a1_state_module))

  ;; Finally, we join the code+state together to create $a1.
  (instance $a1 (instantiate $A_code
                  (import "__wizer_state" $a1_state_instance)))

  ;; State module for $a2. Global is initialized to 2 since $a2's `f`
  ;; function was called twice in the initialization.
  (module $a2_state_module
    (global (export "__wizer_global_0" (mut i32) (i32.const 2))))

  ;; The state instance for $a2.
  (instance $a2_state_instance (instantiate $a2_state_module))

  ;; Finally, we join the code+state together to create $a2.
  (instance $a2 (instantiate $A_code
                  (import "__wizer_state" $a2_state_instance)))

  ;; Initialization function removed.
)
```

Using this code splitting technique allows us to avoid unnecessary duplication
for each instantiation. The only bits that are duplicated are the things that
are inherently different across instantiations: the instantiation's internal
state.
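
The same split can be modeled outside of Wasm. In the Python sketch below, a class plays the role of `$A_code` and a small dataclass plays the role of each `__wizer_state` instance; both class names are hypothetical stand-ins, and the initial values are the ones from the wizened output above.

```python
# Model of the code/state split; class names are hypothetical stand-ins for
# `$A_code` and the per-instantiation `__wizer_state` instances.
from dataclasses import dataclass


@dataclass
class State:
    """Stand-in for a state instance: only the mutable global lives here."""
    wizer_global_0: int


class ACode:
    """Stand-in for `$A_code`: defines `f` but receives its state from outside."""

    def __init__(self, state: State) -> None:
        self.state = state

    def f(self) -> None:
        self.state.wizer_global_0 += 1


# Pre-initialized values taken from the wizened example: 1 for $a1, 2 for $a2.
a1 = ACode(State(wizer_global_0=1))
a2 = ACode(State(wizer_global_0=2))

a2.f()  # mutates only $a2's state
shared_code = a1.f.__func__ is a2.f.__func__
print(a1.state, a2.state, shared_code)
```

Both instances run the very same function object for `f`, while each keeps its own copy of the global, mirroring one code module joined with two state modules.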

The reason we don't allow importing and exporting modules within the bundle is
that two different modules that have the same module type signature could have
two different sets of internal, not-exported state. This throws a wrench in the
instrumentation pass, which needs to force export all internal state, and in
this case could make the modules have incompatible type signatures, so they
couldn't both be imported by the same module anymore. That, in turn, would force
us to duplicate and specialize modules that import other modules for each
instantiation. This is do-able, but requires significant additional engineering
work. Therefore, this feature is left for some time in the future.

As a final note, adding support for module linking required pretty much a full
rewrite of Wizer. It no longer just parses the Wasm section by section, tweaking
it in place. Instead, there is an initial "parse" phase that creates a
`ModuleInfo` tree that serves as an AST. Our subsequent passes -- the
instrumentation and rewriting passes -- process the `ModuleInfo` tree rather
than reparsing the input Wasm and tweaking it directly.
Guy Bedford and others added 13 commits August 20, 2024 17:18
* Fix Intel Mac CI builds.

We build both x86-64 and aarch64 ("Intel Mac" and "Apple Silicon Mac")
binaries for wizer in CI on the `macos-latest` CI runner. Historically,
this runner was an x86-64 machine, so while we could do a direct compile
for x86-64 binaries, we added a target override for `aarch64-darwin` for
the aarch64 builds to force a cross-compile.

When GitHub switched macOS CI runners to aarch64 (ARM64) machines
somewhat recently, the `macos-latest` runner image began producing
aarch64 binaries by default, and the target override for
cross-compilation became meaningless (redundant). As a result, *both* of
our x86-64 and aarch64 macOS binary artifacts contained aarch64
binaries. This is a problem for anyone running an Intel Mac.

This PR switches the CI config to specify a cross-compilation of x86-64
binaries, so each target builds its proper architecture.

* Use actions/upload-artifact v4

* Use v4 of actions/download-artifact too
* add option for reference_types

only enables the wasmtime feature, which allows initializing Modules
that were compiled with newer llvm versions but don't actually use
reference_types

Disabled by default

* Update src/lib.rs

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* reject reference-types related instructions

* add setter for wasm_reference_types

* bail instead of unreachable

* tests for reference-types still rejecting table modifying instructions

* better error messages

* test indirect call with reference_types enabled

* same docstring for library and cli

* (hopefully) actually use the new call_indirect

---------

Co-authored-by: Gentle <ramon.klass@gmail.com>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
…liance#129)

* enable reference types and bulk memory support by default

* update tests

---------

Co-authored-by: Gentle <ramon.klass@gmail.com>
* chore(deps): update wasmtime & upstream deps

This commit updates wasmtime along with other deps:

- wasmtime: v31 -> v36.0.2
- wasmparser et al: *.228 -> *.238

* fix(deps): use major version of wasmtime in dep

Co-authored-by: Till Schneidereit <till@tillschneidereit.net>

---------

Co-authored-by: Till Schneidereit <till@tillschneidereit.net>
@alexcrichton alexcrichton requested review from a team as code owners October 8, 2025 15:18
@alexcrichton alexcrichton requested review from dicej and removed request for a team October 8, 2025 15:18
@cfallin
Member

cfallin commented Oct 8, 2025

Procedural idea: to keep history linear and thus make Wasmtime bisects less confusing, would it make sense to graft the initial commit of Wizer on top of Wasmtime main? Then the history goes from

(wizer initial) ------------------\
                                   >--(merge)----------
(Wasmtime initial)----------------/

to

                                               /--(wizer initial) ------------------\
                                              /                                      >--(merge)----------
(Wasmtime initial)------------------------------------------------------------------/

@cfallin
Member

cfallin commented Oct 8, 2025

(I believe this could be done with git rebase --onto main when on the Wizer branch, before the merge commit)

@cfallin
Member

cfallin commented Oct 8, 2025

(Happy to talk about this tomorrow in the Wasmtime meeting, too; the downside is that by rewriting commits it does change all the hashes)

@alexcrichton
Member Author

Ok I've tested what happens with git bisect with unrelated histories. I created a repo like this:

* commit 953fddf46cba6f53a67cc644d18d37dfc86b3b68 (HEAD -> main)
| Author: Alex Crichton <alex@alexcrichton.com>
| Date:   Thu Oct 9 12:51:18 2025 -0700
|
|     main3
|
*   commit 5daf51f205fbb562ac22bb084b1d9d6a620fc9ed
|\  Merge: a9307e8 709165d
| | Author: Alex Crichton <alex@alexcrichton.com>
| | Date:   Thu Oct 9 12:51:12 2025 -0700
| |
| |     Merge branch 'adjacent'
| |
| * commit 709165d603c4f09fce67027e358ba1604927a0c2 (adjacent)
|   Author: Alex Crichton <alex@alexcrichton.com>
|   Date:   Thu Oct 9 12:49:14 2025 -0700
|
|       adjacent1
|
* commit a9307e87fb2eee68a5515099e8fbf2bd5feee5d6
| Author: Alex Crichton <alex@alexcrichton.com>
| Date:   Thu Oct 9 12:51:10 2025 -0700
|
|     main2
|
* commit 97c3f6070514e30a101e60ed25a0ad6201958b6a
  Author: Alex Crichton <alex@alexcrichton.com>
  Date:   Thu Oct 9 12:49:14 2025 -0700

      main1

and then I ran this bisect session:

$ git bisect start
status: waiting for both good and bad commits
$ git bisect good 97c3f6070514e30a101e60ed25a0ad6201958b6a
status: waiting for bad commit, 1 good commit known
$ git bisect bad 953fddf46cba6f53a67cc644d18d37dfc86b3b68
Bisecting: 2 revisions left to test after this (roughly 1 step)
[709165d603c4f09fce67027e358ba1604927a0c2] adjacent1
$ git bisect skip
Bisecting: 0 revisions left to test after this (roughly 1 step)
[5daf51f205fbb562ac22bb084b1d9d6a620fc9ed] Merge branch 'adjacent'
$ git bisect bad
Bisecting: 1 revision left to test after this (roughly 1 step)
[a9307e87fb2eee68a5515099e8fbf2bd5feee5d6] main2
$ git bisect bad
a9307e87fb2eee68a5515099e8fbf2bd5feee5d6 is the first bad commit
commit a9307e87fb2eee68a5515099e8fbf2bd5feee5d6
Author: Alex Crichton <alex@alexcrichton.com>
Date:   Thu Oct 9 12:51:10 2025 -0700

    main2

 a | 2 ++
 1 file changed, 2 insertions(+)

Basically looks like git bisect figured out how to switch between trees alright.

Overall I'm also more comfortable not rewriting commits/dates/authors with git rebase, so I'm going to stick to this branch's merge strategy of two unrelated histories with one merge commit to join them together.
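
For intuition on why the second root is not a problem, the test repo above can be modeled as a plain parent map. This is a simplified sketch, not git's algorithm: it only computes the set of suspects (commits reachable from the bad commit but not from the good one), whereas real `git bisect` also decides which suspect to test next. Commits are labeled by their messages from the log above.

```python
# Simplified model of the bisect candidate set for the test repo above.
# Commits are named after their log messages; `ancestors` is a hypothetical
# helper, and the choice of which suspect to test next is not modeled.
PARENTS = {
    "main1": [],
    "main2": ["main1"],
    "adjacent1": [],                  # root of the unrelated history
    "merge": ["main2", "adjacent1"],
    "main3": ["merge"],
}


def ancestors(commit: str) -> set[str]:
    """Return the commit itself plus everything reachable through parents."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


suspects = ancestors("main3") - ancestors("main1")
print(sorted(suspects))
```

The unrelated root `adjacent1` is simply one more suspect: it is reachable from the bad commit and not from the good one, so it gets tested (or skipped) like any other commit.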

@alexcrichton alexcrichton merged commit 19c3380 into bytecodealliance:main Oct 9, 2025
40 checks passed
@alexcrichton alexcrichton deleted the wizer-inert branch October 9, 2025 20:10