Merge a non-functional Wizer into Wasmtime #11815
This commit updates the wasmtime and wasmtime-wasi crates to 0.22 and handles the API changes. Signed-off-by: Radu M <root@radu.sh>
Update Wasmtime to 0.22
Check rustfmt in CI
When initializing a module, it is sometimes useful to rename a function
after the module has been modified. For example, we may wish to
substitute an alternate entry point (for WASI, this is `_start()`)
since we have already performed some initialization that the normal
entry point would do.
This PR adds a `-r` option to allow such renaming. For example,
$ wizer -r _start=my_new_start -o out.wasm in.wasm
will rename the `my_new_start` export in `in.wasm` to `_start` in
`out.wasm`, replacing the old `_start` export. Multiple renamings are
supported at once.
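The `dst=src` semantics described above can be sketched in Rust as follows. This is a minimal illustration, not Wizer's actual implementation: `parse_rename` and `apply_renames` are hypothetical helper names, and exports are modeled as plain strings rather than real Wasm export entries.

```rust
/// Parse one `dst=src` rename spec, as passed to `-r`.
/// Hypothetical helper for illustration; not taken from Wizer's source.
fn parse_rename(spec: &str) -> Result<(String, String), String> {
    match spec.split_once('=') {
        Some((dst, src)) if !dst.is_empty() && !src.is_empty() => {
            Ok((dst.to_string(), src.to_string()))
        }
        _ => Err(format!("expected `dst=src`, got `{spec}`")),
    }
}

/// Apply renames to a list of export names: the export named `src` takes
/// over the name `dst`, and any export that previously held `dst` is dropped.
fn apply_renames(exports: &[&str], renames: &[(String, String)]) -> Vec<String> {
    exports
        .iter()
        .filter_map(|name| {
            // An export named `src` is renamed to `dst`.
            if let Some((dst, _)) = renames.iter().find(|(_, src)| src.as_str() == *name) {
                return Some(dst.clone());
            }
            // An export that currently holds a `dst` name is replaced, so drop it.
            if renames.iter().any(|(dst, _)| dst.as_str() == *name) {
                return None;
            }
            Some((*name).to_string())
        })
        .collect()
}
```

With the spec `_start=my_new_start` and the exports `memory`, `_start`, `my_new_start`, this sketch keeps `memory`, drops the old `_start`, and exposes `my_new_start` under the name `_start`.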
Previously, it was difficult to properly use Wizer with C++ programs that have static constructors. These constructors are normally run before `main()` is invoked by some libc/linker plumbing at the usual entry point. This causes a set of undesirable problems:
- In the Wizer initialization entry point, our globals' constructors have not yet run.
- If we manually invoke the constructors to resolve the first issue, then they will be *re*-invoked when the program's main entry point is run on the initialized module. This may cause surprising or inconsistent results.
In order to properly handle this, we add a `wizer.h` header with a convenient macro that allows specifying a user initialization function, and adds an exported top-level Wizer initialization entry point and an exported replacement main entry point.
The generated initialization entry point invokes any global constructors (using the function `__wasm_call_ctors()` which is generated by `wasm-ld`) then invokes the user's initialization function.
The generated main entry point is meant to replace `_start` and performs the same work, *except* that it does not re-invoke constructors. Instead, it immediately invokes `main()` (because everything has already been initialized) and then calls destructors and exits. Taken together, the two entry points match the code that is present in `_start` in an ordinary Wasm binary compiled by wasi-sdk.
This approach should be relatively stable, as long as the symbols `__wasm_call_ctors()`, `__wasm_call_dtors()`, and `__original_main()` are not renamed in the WASI SDK (this seems unlikely).
Add option to rename functions, and add C++ example using function renaming to handle ctors properly.
Keeps an alias around so the old flag will keep working. Also provides a "value name" so that the generated help text shows "<dst=src>" instead of "<func_renames>".
Small tweaks and dep updates
This makes it so that repeated wizenings of the same Wasm module (e.g. with different WASI inputs) don't require re-compiling the Wasm to native code every time. Fixes #9
Enable Wasmtime's code cache
Add knobs for WASI inheriting stdio and env vars
Although we are implicitly recording only the state that is different from the default (e.g. non-zero regions of memory) we aren't actually *computing* the difference between anything. The word "diff" makes me think of things like `git diff` that do edit distance-style computations, which we definitely aren't doing. I think "snapshot" more correctly reflects the implementation.
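To illustrate the distinction, here is a minimal Rust sketch of "recording only non-zero regions" of a memory image. It is an assumption about the general shape of the idea, not Wizer's actual snapshot code: the `Region` type and `snapshot_nonzero` function are hypothetical, and a real implementation might merge nearby regions or work at a coarser granularity.

```rust
/// A contiguous run of non-zero bytes found in a memory image.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct Region {
    offset: usize,
    bytes: Vec<u8>,
}

/// Record every maximal run of non-zero bytes in `memory`.
/// Nothing is compared against a previous image, so this is a snapshot
/// of the current state rather than a diff between two states.
fn snapshot_nonzero(memory: &[u8]) -> Vec<Region> {
    let mut regions = Vec::new();
    let mut i = 0;
    while i < memory.len() {
        if memory[i] == 0 {
            i += 1;
            continue;
        }
        // Found the start of a non-zero run; scan to its end.
        let start = i;
        while i < memory.len() && memory[i] != 0 {
            i += 1;
        }
        regions.push(Region {
            offset: start,
            bytes: memory[start..i].to_vec(),
        });
    }
    regions
}
```

The function takes a single memory image as input, which is the point of the rename: there is no second image to compute an edit distance against.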
Add the ability to preopen dirs during WASI initialization
Add C++ example to Readme
This commit adds support for 99% of module linking to Wizer. This allows modules
within the "bundle" to import/export memories, globals, tables, and instances
from/to each other, but the root module still cannot import any external
state. The one big feature from module linking that isn't supported yet is
importing and exporting modules, rather than instances. More on this restriction
later.
Module linking breaks Wizer's previous assumption that a module is instantiated
and initialized exactly once (or, at least, that if it were instantiated and
initialized multiple times you would get the same result every time due to the
restrictions on imports and what can be used during initialization). Module
linking allows the same module to be instantiated multiple times and have a
different set of functions called on each instance during initialization. This
can lead to distinct initialized states for each instantiation. Here is a simple
example:
```wat
(module
(module $A
(global $g (mut i32) (i32.const 0))
(func (export "f")
global.get $g
i32.const 1
i32.add
global.set $g))
(instance $a1 (instantiate $A))
(instance $a2 (instantiate $A))
(func (export "wizer.initialize")
call (func $a1 "f")
call (func $a2 "f")
call (func $a2 "f")))
```
After initialization, the `$a1` instance has had its `f` function export called
once and therefore its global's value is 1, while `$a2` has had its `f` function
export called twice and therefore its global's value is 2. Because each
instantiation has its own initialized state, we need to emit a pre-initialized
module for _each_ instantiation in the bundle.
At the same time, however, we do _not_ want to duplicate the `f` function across
each pre-initialized module! The code section is generally the largest section
in a Wasm module, and if we naively duplicated the code section for each
instantiation, we could get exponential code size blow ups. (For example,
consider a module linking bundle that contains a leaf module named `$Module1`
that is instantiated twice in another module called `$Module2`, which is in turn
instantiated twice in `$Module4`, which is in turn instantiated twice in
`$Module8`, etc... If the last module in this chain is instantiated `n` times,
then `$Module1` is instantiated `2^n` times, and if we duplicate the code
section for each instantiation we've got `2^n` copies of `$Module1`'s code
section.)
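The arithmetic behind that blow-up can be checked with a short Rust sketch. The function names are hypothetical, and `levels` here means the number of wrapper modules in the chain, each of which instantiates the previous module twice.

```rust
/// Number of instances of the leaf module when it sits under `levels`
/// wrapper modules, each of which instantiates the previous module twice.
fn leaf_instances(levels: u32) -> u64 {
    let mut instances = 1u64;
    for _ in 0..levels {
        // Each additional wrapper level doubles the instance count.
        instances *= 2;
    }
    instances
}

/// Total code-section bytes if the leaf's code were copied once per instance.
fn naive_code_bytes(leaf_code_bytes: u64, levels: u32) -> u64 {
    leaf_code_bytes * leaf_instances(levels)
}
```

For a leaf module with a 1,000-byte code section under ten doubling levels, naive duplication would emit 1,024,000 bytes of code for that one module, while a single shared copy stays at 1,000 bytes.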
The solution is to split out a "code module" containing only the static sections
from the original module. All the state in the module (memories, globals, and
nested instantiations) is imported via a `__wizer_state` instance import. Then,
each instantiation defines its own "state module" which contains the specific
initialized state for that instance, and which is instantiated exactly once. To
create the initialized version of the original instantiation, we join the state
with the code by instantiating the code module and passing in this
instantiation's state instance.
Let's return to our original example from above. Here is the input Wasm again:
```wat
(module
(module $A
(global $g (mut i32) (i32.const 0))
(func (export "f")
global.get $g
i32.const 1
i32.add
global.set $g))
(instance $a1 (instantiate $A))
(instance $a2 (instantiate $A))
(func (export "wizer.initialize")
call (func $a1 "f")
call (func $a2 "f")
call (func $a2 "f")))
```
And this is roughly what it looks like after wizening:
```wat
(module
;; The code module for $A.
(module $A_code
;; Note that the code module imports state, rather than defining it.
(import "__wizer_state"
(instance (export "__wizer_global_0" (global $g (mut i32)))))
(func (export "f")
global.get $g
i32.const 1
i32.add
global.set $g))
;; State module for $a1. Global is initialized to 1 since $a1's `f`
;; function was called once in the initialization.
(module $a1_state_module
(global (export "__wizer_global_0" (mut i32) (i32.const 1))))
;; The state instance for $a1.
(instance $a1_state_instance (instantiate $a1_state_module))
;; Finally, we join the code+state together to create $a1.
(instance $a1 (instantiate $A_code
(import "__wizer_state" $a1_state_instance)))
;; State module for $a2. Global is initialized to 2 since $a2's `f`
;; function was called twice in the initialization.
(module $a2_state_module
(global (export "__wizer_global_0" (mut i32) (i32.const 2))))
;; The state instance for $a2.
(instance $a2_state_instance (instantiate $a2_state_module))
;; Finally, we join the code+state together to create $a2.
(instance $a2 (instantiate $A_code
(import "__wizer_state" $a2_state_instance)))
;; Initialization function removed.
)
```
Using this code splitting technique allows us to avoid unnecessary duplication
for each instantiation. The only bits that are duplicated are the things that
are inherently different across instantiations: the instantiation's internal
state.
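As a rough analogy outside of Wasm, the same split can be modeled in Rust: one shared, stateless "code" value and one small state value per instantiation. This is only a sketch of the idea under that analogy; `CodeModule`, `State`, and `Instance` are hypothetical names, and a reference-counted pointer stands in for instantiating the same code module more than once.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Stand-in for the "code module": it defines no state of its own.
struct CodeModule;

impl CodeModule {
    /// Mirrors the exported `f`: increment the imported global.
    fn f(&self, state: &State) {
        state.global_0.set(state.global_0.get() + 1);
    }
}

/// Stand-in for a "state module" instance: one per original instantiation.
struct State {
    global_0: Cell<i32>,
}

/// An instance joins the shared code with its own pre-initialized state.
struct Instance {
    code: Rc<CodeModule>,
    state: State,
}

impl Instance {
    fn new(code: &Rc<CodeModule>, initial_global: i32) -> Self {
        Instance {
            code: Rc::clone(code), // shares the code, does not copy it
            state: State {
                global_0: Cell::new(initial_global),
            },
        }
    }

    fn call_f(&self) {
        self.code.f(&self.state);
    }
}
```

Creating `$a1` with an initial global of 1 and `$a2` with 2 corresponds to the two state modules in the wizened output; both hold a pointer to the same `CodeModule`, so only the per-instance state is duplicated.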
The reason we don't allow importing and exporting modules within the bundle is
that two different modules that have the same module type signature could have
two different sets of internal, not-exported state. This throws a wrench in the
instrumentation pass, which needs to force export all internal state, and in
this case could make the modules have incompatible type signatures, so they
couldn't both be imported by the same module anymore. That, in turn, would force
us to duplicate and specialize modules that import other modules for each
instantiation. This is do-able, but requires significant additional engineering
work. Therefore, this feature is left for some time in the future.
As a final note, adding support for module linking required pretty much a full
rewrite of Wizer. It no longer just parses the Wasm section by section, tweaking
it in place. Instead, there is an initial "parse" phase that creates a
`ModuleInfo` tree that serves as an AST. Our subsequent passes -- the
instrumentation and rewriting passes -- process the `ModuleInfo` tree rather
than reparsing the input Wasm and tweaking it directly.
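A parse-then-walk structure of this kind might look roughly like the Rust sketch below. The actual `ModuleInfo` fields are not shown in this message, so everything here is an assumption: `ModuleNode`, its fields, and `planned_state_exports` are hypothetical, and only the `__wizer_global_` naming comes from the example above.

```rust
/// Hypothetical tree node produced by a "parse" phase.
/// Not the real `ModuleInfo`; the fields are invented for illustration.
struct ModuleNode {
    name: String,
    defined_globals: u32,
    nested: Vec<ModuleNode>,
}

/// A later pass walks the tree instead of re-parsing the Wasm bytes.
/// Here it lists the state exports an instrumentation pass would add,
/// qualified by module name so the output is easy to read.
fn planned_state_exports(node: &ModuleNode, out: &mut Vec<String>) {
    for i in 0..node.defined_globals {
        out.push(format!("{}.__wizer_global_{}", node.name, i));
    }
    // Recurse into nested modules in definition order.
    for child in &node.nested {
        planned_state_exports(child, out);
    }
}
```

Because each pass receives the same tree, the instrumentation and rewriting passes can share one view of the module structure rather than each decoding the binary again.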
* Fix Intel Mac CI builds.
We build both x86-64 and aarch64 ("Intel Mac" and "Apple Silicon Mac")
binaries for wizer in CI on the `macos-latest` CI runner. Historically,
this runner was an x86-64 machine, so while we could do a direct compile
for x86-64 binaries, we added a target override for `aarch64-darwin` for
the aarch64 builds to force a cross-compile.
When GitHub switched macOS CI runners to aarch64 (ARM64) machines
somewhat recently, the `macos-latest` runner image began producing
aarch64 binaries by default, and the target override for
cross-compilation became meaningless (redundant). As a result, *both* of
our x86-64 and aarch64 macOS binary artifacts contained aarch64
binaries. This is a problem for anyone running an Intel Mac.
This PR switches the CI config to specify a cross-compilation of x86-64
binaries, so each target builds its proper architecture.
* Use actions/upload-artifact v4
* Use v4 of actions/download-artifact too
* add option for reference_types
only enables the wasmtime feature, which allows initializing Modules that were compiled with newer llvm versions but don't actually use reference_types
Disabled by default
* Update src/lib.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* reject reference-types related instructions
* add setter for wasm_reference_types
* bail instead of unreachable
* tests for reference-types still rejecting table modifying instructions
* better error messages
* test indirect call with reference_types enabled
* same docstring for library and cli
* (hopefully) actually use the new call_indirect
---------
Co-authored-by: Gentle <ramon.klass@gmail.com>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
…liance#129)
* enable reference types and bulk memory support by default
* update tests
---------
Co-authored-by: Gentle <ramon.klass@gmail.com>
* chore(deps): update wasmtime & upstream deps
This commit updates wasmtime along with other deps:
- wasmtime: v31 -> v36.0.2
- wasmparser et al: *.228 -> *.238
* fix(deps): use major version of wasmtime in dep
Co-authored-by: Till Schneidereit <till@tillschneidereit.net>
---------
Co-authored-by: Till Schneidereit <till@tillschneidereit.net>
6725c52 to 40de2f7
Procedural idea: to keep history linear and thus make Wasmtime bisects less confusing, would it make sense to graft the initial commit of Wizer on top of Wasmtime to
(I believe this could be done with
(Happy to talk about this tomorrow in the Wasmtime meeting, too; the downside is that by rewriting commits it does change all the hashes)
Ok I've tested what happens with and then I ran this bisect session: Basically looks like Overall I'm also more comfortable not rewriting commits/dates/authors with
…into wizer-inert
prtest:full
9a8a760 to 19c3380
This is a history-preserving version of #11805 without any Wasmtime-specific changes. The goal here is to merge everything into this repo while only worrying about preserving the history and then a follow-up PR would actually integrate Wizer with Wasmtime.
This history was procedurally created by running these commands in a Wizer clone:
and then running these commands in Wasmtime
Notably the big binary blobs are removed from the history to avoid pulling in too much new size here (`uap_bench.wizer.wasm` was 35M+).
A final commit is then handwritten which excludes `crates/wizer` from the overall workspace and that's intended to be the starting point to rebase #11805 onto after merging.
Procedurally I do not plan to use the merge queue for this PR. Instead I plan on waiting for CI to fully pass on this PR and then I'll push this directly to `main` after toggling a switch in the settings for "allow admins to bypass restrictions". I'll then re-enable that switch after I do the push.