@rvolosatovs
Member

@rvolosatovs rvolosatovs commented Oct 18, 2024

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
@programmerjake
Contributor

i don't see a way to access a .so global variable from wasm...

@rvolosatovs
Member Author

rvolosatovs commented Oct 18, 2024

i don't see a way to access a .so global variable from wasm...

indeed, I did not add this functionality in the current wasi-dl draft, it's really just an approximation sufficient for a very basic PoC

Does rvolosatovs/wasi-dl#1 address your concern? alloc can be interpreted as any ffi-type or even a function.

Perhaps resource symbol is redundant altogether and lookups should just return alloc resources, which can be interpreted as functions

Edit: added rvolosatovs/wasi-dl@30ea77f
Edit 2: updated PR with b48bdaa

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
@programmerjake
Contributor

Does rvolosatovs/wasi-dl#1 address your concern?

yes!

Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
@rvolosatovs
Member Author

Did a small update to ensure the symbol lookups are typed b9ae4c9

@sunfishcode
Member

Such interface is unsafe and it must be used with extreme care, however that is no different from any other host plugin, which would be loaded via dlopen.

There are two kinds of unsafe relevant here. One is whether the plugin code is unsafe, and I agree that this is basically the same with any host plugin system we'd design here. The other is whether Wasm code using the plugin code is unsafe.

The libffi-style approach in this proposal looks like it means that we'd additionally have to treat the Wasm that calls the code as unsafe by default, and while there are potential ways to make it safe, they aren't described here.

Also, the libffi-style approach in this proposal looks like it would mean that the Wasm would not be portable, in general, because libffi doesn't encapsulate all C ABI details. What is off_t a typedef for? What is the value of ENOENT? And so on.

[wasi-dl]-based approach also provides greater security, since the implementation of [wasi-dl] may restrict the set of libraries allowed to be loaded and potentially define the exact signatures for symbols defined in them.

This proposal does not currently describe how this would work. And, signatures alone would not be sufficient, because libffi-style bindings also include raw pointers.

Perhaps it would be possible to design an interface description language sophisticated enough to describe these interfaces, including signatures, lifetime information, synchronization information, and perhaps also resource lifetimes (eg. open files that need to be explicitly closed and not used thereafter), and perhaps eventually even a way to describe C unions, or some safe variant-like subset of them. If this rfc implies the design of a new interface description language, it'd be good to say more about what that looks like.

As an alternative to exposing wasi-dl interface to (plugin) components, we could use the dynamic libraries themselves as the host plugins. For that we would need to carefully design a set of conventions specific to wasmtime for such plugins to be able to define their exports and expose them to components.

Such an approach would require custom-built dynamic libraries for plugins, if an existing library was desired to be used, an "adapter" library would need to be built, which would in turn dynamically-load that library.

It looks like this proposal would also usually want "adapter" libraries too, or at least adapter layers, because I don't expect we'll want normal Wasm code talking directly to these low-level libffi-style APIs, for ergonomics, language-independence, portability, and potential security reasons. And these adapters are going to be tedious to write and maintain, because they need to be written for each source language that needs them, and they'll have a lot of repetitive low-level code. I imagine we'd pretty quickly find ourselves wanting bindings generators for this task.

And if we're going to design a language-independent sandboxable interface description language with tooling around it for generating bindings, we should think carefully about whether or not we already have one, and what relationships we want 😄.

Member

@fitzgen fitzgen left a comment


Thanks for writing up this RFC!

I agree that a plugin system geared towards allowing hosts to define and expose new capabilities to Wasm guests that Wasmtime has no builtin knowledge of is very valuable.

Unfortunately, I think a missing constraint is that we fundamentally cannot trust Wasm guests, so we can't just expose dlopen/dlsym and raw FFI types to them. Therefore, I don't think the solution proposed here is something we can pursue. More details inline below.

That said, I also sketch (very roughly) an alternative approach that should address the same motivations but which avoids giving untrusted Wasm guests raw dlopen powers.

Comment on lines +94 to +96
/// Constructs a function from an opaque `alloc` and a type signature
/// Fails if type of `alloc` is not `ffi-type::primitive(primitive-type::pointer)`
from-alloc: static func(alloc: alloc, args: list<ffi-type>, ret: option<ffi-type>) -> result<function>;
Member


Would this result in memory unsafety if the Wasm (which is untrusted, and potentially malicious) passes the wrong number or type of arguments and returns?

Or is it expected that Wasmtime will somehow dynamically check these calls?

Similar question for declaring FFI struct types and their fields.

}
```

Such interface is *unsafe* and it must be used with extreme care, however that is no different from any other host plugin, which would be loaded via `dlopen`.
Member


I guess the answer to the previous question is "yes" then.

I see @sunfishcode's comments now, and I agree with the gist of his points.

There is a difference between whether

  1. the plugin internally is using unsafe but exposing a safe interface, and
  2. the plugin's interface is itself unsafe.

With (1) the (untrusted and potentially malicious) Wasm guest cannot trigger any memory unsafety, modulo implementation bugs in the plugin itself.

With (2) the (untrusted and potentially malicious) Wasm guest can trivially trigger memory unsafety. That is, (2) is handing security vulnerabilities to Wasm guests by design.

So (2) is a complete non-starter; it is contradictory to Wasmtime's (and the BA's) mission and values.

And -- correct me if I'm wrong! -- this RFC seems to be proposing (2) so, unless I am misunderstanding the proposal, this is not an approach we should consider or pursue any further.
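
The distinction between (1) and (2) can be illustrated with a minimal Rust sketch. Both function names are hypothetical and neither comes from the RFC or from wasi-dl. The first function uses `unsafe` internally but cannot be misused by its caller; the second pushes the safety obligation onto the caller, which is the position an untrusted guest would be in under (2).

```rust
/// (1) Uses `unsafe` internally but exposes a safe interface:
/// no caller input can cause an out-of-bounds read.
pub fn read_byte_checked(buf: &[u8], index: usize) -> Option<u8> {
    if index < buf.len() {
        // SAFETY: `index` was bounds-checked on the line above.
        Some(unsafe { *buf.as_ptr().add(index) })
    } else {
        None
    }
}

/// (2) The interface itself is unsafe: the caller must guarantee that
/// `ptr.add(index)` points to readable memory. A caller that passes a
/// bad pointer or index triggers undefined behavior.
pub unsafe fn read_byte_raw(ptr: *const u8, index: usize) -> u8 {
    unsafe { *ptr.add(index) }
}
```

With the first shape, a guest can pass any `index` and the worst outcome is `None`. With the second, correctness depends entirely on the caller being trusted.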

Member


To be more constructive, I would suggest an alternative approach that maintains a safe interface to Wasm, something like:

  • There is some well-known symbol that plugin .sos should export, describing their WIT interface (maybe literally just a static WIT_INTERFACE: &'static str = "..." or alternatively the binary encoding of the same thing).
  • Wasmtime loads a plugin.so and reads its WIT interface
  • Wasmtime dlsyms the functions described by the WIT interface
  • Wasmtime adds functions for that WIT interface to a Linker; these functions
    • translate Wasm / canonical ABI arguments into the equivalent in some sort of native ABI
    • call their corresponding dlsymed functions from plugin.so
    • translate the native ABI's result back into Wasm / canonical ABI

In the above sketch, the plugin.so is trusted, but the Wasm is not. Any unsafety can only come from bugs in the plugin.so (either from its internal implementation or if its functions' types don't match the WIT interface it claims). Notably, unsafety cannot originate from within (untrusted and potentially malicious) Wasm guests, no matter what garbage values they indirectly pass to plugin.so.
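
The translate, call, translate-back steps above can be sketched in Rust using only the standard library. Everything here is an assumption made for illustration: `ToyLinker`, `Ty`, `Val`, and `plugin_add` are hypothetical names, `plugin_add` stands in for a function that would really be obtained from plugin.so via dlsym, and `ToyLinker` is not Wasmtime's `Linker` API. The point it shows is that the host checks guest-supplied values against the declared signature before any native code runs.

```rust
use std::collections::HashMap;

/// Hypothetical subset of interface-level types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    U32,
    F64,
}

/// Hypothetical guest-supplied values.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Val {
    U32(u32),
    F64(f64),
}

impl Val {
    fn ty(&self) -> Ty {
        match self {
            Val::U32(_) => Ty::U32,
            Val::F64(_) => Ty::F64,
        }
    }
}

/// Stand-in for a native function that would be looked up with dlsym.
extern "C" fn plugin_add(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

struct HostFunc {
    params: Vec<Ty>,
    call: Box<dyn Fn(&[Val]) -> Val>,
}

/// Toy stand-in for a linker: maps names to type-checked host functions.
struct ToyLinker {
    funcs: HashMap<String, HostFunc>,
}

impl ToyLinker {
    fn new() -> Self {
        ToyLinker { funcs: HashMap::new() }
    }

    fn define(&mut self, name: &str, params: Vec<Ty>, call: impl Fn(&[Val]) -> Val + 'static) {
        self.funcs.insert(name.to_string(), HostFunc { params, call: Box::new(call) });
    }

    /// Validates the guest's arguments against the declared signature,
    /// and only then calls into the native function.
    fn call(&self, name: &str, args: &[Val]) -> Result<Val, String> {
        let f = self
            .funcs
            .get(name)
            .ok_or_else(|| format!("unknown function `{name}`"))?;
        let got: Vec<Ty> = args.iter().map(Val::ty).collect();
        if got != f.params {
            return Err(format!("`{name}` expects {:?}, got {:?}", f.params, got));
        }
        Ok((f.call)(args))
    }
}

fn linker_with_plugin() -> ToyLinker {
    let mut linker = ToyLinker::new();
    linker.define("add", vec![Ty::U32, Ty::U32], |args| match args {
        // Translate to the native ABI, call, translate the result back.
        [Val::U32(a), Val::U32(b)] => Val::U32(plugin_add(*a, *b)),
        _ => unreachable!("arguments were validated by ToyLinker::call"),
    });
    linker
}
```

In this sketch, a guest that passes the wrong number or type of arguments gets an `Err` from `ToyLinker::call` and never reaches `plugin_add`.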

The tricky parts here will be:

  • What is the native ABI? Can we reuse the canonical ABI or a variant of it? I could imagine a bindgen-y proc macro that does some variant of the canonical ABI for plugins with statically-known interfaces, but what about dynamic interfaces (i.e. the common case for the wasmtime cli, rather than a wasmtime crate embedding that happens to use plugins of a certain shape)? What can we do to avoid arg/result translation overheads?
  • A plugin.so may want some per-Store state, for example if wasi-sockets was implemented as a plugin, it would want any open sockets to be attached to the Store. How do we let plugin.so create that per-Store state? Where do we keep it? How do we pass it back to plugin.so on each call? How do we let plugin.so destroy it when we drop the store?
  • Finally, it isn't clear to me whether this RFC proposes that plugin.sos are forwards compatible with new wasmtime versions (i.e. new Wasmtime releases are backwards compatible with old plugin.sos) or not. If so, then the ABI concerns described above are doubly important and we need to make sure they remain extensible for future additions and changes, which will involve a lot of subtleties.
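
One possible shape for the per-Store state question is a table of C-ABI callbacks that the host drives over the store's lifetime. This is only a sketch under stated assumptions: `PluginVTable`, `SocketState`, `ToyStore`, and the callback names are all hypothetical, nothing like this is defined by the RFC or by Wasmtime, and the static `PLUGIN` table stands in for symbols that would be resolved from plugin.so.

```rust
use std::ffi::c_void;

/// Hypothetical callback table a plugin could expose.
#[repr(C)]
struct PluginVTable {
    create_state: extern "C" fn() -> *mut c_void,
    open_socket: extern "C" fn(state: *mut c_void) -> u32,
    destroy_state: extern "C" fn(state: *mut c_void),
}

/// Plugin-private state; the host only ever sees an opaque pointer.
struct SocketState {
    open: Vec<u32>,
    next_id: u32,
}

extern "C" fn create_state() -> *mut c_void {
    let state = SocketState { open: Vec::new(), next_id: 0 };
    Box::into_raw(Box::new(state)) as *mut c_void
}

extern "C" fn open_socket(state: *mut c_void) -> u32 {
    // SAFETY: the host only passes pointers returned by `create_state`
    // that have not yet been handed to `destroy_state`.
    let state = unsafe { &mut *(state as *mut SocketState) };
    let id = state.next_id;
    state.next_id += 1;
    state.open.push(id);
    id
}

extern "C" fn destroy_state(state: *mut c_void) {
    // SAFETY: same contract as above; this reclaims the allocation once.
    drop(unsafe { Box::from_raw(state as *mut SocketState) });
}

/// Stand-in for a table that would be looked up in plugin.so.
static PLUGIN: PluginVTable = PluginVTable {
    create_state,
    open_socket,
    destroy_state,
};

/// Toy store: owns one opaque state pointer per plugin.
struct ToyStore {
    vtable: &'static PluginVTable,
    plugin_state: *mut c_void,
}

impl ToyStore {
    fn new(vtable: &'static PluginVTable) -> Self {
        ToyStore { vtable, plugin_state: (vtable.create_state)() }
    }

    fn open_socket(&mut self) -> u32 {
        // The state is passed back to the plugin on every call.
        (self.vtable.open_socket)(self.plugin_state)
    }
}

impl Drop for ToyStore {
    fn drop(&mut self) {
        // Dropping the store lets the plugin destroy its state.
        (self.vtable.destroy_state)(self.plugin_state);
    }
}
```

Each `ToyStore` gets its own state, so sockets opened through one store are invisible to another. The sketch does not address the ABI stability or forwards-compatibility concerns raised in the last bullet.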

@rvolosatovs
Member Author

Thanks for the feedback @sunfishcode @fitzgen!

In general, I feel that I misjudged the expected level of detail for RFCs in this repository. This RFC is currently very much a high-level idea/direction, as opposed to a directly implementable design document, which seems to be what people are looking for here.

First, let's agree on some terms:

In this RFC by component composition I mostly refer to function-style composition, and not component composition as defined at https://component-model.bytecodealliance.org/creating-and-consuming/composing.html#what-is-composition
For example, wasi-virt can mostly fulfill the composition as would be required here.

More formally, let's assume that components are morphisms (functors) that map a set of interfaces (imports) to another set of interfaces (exports).

Their composition is depicted in the composition diagram taken directly from the Category theory Wikipedia page.

Here's an example in context of this RFC:

// Trusted Wasm targets this world
world plugin {
    // These two interfaces are provided by the host:
    import wasi:sockets/tcp;
    import wasi:dl/dl;

    // These two interfaces are provided to the guest:
    export wasi:sockets/tcp;
    export wasi:keyvalue/store;
}

// Untrusted Wasm targets this world
world guest {
    // These two interfaces are either directly provided by the plugin component or passed through to the host *statically* by the composition tool:
    import wasi:sockets/tcp;
    import wasi:keyvalue/store;

    export wasi:http/incoming-handler;

    // NOTE: This import would *not* be satisfied:
    // import wasi:dl/dl;
}

world composed {
    import wasi:sockets/tcp;
    import wasi:dl/dl;
    export wasi:http/incoming-handler;
}

@fitzgen you seem to imply that all Wasm is implicitly untrusted.

I'm not sure I agree with that statement. The assumption I'm operating on is that whether a trusted piece of code is compiled into a native application/library or a Wasm component should not change the "trustworthiness" of the produced artifact. That's a key assumption on which this RFC is built.

Is there something specific about Wasm components I'm not aware of, that would make them inherently untrusted?

In #39 (comment) you've outlined a way how a plugin could be loaded by Wasmtime:

  • There is some well-known symbol that plugin .sos should export, describing their WIT interface (maybe literally just a static WIT_INTERFACE: &'static str = "..." or alternatively the binary encoding of the same thing).

  • Wasmtime loads a plugin.so and reads its WIT interface

  • Wasmtime dlsyms the functions described by the WIT interface

  • Wasmtime adds functions for that WIT interface to a Linker; these functions

    • translate Wasm / canonical ABI arguments into the equivalent in some sort of native ABI
    • call their corresponding dlsymed functions from plugin.so
    • translate the native ABI's result back into Wasm / canonical ABI

Note that adding functions to the Linker using dlopen and instantiating the (untrusted) Wasm component with it produces a runtime object, which is effectively the composition as I defined it above, except that it happens at runtime.

In the context of this RFC, the plugin could operate exactly like you've outlined in #39 (comment), except wasmtime CLI would load plugin.wasm as opposed to plugin.so.

Let's consider an example with a shared library plugin (this is not an API suggestion, just a quick example sketch):

wasmtime serve --plugin plugin.so untrusted.wasm

plugin.so would be operating directly as part of runtime's process with no sandboxing whatsoever, it has full, unconstrained access to the OS and runtime process memory.

An example usage with a Wasm component plugin could look like this:

wasmtime serve --plugin plugin.wasm -P tcp=y -P dl=y untrusted.wasm
  • plugin.wasm is operating in a different "trust mode" from untrusted.wasm, but is still sandboxed.

  • The CLI user explicitly allows plugin.wasm to use tcp and wasi-dl, it has no access to anything else.

    • wasi-dl access could be scoped, e.g. (again, just a quick sketch):
    wasmtime serve --plugin plugin.wasm -P tcp=y -P dl=libm.so:libm.h -P dl=sqlite3.so:sqlite3.h untrusted.wasm
    

    With an API like this the plugin could only ever load libm.so or sqlite3.so. The associated header files could be used to verify wasi-dl calls and would, provided the shared library and its header file are correct, guarantee memory safety.
    Note that loading C headers is probably a lot of work and I'm not suggesting doing that; I'm just pointing out that there is a way to make such an interface safe.

  • untrusted.wasm does not inherit plugin.wasm imports - untrusted.wasm in this scenario only has access to interfaces exported and implemented by plugin.wasm, nothing else.
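
The scoped wasi-dl access described above could be enforced with an allow-list keyed by library name. The sketch below is a minimal illustration, assuming grant strings of the form `<library>:<header>` as in the example command line; `DlPolicy`, `from_grants`, and `check_open` are hypothetical names, and no header parsing or signature verification is attempted.

```rust
use std::collections::HashMap;

/// Hypothetical policy: which libraries a plugin may load, and the
/// header file that describes each one.
#[derive(Debug, Default)]
struct DlPolicy {
    allowed: HashMap<String, String>,
}

impl DlPolicy {
    /// Builds a policy from grants such as "libm.so:libm.h".
    fn from_grants(grants: &[&str]) -> Result<Self, String> {
        let mut allowed = HashMap::new();
        for grant in grants {
            let (lib, header) = grant
                .split_once(':')
                .ok_or_else(|| format!("expected `<library>:<header>`, got `{grant}`"))?;
            if lib.is_empty() || header.is_empty() {
                return Err(format!("empty library or header in `{grant}`"));
            }
            allowed.insert(lib.to_string(), header.to_string());
        }
        Ok(DlPolicy { allowed })
    }

    /// Returns the header associated with `library`, or an error if the
    /// plugin was never granted access to it.
    fn check_open(&self, library: &str) -> Result<&str, String> {
        self.allowed
            .get(library)
            .map(String::as_str)
            .ok_or_else(|| format!("plugin may not load `{library}`"))
    }
}
```

A host would consult `check_open` before performing the real library load, so a plugin granted only libm.so and sqlite3.so is refused anything else.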

In both cases, one way or another, wasmtime would need to produce a "runtime composition" of a plugin and the guest component.

Arguably, the Wasm plugin option is safer, since the runtime can control which libraries, symbols, and signatures the plugin can access.

In this RFC I've decided to start with a simple approach: give the wasmtime CLI user more control and produce such a composition ahead of time, drastically reducing the scope of this feature and improving performance.

If (trusted) plugin.wasm (with optional wasi-dl access) was run in a separate sandbox, would that address your concerns @fitzgen?

There are two kinds of unsafe relevant here. One is whether the plugin code is unsafe, and I agree that this is basically the same with any host plugin system we'd design here. The other is whether Wasm code using the plugin code is unsafe.

The libffi-style approach in this proposal looks like it means that we'd additionally have to treat the Wasm that calls the code as unsafe by default, and while there are potential ways to make it safe, they aren't described here.

Purely from the perspective of memory safety, if wasmtime loaded plugin.so, which exported wasi:keyvalue/store, each wasi:keyvalue/store interface call in the guest would be unsafe.

Whether we trust the plugin code or not, guest code directly or indirectly invoking a symbol loaded from a shared object will always be potentially memory unsafe.

As I mentioned above, the runtime could limit wasi-dl access and could potentially be made aware of the symbols exported by the libraries (or even statically link to libraries at compilation time and expose them via the wasi-dl abstraction).

Effectively, wasi-dl could be turned into a shared object introspection interface, which would be type-safe and could even verify contract constraints not directly expressible in the C type system.

One potential strategy could be using value definitions or just functions (since recursive types are not currently allowed) to either process a C header file ahead-of-time or somehow else (e.g. manually) produce something roughly similar to:

(component
  (import "wasi:dl/ffi" (instance 
     (export "primitive-type" (type $primitive_type (enum
        "c-char"
        "uint64-t"
        ;; etc..
     )))
     ;; etc...
  ))
  (import "wasi:dl/dll" (instance 
     (export "function" (type $function (sub resource)))
     ;; etc...
  ))
  ;; using value definition
  (export "SOMECONST" (value $primitive_type (enum "uint64-t")))
  ;; using a function, returns the C type of the constant
  (export "SOMECONST" (func (result $primitive_type)))

  ;; returns a typed `wasi:dl/dll.function`
  (export "myfunc" (func (result $function)))
)
  • What is the native ABI? Can we reuse the canonical ABI or a variant of it? I could imagine a bindgen-y proc macro that does some variant of the canonical ABI for plugins with statically-known interfaces, but what about dynamic interfaces (i.e. the common case for the wasmtime cli, rather than a wasmtime crate embedding that happens to use plugins of a certain shape)? What can we do to avoid arg/result translation overheads?

  • A plugin.so may want some per-Store state, for example if wasi-sockets was implemented as a plugin, it would want any open sockets to be attached to the Store. How do we let plugin.so create that per-Store state? Where do we keep it? How do we pass it back to plugin.so on each call? How do we let plugin.so destroy it when we drop the store?

  • Finally, it isn't clear to me whether this RFC proposes that plugin.sos are forwards compatible with new wasmtime versions (i.e. new Wasmtime releases are backwards compatible with old plugin.sos) or not. If so, then the ABI concerns described above are doubly important and we need to make sure they remain extensible for future additions and changes, which will involve a lot of subtleties.

If the (trusted) plugin was a Wasm component, there'd be no need for any custom symbols or ABI - answers to most of these questions would be provided directly by the component model.

Perhaps it would be possible to design an interface description language sophisticated enough to describe these interfaces, including signatures, lifetime information, synchronization information, and perhaps also resource lifetimes (eg. open files that need to be explicitly closed and not used thereafter), and perhaps eventually even a way to describe C unions, or some safe variant-like subset of them. If this rfc implies the design of a new interface description language, it'd be good to say more about what that looks like.

I don't think that being able to load every dynamic library in existence should be a goal here. The intention is to have an interface with an overlay with the C type system just big enough to be useful, but no more than that. I'd expect complex or very platform-specific libraries to be structured the following way:

  • somelib.so (platform-specific)
  • adapter.so (higher-level abstraction to be consumed by Wasm, optionally statically-linking somelib.so)
  • plugin.wasm (trusted Wasm glue, essentially just a mapping from some world W to wasi-dl calls)
  • untrusted.wasm (untrusted Wasm guest importing subset of W)

It looks like this proposal would also usually want "adapter" libraries too, or at least adapter layers, because I don't expect we'll want normal Wasm code talking directly to these low-level libffi-style APIs, for ergonomics, language-independence, portability, and potential security reasons. And these adapters are going to be tedious to write and maintain, because they need to be written for each source language that needs them, and they'll have a lot of repetitive low-level code. I imagine we'd pretty quickly find ourselves wanting bindings generators for this task.

Right, so plugin.wasm is the "adapter" library here.

In terms of re-exporting WASI, good news is that I've already done this for Rust: https://github.com/wasmCloud/wasi-passthrough/tree/1ade95ee6d2046ffefa5a72731bec22a6d470157/src (roughly based it on wasi-virt).
A Rust component can then propagate the re-exports and augment them by simply importing a crate: https://github.com/wasmCloud/wadge/blob/40aef6248a5823f104336b1f2757f50717f3dae3/tests/components/wasi/src/lib.rs

There's certainly a lot of tooling that would be required here to make this nice. In spirit of this RFC, however, such tooling would be general purpose Wasm component tooling, as opposed to something built specifically for Wasmtime plugins.

If we went the route of Wasmtime doing the "runtime composition" by giving plugin.wasm its own sandbox, then perhaps the re-export part is less relevant, since the guest.wasm would still be able to import directly from the host.

If people insist on the "single-Wasm" and plugin-as-a-shared-object approach, then I'd still suggest relying on WIT and the component model ABI as much as possible and perhaps using cabish for value encoding/decoding.

@bjorn3
Contributor

bjorn3 commented Nov 1, 2024

I think having adapter.so directly provide the wasm component interface rather than having to use an intermediate plugin.wasm is safer, faster and easier to use for the end user.

Plugin.wasm is effectively unsandboxed, as any mistake in its use of wasi-dl would cause UB. It is a lot easier to directly define a safe wasm component interface in adapter.so than to export an unsafe C api and then separately consume this C api in plugin.wasm and hope that you didn't accidentally cause an ABI mismatch (as soon as you use any non-fixed size integer type (or an integer type larger than the register size) or you use a struct type or enum in your C api, it becomes non-trivial to match the ABI unless you are the C compiler that compiled adapter.so). And if adapter.so is written in Rust, avoiding a separate plugin.wasm may enable the plugin writer to entirely avoid unsafe code.

Having the intermediate plugin.wasm also requires you to copy all data twice. Once from adapter.so to plugin.wasm and once from plugin.wasm to the wasm module that uses the plugin. If adapter.so directly provides a wasm component interface, it only needs to be copied once.

And finally it is easier for the end user if only adapter.so exists. This way there can't be a version mismatch between adapter.so and plugin.wasm (which will likely cause UB) and you only need to copy a single file around to use the plugin.

@rvolosatovs
Member Author

Plugin.wasm is effectively unsandboxed, as any mistake in its use of wasi-dl would cause UB. It is a lot easier to directly define a safe wasm component interface in adapter.so than to export an unsafe C api and then separately consume this C api in plugin.wasm and hope that you didn't accidentally cause an ABI mismatch (as soon as you use any non-fixed size integer type (or an integer type larger than the register size) or you use a struct type or enum in your C api, it becomes non-trivial to match the ABI unless you are the C compiler that compiled adapter.so)

I've outlined an example approach in #39 (comment), which would prevent UB when using wasi-dl.
All primitive types are read and written via resource methods in wasi-dl https://github.com/rvolosatovs/wasi-dl/blob/6d2000d92d96b0967eb5a7ead314a765b7f596e2/wit/dl.wit#L68-L108, so a size mismatch is not possible for non-fixed size integers: the runtime knows the sizes of C primitives at compile time, and if the component tried to write a 16-bit char using a u32, it would get an error from set-u32.
The component can query the primitive sizes at runtime using sizeof: https://github.com/rvolosatovs/wasi-dl/blob/6d2000d92d96b0967eb5a7ead314a765b7f596e2/wit/dl.wit#L124
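
The size-checked accessor idea can be sketched as follows. This is an illustration under assumptions, not the wasi-dl implementation: `PrimitiveType`, `Alloc`, and the Rust method names are hypothetical stand-ins for the `alloc` resource, `sizeof`, and `set-u32` mentioned above, and only three integer types are modeled.

```rust
/// Hypothetical subset of C primitive types tracked by the runtime.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PrimitiveType {
    Uint16,
    Uint32,
    Uint64,
}

impl PrimitiveType {
    /// Size in bytes, as the runtime would know it at compile time.
    fn size_of(self) -> usize {
        match self {
            PrimitiveType::Uint16 => 2,
            PrimitiveType::Uint32 => 4,
            PrimitiveType::Uint64 => 8,
        }
    }
}

/// Hypothetical typed allocation owned by the runtime.
struct Alloc {
    ty: PrimitiveType,
    bytes: Vec<u8>,
}

impl Alloc {
    fn new(ty: PrimitiveType) -> Self {
        Alloc { ty, bytes: vec![0; ty.size_of()] }
    }

    /// Writes a u32, but only if the allocation really holds a 32-bit value.
    fn set_u32(&mut self, value: u32) -> Result<(), String> {
        if self.ty != PrimitiveType::Uint32 {
            return Err(format!("cannot write u32 into {:?}", self.ty));
        }
        self.bytes.copy_from_slice(&value.to_ne_bytes());
        Ok(())
    }

    /// Reads a u32 under the same type check.
    fn get_u32(&self) -> Result<u32, String> {
        if self.ty != PrimitiveType::Uint32 {
            return Err(format!("cannot read u32 from {:?}", self.ty));
        }
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&self.bytes);
        Ok(u32::from_ne_bytes(buf))
    }
}
```

Because the type tag travels with the allocation, writing a `u32` into a 16-bit slot is rejected with an error instead of overrunning the buffer.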

Structs are also fully supported by libffi: https://www.chiark.greenend.org.uk/doc/libffi-dev/html/Size-and-Alignment.html
Components would read and write them using resources https://github.com/rvolosatovs/wasi-dl/blob/6d2000d92d96b0967eb5a7ead314a765b7f596e2/wit/dl.wit#L88-L117, where the runtime takes care of the alignment and size.

Having the intermediate plugin.wasm also requires you to copy all data twice. Once from adapter.so to plugin.wasm and once from plugin.wasm to the wasm module that uses the plugin. If adapter.so directly provides a wasm component interface, it only needs to be copied once.

A plugin.wasm would directly read data through the pointer from the shared object. That data would then need to be copied (assuming shared refs are not allowed) into the guest component's memory.

With a plugin.so, it depends:
If we want plugin.so to directly write into runtime's memory, the only single-copy approach I see is the following:

  • Runtime gives a pointer into component's memory space to the plugin
  • Plugin writes through the pointer

Otherwise, we'd still need two copies
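
The single-copy path can be sketched with a byte slice standing in for the component's linear memory. All names are hypothetical: `plugin_fill` stands in for a function exported by plugin.so, and `host_call_single_copy` for the runtime glue that hands it a bounds-checked window into guest memory.

```rust
/// Stand-in for a plugin function that writes its result through the
/// pointer it is given; returns the number of bytes written.
fn plugin_fill(dst: &mut [u8]) -> usize {
    let data = b"hello";
    let n = data.len().min(dst.len());
    dst[..n].copy_from_slice(&data[..n]);
    n
}

/// The runtime hands the plugin a window into the component's memory,
/// so the data is written exactly once, directly at its destination.
fn host_call_single_copy(
    guest_memory: &mut [u8],
    offset: usize,
    len: usize,
) -> Result<usize, String> {
    let end = offset
        .checked_add(len)
        .ok_or_else(|| "offset + len overflows".to_string())?;
    let dst = guest_memory
        .get_mut(offset..end)
        .ok_or_else(|| format!("range {offset}..{end} is outside guest memory"))?;
    Ok(plugin_fill(dst))
}
```

The bounds check happens on the host side, so the plugin never receives a window that extends past the component's memory.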

@rvolosatovs
Member Author

Writing an RFC for dlopen-based plugins was never my intention. I was originally working on an RFC for RPC-based plugins only, but after gathering some internal feedback and building a small PoC, I decided to pivot and try to produce a unified plugin interface (the Wasm-component-based one) that would cover all use cases. I personally do not have a use case for dlopen-based plugins; RPC-based plugins are the only use case I'm after. Basing RPC-based plugins on shared libraries is certainly a non-starter for my use case.

Given that Wasm-based Wasmtime plugins do not appear to be something people are interested in at this time, I'll take a step back and close this PR, replacing it with my original proposal: #40

@rvolosatovs rvolosatovs closed this Nov 1, 2024
@rvolosatovs rvolosatovs deleted the wasmtime-plugins branch November 1, 2024 13:54
@bjorn3
Contributor

bjorn3 commented Nov 1, 2024

I've outlined an example approach in #39 (comment), which would prevent UB when using wasi-dl.
All primitive types are read and written via resource methods in wasi-dl https://github.com/rvolosatovs/wasi-dl/blob/6d2000d92d96b0967eb5a7ead314a765b7f596e2/wit/dl.wit#L68-L108, so a size mismatch is not possible for non-fixed size integers: the runtime knows the sizes of C primitives at compile time, and if the component tried to write a 16-bit char using a u32, it would get an error from set-u32.
The component can query the primitive sizes at runtime using sizeof: https://github.com/rvolosatovs/wasi-dl/blob/6d2000d92d96b0967eb5a7ead314a765b7f596e2/wit/dl.wit#L124

How does Wasmtime know what type signature that adapter.so needs?

Structs are also fully supported by libffi: https://www.chiark.greenend.org.uk/doc/libffi-dev/html/Size-and-Alignment.html

There are edge cases where even two C compilers for the same platform disagree on the right ABI. Libffi cannot know which ABI to use in those cases.

Given that Wasm-based Wasmtime plugins do not appear to be something people are interested in at this time, I'll take a step back and close this PR, replacing it with my original proposal: #40

I personally would still love to see dylib based plugins that directly interface with wasm interface types, but RPC based plugins are also nice. While they would almost certainly be a bit slower, they would be easier to support for other wasm engines that can't support dlopen and would be much easier to sandbox at an OS level.


5 participants