This repository was archived by the owner on Sep 8, 2025. It is now read-only.

Conversation

@dicej (Collaborator) commented Jul 21, 2025


This makes several changes to how `{Stream,Future}{Reader,Writer}` work to
make them more efficient and, in some ways, more ergonomic:

- The background tasks have been removed, allowing reads and writes to complete without task context switching.  Oneshot channels are now allocated and used lazily, only when the other end is not yet ready; this improves real-world performance benchmarks (e.g. wasi-http request handling) considerably.

- Instances of `{Stream,Future}Reader` can now be lifted and lowered directly; no need for `Host{Stream,Future}` anymore.

- The type parameter for `Stream{Reader,Writer}` no longer refers to the buffer type -- just the payload type (e.g. `StreamReader<u8>` instead of `StreamReader<Vec<u8>>`), meaning any buffer type may be used for a given read or write operation.  This also means the compiler needs help with type inference less often when calling `Instance::stream`; see the sketch after this list.

- Instances of `{Stream,Future}{Reader,Writer}` now require access to the store in order to be disposed of properly.  I've added RAII wrapper structs (`WithAccessor[AndValue]`) to help with this, but they can only provide a "best-effort" guarantee since using an `Accessor` from within a `Drop::drop` implementation won't always work.

- In order to ensure that resources containing `{Stream,Future}{Reader,Writer}` instances are disposed of properly, I've added `LinkerInstance::resource_concurrent` and have updated `wasmtime-wit-bindgen` to use it.  This gives resource drop functions access to a `StoreContextMut` via an `Accessor`, allowing the stream and future handles to be disposed of.
    - In order to make this work, I had to change `Accessor::instance` from an `Instance` to an `Option<Instance>`, which is awkward but temporary since we're planning to remove `Accessor::instance` entirely once we've moved concurrent state from `ComponentInstance` to `Store`.
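
A toy sketch of the type-parameter change above (hypothetical types, not the actual wasmtime API): with only the payload type in the reader's type, the buffer becomes a per-call generic, so a single `StreamReader<u8>` can fill any buffer that accepts `u8`s.

```rust
use std::marker::PhantomData;

/// Toy stand-in for the new shape: generic over the payload type `T` only.
struct StreamReader<T> {
    _payload: PhantomData<T>,
}

impl<T> StreamReader<T> {
    /// Read items into `buffer`; the buffer type is chosen per operation
    /// rather than being baked into the reader's type parameter.
    async fn read<B: Extend<T>>(&mut self, buffer: &mut B) -> usize {
        let _ = buffer; // real implementation elided; only the signature matters
        0
    }
}

async fn example(mut reader: StreamReader<u8>) {
    let mut vec_buf: Vec<u8> = Vec::new();
    let _ = reader.read(&mut vec_buf).await; // one buffer type...

    let mut deque_buf = std::collections::VecDeque::<u8>::new();
    let _ = reader.read(&mut deque_buf).await; // ...another, same reader
}
```

Since the buffer type no longer appears in the reader's type, call sites of `Instance::stream` typically only need the payload type, which the compiler can infer more often.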

That problem of disposal is definitely the most awkward part of all this.  In
simple cases, it's easy enough to ensure that read and write handles are
disposed of properly, but both `wasmtime-wasi` and `wasmtime-wasi-http` have
some pretty complicated functions where handles are passed between tasks and/or
stored inside resources, so it can be tricky to ensure proper disposal on all
code paths.  I'm open to ideas for improving this, but I suspect we'll need new
Rust language features (e.g. linear types) to make it truly ergonomic, robust,
and efficient.
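
The core tension can be shown with a generic sketch (simplified stand-in types, not the actual `WithAccessor` wrappers): cleanup needs `&mut` access to the store, but `Drop::drop` takes no extra arguments, so explicit disposal is the reliable path and `Drop` can only be a backstop.

```rust
/// Simplified stand-ins, not the actual wasmtime types.
struct Store; // owns the tables the handle points into

struct Handle {
    closed: bool,
}

impl Handle {
    /// The reliable path: dispose of the handle explicitly while the caller
    /// still has exclusive access to the store.
    fn close(mut self, store: &mut Store) {
        let _ = store; // remove this handle's entry from the store's tables here
        self.closed = true;
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Best effort only: without `&mut Store` there is nothing useful to do
        // here beyond relying on ambient (e.g. task-local) state when it
        // happens to be available, or complaining.
        if !self.closed {
            debug_assert!(false, "Handle dropped without close(); cleanup skipped");
        }
    }
}
```

Linear types would let the compiler reject any code path that forgets to call `close`, which is why the paragraph above points in that direction.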

While testing the above, I discovered an issue where `Instance::poll_until`
would prematurely give up and return a "deadlock" trap error, believing there
was no further work to do, even though the future passed to it was ready to
resolve the next time it was polled.  I've fixed this by polling the future one
last time and only trapping if it returns `Pending`.
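
The fix can be sketched generically (not the real `Instance::poll_until` code; a no-op waker from the `futures` crate stands in for the event loop's waker): once the loop believes there is no more work, give the caller's future one final poll and only trap if it is still pending.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::task::noop_waker_ref;

/// Called once the event loop thinks it has run out of work.  The final poll
/// is the fix: a future that became ready as a side effect of the last bit of
/// work now resolves instead of being misreported as deadlocked.
fn finish_or_trap<T>(fut: Pin<&mut dyn Future<Output = T>>) -> Result<T, &'static str> {
    let mut cx = Context::from_waker(noop_waker_ref());
    match fut.poll(&mut cx) {
        Poll::Ready(value) => Ok(value),
        Poll::Pending => Err("deadlock: no tasks can make further progress"),
    }
}
```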

Note that I've moved a few associated functions from `ConcurrentState` to
`Instance` (e.g. `guest_drop_writable` and others) since they now need access to
the store; they're unchanged otherwise.  Apologies for the diff noise.

Finally, I've tweaked `wasmtime serve` to poll the guest for content before
handing the response to Hyper, which helps performance by ensuring the first
content chunk can be sent in the same TCP packet as the beginning of the
response.
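
As a generic sketch of that technique (using the `futures` crate; not the actual `wasmtime serve` code), the idea is to await the first chunk before the response is handed off, then splice it back onto the front of the remaining stream:

```rust
use futures::stream::{self, Stream, StreamExt};

/// Wait for the first chunk of `body` before the response is handed off, then
/// return a stream yielding that chunk followed by the rest.  The HTTP server
/// can then flush the response head and the first chunk together.
async fn prefetch_first_chunk<S>(mut body: S) -> impl Stream<Item = S::Item>
where
    S: Stream + Unpin,
{
    let first = body.next().await; // may be `None` for an empty body
    stream::iter(first).chain(body)
}
```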

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
@alexcrichton (Member) left a comment

Overall this looks reasonable to me, but there are some high-level points from me:

  • I will want to spend some more time with the drop-related things to think through them. Basically as a foundational abstraction I want to make sure we spend some time thinking through the design tradeoffs and explore possible alternatives.
  • In the limit I still want to get rid of oneshots entirely in futures/streams, even for writes/reads that can't immediately complete. I think this should still be possible but regardless this is a great step in the right direction towards plumbing the async-ness further down and removing more of the CPS-ish way of passing around closures/async/etc.

dicej added 3 commits July 25, 2025 10:56
This is no longer necessary to avoid 40ms of extra response latency for
`wasi:http@0.3.0-draft` components since we now poll the body stream before
handing the response to Hyper.  I still think the Nagle algorithm is a bad
default, but I can make a separate PR to disable it.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Previously, `WithAccessor[AndValue]` made only "best-effort" guarantees about
disposing of handles on `drop` -- i.e. they did nothing if dropped outside the
scope of a `tls::set`.  Now they will panic instead, but that should never
happen since we can guarantee futures closing over `&Accessor` are always
dropped within the scope of a `tls::set`.  Specifically, we ensure that happens
in both `Store::drop` and `Instance::run_concurrent`.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
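
The guarantee described in the second commit above can be pictured with a generic scope-guard sketch (a stand-alone illustration, not the actual `tls::set`/`WithAccessor[AndValue]` machinery): a thread-local flag marks the scope, and the guard panics if it is dropped anywhere else.

```rust
use std::cell::Cell;

thread_local! {
    // Set while we are inside the scope (the stand-in for `tls::set`).
    static IN_SCOPE: Cell<bool> = Cell::new(false);
}

/// Run `f` with the scope flag set; real code would also handle nesting and
/// unwinding, which this sketch omits.
fn with_scope<R>(f: impl FnOnce() -> R) -> R {
    IN_SCOPE.with(|s| s.set(true));
    let result = f();
    IN_SCOPE.with(|s| s.set(false));
    result
}

/// Stand-in for a handle that must only be dropped inside the scope.
struct ScopedGuard;

impl Drop for ScopedGuard {
    fn drop(&mut self) {
        // Being dropped outside the scope means the embedding's plumbing is
        // wrong, so fail loudly rather than silently leaking the handle.
        assert!(
            IN_SCOPE.with(|s| s.get()),
            "guard dropped outside the expected scope"
        );
        // ...cleanup that needs the ambient context goes here...
    }
}
```

As the commit message notes, the panic should be unreachable in practice because both `Store::drop` and `Instance::run_concurrent` ensure the relevant futures are dropped inside the scope.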
@dicej (Collaborator, Author) commented Jul 25, 2025

@alexcrichton I believe I've addressed your feedback so far. Let me know if there's anything else you'd like me to change.

@alexcrichton alexcrichton added this pull request to the merge queue Jul 25, 2025
@alexcrichton (Member) commented:

Nah let's land here and work through any other specifics during upstreaming

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 25, 2025
@dicej dicej added this pull request to the merge queue Jul 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 25, 2025
@dicej dicej enabled auto-merge July 25, 2025 21:39
@dicej dicej added this pull request to the merge queue Jul 25, 2025
Merged via the queue into main with commit 6ded8f9 Jul 25, 2025
45 checks passed
@dicej dicej deleted the dicej/stream-future-refactor branch July 25, 2025 22:18