perf: make BufList::remaining() O(1) with a cached byte count#4031

Open
DioCrafts wants to merge 2 commits intohyperium:masterfrom
DioCrafts:perf/buflist-remaining-o1

@DioCrafts

Description:

BufList::remaining() iterated over all queued buffers to sum their lengths. For write-heavy workloads with many small buffers queued, this O(n) scan ran on every poll cycle.

This PR adds a remaining: usize field that tracks the total byte count incrementally. push() adds, advance() and copy_to_bytes() subtract. remaining() returns the cached value directly.

What changed

  • buf.rs: added remaining field to BufList<T>
  • Updated push(), advance(), copy_to_bytes() to maintain the counter
  • remaining() now returns the field instead of iterating

No public API changes. All tests pass, including the existing BufList unit tests.
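The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `Chunk` is a hypothetical stand-in for a `bytes::Buf` implementor so that the example needs only the standard library, `remaining_by_scan` is a hypothetical helper that mimics the old O(n) behaviour for comparison, `copy_to_bytes()` is left out, and the `assert!` in `advance` is only one plausible form of the guard mentioned later in the thread.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a `bytes::Buf` implementor:
/// owned bytes plus a read cursor.
struct Chunk {
    data: Vec<u8>,
    pos: usize,
}

impl Chunk {
    fn new(data: &[u8]) -> Self {
        Chunk { data: data.to_vec(), pos: 0 }
    }

    /// Bytes not yet consumed from this chunk.
    fn remaining(&self) -> usize {
        self.data.len() - self.pos
    }

    fn advance(&mut self, cnt: usize) {
        self.pos += cnt;
    }
}

/// Simplified buffer queue with a cached total byte count.
struct BufList {
    bufs: VecDeque<Chunk>,
    /// Invariant: equals the sum of `remaining()` over `bufs`.
    remaining: usize,
}

impl BufList {
    fn new() -> Self {
        BufList { bufs: VecDeque::new(), remaining: 0 }
    }

    /// Adds the new chunk's length to the cached counter.
    fn push(&mut self, buf: Chunk) {
        self.remaining += buf.remaining();
        self.bufs.push_back(buf);
    }

    /// O(1): returns the cached counter.
    fn remaining(&self) -> usize {
        self.remaining
    }

    /// O(n) reference implementation: scans every queued chunk.
    fn remaining_by_scan(&self) -> usize {
        self.bufs.iter().map(|b| b.remaining()).sum()
    }

    /// Consumes `cnt` bytes from the front and subtracts them from the counter.
    fn advance(&mut self, mut cnt: usize) {
        // Assumed guard: refuse to advance past the end so the counter cannot underflow.
        assert!(cnt <= self.remaining, "advance past end of BufList");
        self.remaining -= cnt;
        while cnt > 0 {
            let front = self.bufs.front_mut().expect("counter says bytes remain");
            let rem = front.remaining();
            if rem > cnt {
                front.advance(cnt);
                return;
            }
            // The front chunk is fully consumed: drop it and continue.
            cnt -= rem;
            self.bufs.pop_front();
        }
    }
}
```

The design trade-off is that every method that adds or consumes bytes must keep the counter in sync; comparing `remaining()` against `remaining_by_scan()` in a test is one way to check that invariant.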

Add a remaining field to BufList that tracks the total number
of bytes across all buffers. This avoids iterating the entire
VecDeque on every call to remaining(), which is invoked from
hot paths like poll_flush, can_buffer, and advance.

Previously, remaining() was O(n) where n is the number of
buffers in the queue (up to 16 in the Queue write strategy).
Now it is O(1): a simple field read.
@DioCrafts
Author

@paolobarbolini You're right, added an assert guard. Appreciate the review! 🙏

@seanmonstar
Member

Do you have benchmarks or something that is able to measure the effect of this change with more and fewer write buffers?

@DioCrafts
Author

@seanmonstar

Not yet. I can write a micro-benchmark that pushes N small buffers and calls remaining() in a loop to show the O(n) vs O(1) difference. For typical connections with few queued buffers the gain is small, but it becomes visible with many small writes.

Do you want me to add the benchmark and post the numbers?

Kind regards
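A micro-benchmark of the kind proposed here might look roughly like the sketch below. It is an assumption-laden illustration rather than the benchmark used in this PR: `Queue`, `time_it`, and `compare` are hypothetical names, the queue holds plain `Vec<u8>` chunks instead of `bytes::Buf` values, and timing uses `std::time::Instant` with `std::hint::black_box` instead of the nightly `#[bench]` harness.

```rust
use std::collections::VecDeque;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the buffer queue: byte chunks plus a cached total.
struct Queue {
    bufs: VecDeque<Vec<u8>>,
    cached: usize,
}

impl Queue {
    /// Builds a queue of `n` small (8-byte) chunks, updating the cache on each push.
    fn with_small_bufs(n: usize) -> Self {
        let mut q = Queue { bufs: VecDeque::with_capacity(n), cached: 0 };
        for _ in 0..n {
            let chunk = vec![0u8; 8];
            q.cached += chunk.len();
            q.bufs.push_back(chunk);
        }
        q
    }

    /// O(n): sums the length of every queued chunk.
    fn remaining_scan(&self) -> usize {
        self.bufs.iter().map(|b| b.len()).sum()
    }

    /// O(1): reads the cached total.
    fn remaining_cached(&self) -> usize {
        self.cached
    }
}

/// Runs `f` `iters` times and returns the total elapsed time.
/// `black_box` keeps the optimizer from discarding the result.
fn time_it(iters: u32, mut f: impl FnMut() -> usize) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}

/// Returns (scan time, cached time) for a queue of `n_bufs` chunks.
fn compare(n_bufs: usize, iters: u32) -> (Duration, Duration) {
    let q = Queue::with_small_bufs(n_bufs);
    let scan = time_it(iters, || black_box(&q).remaining_scan());
    let cached = time_it(iters, || black_box(&q).remaining_cached());
    (scan, cached)
}
```

Calling `compare(n, iters)` for a few values of `n` and dividing each duration by `iters` gives a rough per-call cost. A loop like this only isolates `remaining()` itself and says nothing about end-to-end client or server throughput.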

@seanmonstar
Member

I was interested in the motivation, and to be sure we measure that things were improved. Was this showing high in profiles in a server you had, or something?

@DioCrafts
Author

DioCrafts commented Mar 3, 2026

@seanmonstar Exactly. My team runs several critical connections through hyper in different applications, so reducing latency is very important for us. I now have the benchmark ready.

BufList::remaining() O(1) Optimization — Benchmark Results

Benchmarks run with cargo +nightly bench --features "nightly,full" --lib -- buflist on both the baseline (iterative O(n)) and optimized (cached O(1)) implementations.

remaining() call only

| Buffers | Before (O(n)) | After (O(1)) | Speedup |
|---|---|---|---|
| 1 | 0.99 ns/iter | 0.32 ns/iter | 3x |
| 4 | 1.14 ns/iter | 0.27 ns/iter | 4x |
| 16 | 2.48 ns/iter | 0.40 ns/iter | 6x |
| 128 | 20.54 ns/iter | 0.35 ns/iter | 59x |
| 1024 | 302.59 ns/iter | 0.33 ns/iter | 917x |

remaining() is now constant-time (~0.3 ns) regardless of buffer count.

push() + remaining() cycle (simulates real write loop)

| Buffers | Before | After | Improvement |
|---|---|---|---|
| 16 | 148.50 ns/iter | 118.62 ns/iter | ~20% |
| 128 | 2,032.20 ns/iter | 468.55 ns/iter | ~77% |

With hyper's current MAX_BUF_LIST_BUFFERS = 16, the push+remaining cycle takes ~20% less time per iteration, and the reduction grows to ~77% at 128 buffers.

Raw output

Before (O(n) iteration)

test common::buf::bench::buflist_remaining_1_buf        ... bench:           0.99 ns/iter (+/- 0.15)
test common::buf::bench::buflist_remaining_4_bufs       ... bench:           1.14 ns/iter (+/- 0.19)
test common::buf::bench::buflist_remaining_16_bufs      ... bench:           2.48 ns/iter (+/- 0.35)
test common::buf::bench::buflist_remaining_128_bufs     ... bench:          20.54 ns/iter (+/- 2.53)
test common::buf::bench::buflist_remaining_1024_bufs    ... bench:         302.59 ns/iter (+/- 19.95)
test common::buf::bench::buflist_push_and_remaining_16  ... bench:         148.50 ns/iter (+/- 25.87)
test common::buf::bench::buflist_push_and_remaining_128 ... bench:       2,032.20 ns/iter (+/- 285.50)

After (O(1) cached field)

test common::buf::bench::buflist_remaining_1_buf        ... bench:           0.32 ns/iter (+/- 0.05)
test common::buf::bench::buflist_remaining_4_bufs       ... bench:           0.27 ns/iter (+/- 0.05)
test common::buf::bench::buflist_remaining_16_bufs      ... bench:           0.40 ns/iter (+/- 0.15)
test common::buf::bench::buflist_remaining_128_bufs     ... bench:           0.35 ns/iter (+/- 0.08)
test common::buf::bench::buflist_remaining_1024_bufs    ... bench:           0.33 ns/iter (+/- 0.13)
test common::buf::bench::buflist_push_and_remaining_16  ... bench:         118.62 ns/iter (+/- 19.20)
test common::buf::bench::buflist_push_and_remaining_128 ... bench:         468.55 ns/iter (+/- 76.48)

@DioCrafts DioCrafts requested a review from paolobarbolini March 6, 2026 08:31
@paolobarbolini
Contributor

I don't have review powers on this repository. For what it's worth, I think Sean was asking about the overall improvement to the client and/or server when these patches are applied on top of hyper, rather than the micro-benchmarks on the BufList APIs. While these are good, they don't really demonstrate how they would help hyper.
