
perf: eliminate ParserConfig clones on every H1 request #4028

Closed
DioCrafts wants to merge 2 commits into hyperium:master from DioCrafts:perf/h1-parser-config-zero-copy

Conversation


@DioCrafts DioCrafts commented Mar 3, 2026

Problem

Every HTTP/1.1 request triggered two unnecessary ParserConfig::clone() calls in the hot parsing path:

  1. Conn::poll_read_head() cloned self.state.h1_parser_config to build ParseContext
  2. Buffered::parse() cloned it again inside the retry loop on each iteration

ParserConfig is a read-only config struct. Nothing in the parse chain mutates it. Cloning it on every request was pure waste.

On top of that, Buffered::new() allocated the read buffer with BytesMut::with_capacity(0). This forced the first poll_read_from_io() to hit the allocator before any data could be read. The buffer always grows to INIT_BUFFER_SIZE (8192) on the first read anyway.

Changes

src/proto/h1/mod.rs

  • ParseContext.h1_parser_config: ParserConfig → &'a ParserConfig

src/proto/h1/conn.rs

  • Removed .clone(). Now passes &self.state.h1_parser_config directly.

src/proto/h1/io.rs

  • Removed .clone() in the parse loop. The reference is Copy, so it passes through each retry iteration without cloning.
  • Changed BytesMut::with_capacity(0) → BytesMut::with_capacity(INIT_BUFFER_SIZE).

src/proto/h1/role.rs

  • Updated 20 test call sites to pass &ParserConfig instead of owned values.

Why this matters

These two clones ran on every single HTTP/1.1 request. For a server handling thousands of requests per second, that adds up. The fix is simple: pass a reference instead of copying the struct. All httparse::ParserConfig methods already take &self, so this works without any API change.

The buffer pre-allocation removes one guaranteed allocation from every new connection. The buffer was going to be 8KB anyway. Now it starts there.

Testing

All 270 tests pass across 5 test suites.

No public API changes. No breaking changes.

Update: BufList::remaining() O(n) → O(1)

src/common/buf.rs

BufList::remaining() was iterating the entire VecDeque on every call to sum up byte counts:

fn remaining(&self) -> usize {
    self.bufs.iter().map(|buf| buf.remaining()).sum()
}

This method is called from multiple hot paths on every flush cycle:

  • WriteBuf::remaining() — called in poll_flush to check if there's data to write
  • WriteBuf::can_buffer() — called before buffering each new chunk
  • WriteBuf::advance() — called after every partial write to the socket

In Queue write strategy (the default when the OS supports writev, i.e. most Linux servers), the queue can hold up to 16 buffers. Every call walked all of them.

Changes

  • Added a remaining: usize field to BufList that tracks total bytes across all buffers
  • push(): increments the cached total
  • advance(): decrements the cached total
  • copy_to_bytes(): decrements in the optimized front-buffer paths; the fallback path goes through advance() which handles it
  • remaining(): now returns the cached field directly — O(1)

Impact

For a server at 100K req/s with ~8 buffers in queue and ~8 calls to remaining() per request:

  • Before: 6.4M iterator traversals/sec just to count bytes
  • After: 800K field reads (effectively free — single register read)

The overhead of maintaining the counter is ~0.3ns per push/advance (one integer add/sub).

Testing

All 162 tests pass (61 client + 14 integration + 86 server + 12 doc-tests). No public API changes.

- Change ParseContext.h1_parser_config from owned ParserConfig to &'a ParserConfig
- Remove .clone() in Conn::poll_read_head() (conn.rs)
- Remove .clone() in Buffered::parse() retry loop (io.rs)
- Pre-allocate read buffer with INIT_BUFFER_SIZE instead of capacity(0) (io.rs)
- Update all test call sites in role.rs and io.rs

Eliminates 2 unnecessary ParserConfig copies per HTTP/1.1 request
in the hot parsing path.
Add a remaining: usize field to BufList that tracks the total number
of bytes across all buffers. This avoids iterating the entire
VecDeque on every call to remaining(), which is invoked from
hot paths like poll_flush, can_buffer, and advance.

Previously, remaining() was O(n) where n is the number of
buffers in the queue (up to 16 in Queue write strategy).
Now it is O(1) — a simple field read.
@0x676e67
Contributor

0x676e67 commented Mar 3, 2026

The title of the submission here seems a bit messy. I think it would be better to split the different optimizations into two separate PRs.

@DioCrafts DioCrafts requested a review from 0x676e67 March 3, 2026 18:27
@seanmonstar
Member

Thanks! I believe it originally used a clone for lifetimes. I assumed the compiler could inline and eliminate the code; does this result in a difference?

(Also, could you keep one logical change per PR, please?)

@DioCrafts
Author

@seanmonstar @0x676e67 Sure, apologies, I will split the changes into separate pull requests.

@DioCrafts DioCrafts closed this Mar 3, 2026