perf: eliminate ParserConfig clones on every H1 request#4028
Closed
DioCrafts wants to merge 2 commits into hyperium:master from
Conversation
- Change `ParseContext.h1_parser_config` from owned `ParserConfig` to `&'a ParserConfig`
- Remove `.clone()` in `Conn::poll_read_head()` (conn.rs)
- Remove `.clone()` in `Buffered::parse()` retry loop (io.rs)
- Pre-allocate read buffer with `INIT_BUFFER_SIZE` instead of `capacity(0)` (io.rs)
- Update all test call sites in role.rs and io.rs

Eliminates 2 unnecessary `ParserConfig` copies per HTTP/1.1 request in the hot parsing path.
0x676e67
approved these changes
Mar 3, 2026
Add a field to `BufList` that tracks the total number of bytes across all buffers. This avoids iterating the entire `VecDeque` on every call to `remaining()`, which is invoked from hot paths like `poll_flush`, `can_buffer`, and `advance`. Previously, `remaining()` was O(n), where n is the number of buffers in the queue (up to 16 in the Queue write strategy). Now it is O(1): a single field read.
Contributor

The title of the submission here seems a bit messy. I think it would be better to split the different optimizations into two separate PRs.
Member

Thanks! I believe it originally used a clone for lifetimes. I assumed the compiler could inline and eliminate the clone; does this result in a measurable difference? (Also, could you keep one logical change per PR, please?)
Author

@seanmonstar @0x676e67 Sure, apologies, I will split the logic into different pull requests.
Problem
Every HTTP/1.1 request triggered two unnecessary `ParserConfig::clone()` calls in the hot parsing path:

- `Conn::poll_read_head()` cloned `self.state.h1_parser_config` to build `ParseContext`
- `Buffered::parse()` cloned it again inside the retry loop on each iteration

`ParserConfig` is a read-only config struct. Nothing in the parse chain mutates it. Cloning it on every request was pure waste.

On top of that, `Buffered::new()` allocated the read buffer with `BytesMut::with_capacity(0)`. This forced the first `poll_read_from_io()` to hit the allocator before any data could be read. The buffer always grows to `INIT_BUFFER_SIZE` (8192) on the first read anyway.

Changes
- `src/proto/h1/mod.rs`: `ParseContext.h1_parser_config` changed from `ParserConfig` to `&'a ParserConfig`
- `src/proto/h1/conn.rs`: removed the `.clone()`; now passes `&self.state.h1_parser_config` directly
- `src/proto/h1/io.rs`: removed the `.clone()` in the parse loop (the reference just copies through); `BytesMut::with_capacity(0)` changed to `BytesMut::with_capacity(INIT_BUFFER_SIZE)`
- `src/proto/h1/role.rs`: test call sites updated to take `&ParserConfig` instead of owned values

Why this matters
These two clones ran on every single HTTP/1.1 request. For a server handling thousands of requests per second, that adds up. The fix is simple: pass a reference instead of copying the struct. All `httparse::ParserConfig` methods already take `&self`, so this works without any API change.

The buffer pre-allocation removes one guaranteed allocation from every new connection. The buffer was going to be 8 KB anyway. Now it starts there.
Testing
All 270 tests pass across 5 test suites.
No public API changes. No breaking changes.
Update: `BufList::remaining()` O(n) → O(1)

In `src/common/buf.rs`, `BufList::remaining()` was iterating the entire `VecDeque` on every call to sum up byte counts. This method is called from multiple hot paths on every flush cycle:

- `WriteBuf::remaining()`: called in `poll_flush` to check whether there is data to write
- `WriteBuf::can_buffer()`: called before buffering each new chunk
- `WriteBuf::advance()`: called after every partial write to the socket

In the Queue write strategy (the default when the OS supports `writev`, i.e. most Linux servers), the queue can hold up to 16 buffers. Every call walked all of them.

Changes
- Added a `remaining: usize` field to `BufList` that tracks total bytes across all buffers
- `push()`: increments the cached total
- `advance()`: decrements the cached total
- `copy_to_bytes()`: decrements in the optimized front-buffer paths; the fallback path goes through `advance()`, which handles it
- `remaining()`: now returns the cached field directly, O(1)

Impact
For a server at 100K req/s with ~8 buffers in queue and ~8 calls to `remaining()` per request:

The overhead of maintaining the counter is ~0.3 ns per `push`/`advance` (one integer add/sub).
push/advance(one integer add/sub).Testing
All 162 tests pass (61 client + 14 integration + 86 server + 12 doc-tests). No public API changes.