Implement drainingRead mechanism for JS-backed streams #5838

jasnell · 2026-01-06T19:00:54Z

The next ~~experiment~~ step in improving the performance of the streams pump-to path, here specifically the ReadableSourceKjAdapter pump-to: we implement ~~an experimental~~ a "draining read" that consumes as much data as possible synchronously on each read. The results are promising.

We will optimize the other cases (e.g. KJ-readable-to-JS-writable) in a separate PR. This one is specifically looking to improve the JS-readable-to-KJ-writable case.
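To make the "draining read" idea concrete, here is a minimal sketch in TypeScript. It is not the code in this PR: `QueuedSource`, `readOne`, and `DrainResult` are hypothetical names, and the exact semantics of `maxRead` (taken here as a byte cap that still allows at least one chunk through) are an assumption for illustration only.

```typescript
// Hypothetical model of a JS-backed stream's internal queue.
// None of these names are workerd APIs.
type Chunk = Uint8Array;

interface DrainResult {
  chunks: Chunk[]; // everything consumed synchronously by this read
  done: boolean;   // true once the source is closed and the queue is empty
}

class QueuedSource {
  private queue: Chunk[] = [];
  private closed = false;

  enqueue(chunk: Chunk): void {
    if (this.closed) throw new Error("source already closed");
    this.queue.push(chunk);
  }

  close(): void {
    this.closed = true;
  }

  // Conventional read: hands back at most one chunk per call.
  readOne(): DrainResult {
    const chunk = this.queue.shift();
    return {
      chunks: chunk === undefined ? [] : [chunk],
      done: this.closed && this.queue.length === 0,
    };
  }

  // Draining read: keep taking queued chunks synchronously until the queue
  // is empty or the next chunk would push the total past maxRead bytes.
  // At least one chunk is always taken so the caller makes progress.
  drainingRead(maxRead: number = Infinity): DrainResult {
    const chunks: Chunk[] = [];
    let total = 0;
    while (this.queue.length > 0) {
      const next = this.queue[0];
      if (chunks.length > 0 && total + next.byteLength > maxRead) break;
      chunks.push(next);
      this.queue.shift();
      total += next.byteLength;
    }
    return { chunks, done: this.closed && this.queue.length === 0 };
  }
}
```

With four 64-byte chunks queued, `readOne()` needs four calls to empty the queue, while `drainingRead(128)` needs two and `drainingRead()` needs one. Fewer round trips per byte is the effect the benchmarks below are measuring.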

Claude did most of the work here under supervision.

  Value Streams (Default highWaterMark=0)

  | Benchmark        | New (µs) | Existing (µs) | Speedup | Throughput New | Throughput Existing |
  |------------------|----------|---------------|---------|----------------|---------------------|
  | Tiny (64B×256)   | 1,555    | 2,796         | 1.8x    | 10.1 MB/s      | 5.6 MB/s            |
  | Small (256B×100) | 612      | 1,047         | 1.7x    | 40.0 MB/s      | 23.4 MB/s           |
  | Medium (4KB×100) | 667      | 1,110         | 1.7x    | 589 MB/s       | 354 MB/s            |
  | Large (64KB×16)  | 198      | 262           | 1.3x    | 5.0 GB/s       | 3.8 GB/s            |

  Value Streams (highWaterMark=16KB)

  | Benchmark        | New (µs) | Existing (µs) | Speedup | Throughput New | Throughput Existing |
  |------------------|----------|---------------|---------|----------------|---------------------|
  | Tiny (64B×256)   | 480      | 2,621         | 5.5x    | 32.7 MB/s      | 6.0 MB/s            |
  | Small (256B×100) | 195      | 1,024         | 5.2x    | 125 MB/s       | 23.9 MB/s           |
  | Medium (4KB×100) | 246      | 1,089         | 4.4x    | 1.56 GB/s      | 361 MB/s            |
  | Large (64KB×16)  | 168      | 248           | 1.5x    | 6.1 GB/s       | 4.0 GB/s            |

  Byte Streams (Default highWaterMark=0)

  | Benchmark        | New (µs) | Existing (µs) | Speedup | Throughput New | Throughput Existing |
  |------------------|----------|---------------|---------|----------------|---------------------|
  | Tiny (64B×256)   | 1,533    | 3,047         | 2.0x    | 10.2 MB/s      | 5.2 MB/s            |
  | Small (256B×100) | 617      | 1,231         | 2.0x    | 39.6 MB/s      | 19.9 MB/s           |
  | Medium (4KB×100) | 658      | 1,274         | 1.9x    | 597 MB/s       | 309 MB/s            |
  | Large (64KB×16)  | 192      | 2,781         | 14.5x   | 5.1 GB/s       | 363 MB/s            |

  Byte Streams (highWaterMark=16KB)

  | Benchmark        | New (µs) | Existing (µs) | Speedup | Throughput New | Throughput Existing |
  |------------------|----------|---------------|---------|----------------|---------------------|
  | Tiny (64B×256)   | 530      | 583           | 1.1x    | 29.6 MB/s      | 26.9 MB/s           |
  | Small (256B×100) | 223      | 306           | 1.4x    | 110 MB/s       | 80.1 MB/s           |
  | Medium (4KB×100) | 340      | 1,317         | 3.9x    | 1.13 GB/s      | 299 MB/s            |
  | Large (64KB×16)  | 186      | 2,706         | 14.5x   | 5.3 GB/s       | 372 MB/s            |

  Byte Streams with autoAllocateChunkSize=64KB

  | Benchmark        | New (µs) | Existing (µs) | Speedup | Throughput New | Throughput Existing |
  |------------------|----------|---------------|---------|----------------|---------------------|
  | Tiny (64B×256)   | 1,582    | 3,967         | 2.5x    | 9.9 MB/s       | 4.0 MB/s            |
  | Small (256B×100) | 627      | 1,556         | 2.5x    | 39.0 MB/s      | 15.8 MB/s           |
  | Medium (4KB×100) | 677      | 1,590         | 2.3x    | 580 MB/s       | 248 MB/s            |
  | Large (64KB×16)  | 189      | 354           | 1.9x    | 5.2 GB/s       | 2.8 GB/s            |

  Async Streams (Microtask delays - SlowValue)

  | Benchmark        | New (µs) | Existing (µs) | Speedup |
  |------------------|----------|---------------|---------|
  | Small (256B×100) | 858      | 1,164         | 1.4x    |
  | Medium (4KB×100) | 930      | 1,182         | 1.3x    |

  I/O Latency Streams (KJ event loop yields)

  | Benchmark    | New (µs) | Existing (µs) | Speedup |
  |--------------|----------|---------------|---------|
  | Small Value  | 1,042    | 1,361         | 1.3x    |
  | Medium Value | 1,111    | 1,397         | 1.3x    |
  | Large Value  | 269      | 308           | 1.1x    |
  | Small Byte   | 1,346    | 1,557         | 1.2x    |
  | Medium Byte  | 1,362    | 1,606         | 1.2x    |
  | Large Byte   | 247      | 2,796         | 11.3x   |

  Timed Streams (Real timer delays)

  | Benchmark          | New (µs) | Existing (µs) | Speedup |
  |--------------------|----------|---------------|---------|
  | Small 10µs delay   | 3,474    | 1,411         | 0.4x ⚠️ |
  | Small 100µs delay  | 108,170  | 105,873       | 1.0x    |
  | Small 1ms delay    | 110,401  | 111,430       | 1.0x    |
  | Medium 100µs delay | 105,322  | 104,661       | 1.0x    |

  ---
  Summary

  | Category            | Scenarios                               | Result              |
  |---------------------|-----------------------------------------|---------------------|
  | Big Wins (>2x)      | Tiny/Small with HWM, Large Byte streams | 2x - 14.5x faster   |
  | Solid Wins (1.3-2x) | Most sync streams, I/O latency          | 1.3x - 2x faster    |
  | Neutral (~1x)       | Timed 100µs+, some HWM byte streams     | Similar performance |
  | Regression          | Timed 10µs only                         | 0.4x (2.5x slower)  |

The one perf regression is in an artificial scenario.
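For anyone cross-checking the tables above, the Speedup column is the existing time divided by the new time. The Throughput columns look roughly consistent with payload bytes over elapsed time in binary megabytes, but that is an inference from the numbers rather than something stated here, so treat `throughputMiBps` below as an approximation. Both function names are made up for this sketch.

```typescript
// Speedup as used in the tables: existing time / new time.
// Values below 1 mean the new path is slower.
function speedup(existingUs: number, newUs: number): number {
  return existingUs / newUs;
}

// Approximate throughput in MiB/s for a run that moves
// chunkBytes * chunkCount bytes in elapsedUs microseconds.
// Assumption: the tables use binary units; reported values differ slightly.
function throughputMiBps(
  chunkBytes: number,
  chunkCount: number,
  elapsedUs: number,
): number {
  const bytes = chunkBytes * chunkCount;
  const seconds = elapsedUs / 1e6;
  return bytes / seconds / (1024 * 1024);
}

// Tiny (64B×256), value streams, default highWaterMark.
const tinySpeedup = speedup(2796, 1555);
const tinyThroughput = throughputMiBps(64, 256, 1555);

// Timed 10µs row: the one regression.
const timedSpeedup = speedup(1411, 3474);
```

Here `tinySpeedup` rounds to 1.8 and `timedSpeedup` rounds to 0.4, matching the tables. `tinyThroughput` comes out near 10 MiB/s, close to but not exactly the reported 10.1 MB/s.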

codspeed-hq · 2026-01-06T19:32:43Z

CodSpeed Performance Report

Merging this PR will degrade performance by 37.84%

Comparing jasnell/streams-draining-read (0ac80c4) with main (9825f50)

Summary

⚡ 16 improved benchmarks
❌ 17 regressed benchmarks
✅ 96 untouched benchmarks
🆕 19 new benchmarks
⏩ 49 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

|     | Benchmark                              | `BASE`  | `HEAD`   | Efficiency |
|-----|----------------------------------------|---------|----------|------------|
| ❌  | `New_Small_SlowValue`                  | 2.5 ms  | 4.1 ms   | -37.84%    |
| ❌  | `New_Small_Timed100us`                 | 3.6 ms  | 5.1 ms   | -29.5%     |
| ❌  | `New_Small_Timed10us`                  | 3.6 ms  | 5.1 ms   | -29.41%    |
| ⚡  | `New_Tiny_Value_HWM16K`                | 4.7 ms  | 2.5 ms   | +83.07%    |
| ❌  | `New_Tiny_Value`                       | 4.7 ms  | 7.3 ms   | -35.77%    |
| ❌  | `New_Small_Timed1ms`                   | 3.6 ms  | 5.1 ms   | -29.49%    |
| 🆕  | `New_LargeStream_Value`                | N/A     | 293.1 ms | N/A        |
| ⚡  | `New_Small_Value_HWM16K`               | 1.9 ms  | 1.1 ms   | +69.71%    |
| ⚡  | `New_Tiny_Byte_Auto64K`                | 23.3 ms | 7.5 ms   | ×3.1       |
| ⚡  | `New_Large_Byte`                       | 6.9 ms  | 3.3 ms   | ×2.1       |
| ⚡  | `New_Large_Byte_HWM16K`                | 6.9 ms  | 3.3 ms   | ×2.1       |
| ⚡  | `Encode_ASCII_32[TextEncoder][0/0/32]` | 3.3 ms  | 3 ms     | +12.37%    |
| ⚡  | `New_Large_IoLatencyByte`              | 7.2 ms  | 3.6 ms   | ×2         |
| ⚡  | `New_Large_Byte_Auto64K_HWM16K`        | 4 ms    | 3.3 ms   | +21.15%    |
| ⚡  | `New_Large_Byte_Auto64K`               | 3.9 ms  | 3.3 ms   | +20.02%    |
| ❌  | `New_Large_IoLatencyValue`             | 2.9 ms  | 3.6 ms   | -20.24%    |
| ❌  | `New_Large_Value`                      | 2.6 ms  | 3.3 ms   | -19.83%    |
| ❌  | `New_Large_Value_HWM16K`               | 2.7 ms  | 3.4 ms   | -22.01%    |
| ❌  | `New_Medium_IoLatencyByte`             | 5.4 ms  | 6.4 ms   | -16.6%     |
| ⚡  | `New_Medium_Byte_Auto64K`              | 10.1 ms | 3.7 ms   | ×2.7       |
| ... | ...                                    | ...     | ...      | ...        |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

¹ 49 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

jasnell · 2026-01-08T00:31:31Z

This also now includes a needed improvement to the handling of the autoAllocateChunkSize option on byte-oriented streams. Previously we defaulted to a 4 KB buffer when autoAllocateChunkSize was not specified at all, which added overhead in a number of ways. A new compat flag removes that default and turns the read into a "default" read with a 16 KB buffer. The perf improvements are measurable in the benchmarks.
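As a rough illustration of the behaviour change described above, the decision could be modelled like this. Everything here is hypothetical: `planRead`, `ReadPlan`, and `specCompliantFlag` are names invented for the sketch, and treating an explicitly configured autoAllocateChunkSize as a BYOB read of that size in both modes is an assumption, not something confirmed by this PR.

```typescript
// Hypothetical model of how the adapter picks a read strategy.
type ReadPlan =
  | { kind: "byob"; bufferSize: number }
  | { kind: "default"; bufferSize: number };

const LEGACY_DEFAULT_CHUNK = 4 * 1024;  // 4 KB fallback used before the flag
const DEFAULT_READ_BUFFER = 16 * 1024;  // 16 KB buffer for "default" reads

function planRead(
  autoAllocateChunkSize: number | undefined,
  specCompliantFlag: boolean,
): ReadPlan {
  if (autoAllocateChunkSize !== undefined) {
    // Assumption: an explicit size is honoured regardless of the flag.
    return { kind: "byob", bufferSize: autoAllocateChunkSize };
  }
  if (!specCompliantFlag) {
    // Old behaviour: pretend the stream asked for a 4 KB auto-allocated buffer.
    return { kind: "byob", bufferSize: LEGACY_DEFAULT_CHUNK };
  }
  // New behaviour: no implicit autoAllocateChunkSize, plain default read.
  return { kind: "default", bufferSize: DEFAULT_READ_BUFFER };
}
```

Under this model, `planRead(undefined, false)` yields a 4 KB BYOB read and `planRead(undefined, true)` yields a 16 KB default read, which is the distinction the byte-stream numbers below reflect.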

  Value Streams (unaffected by autoAllocateChunkSize change)

  | Benchmark           | New                 | Existing            | Speedup | New WriteOps | Existing WriteOps |
  |---------------------|---------------------|---------------------|---------|--------------|-------------------|
  | Tiny_Value          | 1428 μs (10.9 MB/s) | 2689 μs (5.8 MB/s)  | 1.9×    | 0.56         | 0.99              |
  | Tiny_Value_HWM16K   | 444 μs (35.2 MB/s)  | 2535 μs (6.2 MB/s)  | 5.7×    | 0.0007       | 0.95              |
  | Small_Value         | 569 μs (43.0 MB/s)  | 1028 μs (23.8 MB/s) | 1.8×    | 0.08         | 0.16              |
  | Small_Value_HWM16K  | 180 μs (135.7 MB/s) | 1010 μs (24.2 MB/s) | 5.6×    | 0.0003       | 0.15              |
  | Medium_Value        | 599 μs (655 MB/s)   | 1061 μs (370 MB/s)  | 1.8×    | 0.09         | 0.16              |
  | Medium_Value_HWM16K | 246 μs (1.57 GB/s)  | 1025 μs (383 MB/s)  | 4.2×    | 0.0003       | 0.15              |
  | Large_Value         | 180 μs (5.4 GB/s)   | 252 μs (3.9 GB/s)   | 1.4×    | 0.004        | 0.006             |
  | LargeStream_Value   | 59.9 ms (40.9 MB/s) | 120 ms (20.6 MB/s)  | 2.0×    | 1000         | 2000              |

  Byte Streams (affected by spec-compliant autoAllocateChunkSize)

  | Benchmark          | New                 | Existing            | Speedup | New WriteOps | Existing WriteOps |
  |--------------------|---------------------|---------------------|---------|--------------|-------------------|
  | Tiny_Byte          | 1497 μs (10.5 MB/s) | 2980 μs (5.3 MB/s)  | 2.0×    | 0.57         | 1.11              |
  | Tiny_Byte_HWM16K   | 489 μs (32.1 MB/s)  | 536 μs (29.2 MB/s)  | 1.1×    | 0.0007       | 0.004             |
  | Small_Byte         | 590 μs (41.4 MB/s)  | 1171 μs (20.9 MB/s) | 2.0×    | 0.09         | 0.18              |
  | Small_Byte_HWM16K  | 205 μs (119 MB/s)   | 290 μs (84.5 MB/s)  | 1.4×    | 0.0006       | 0.004             |
  | Medium_Byte        | 655 μs (599 MB/s)   | 1265 μs (311 MB/s)  | 1.9×    | 0.09         | 0.18              |
  | Medium_Byte_HWM16K | 350 μs (1.1 GB/s)   | 1220 μs (322 MB/s)  | 3.5×    | 0.01         | 0.18              |
  | Large_Byte         | 189 μs (5.2 GB/s)   | 2697 μs (374 MB/s)  | 14.3×   | 0.004        | 0.99              |
  | Large_Byte_HWM16K  | 182 μs (5.4 GB/s)   | 2671 μs (377 MB/s)  | 14.7×   | 0.004        | 0.99              |

  Byte Streams with Explicit autoAllocateChunkSize=64KB

  | Benchmark           | New                 | Existing            | Speedup | New WriteOps | Existing WriteOps |
  |---------------------|---------------------|---------------------|---------|--------------|-------------------|
  | Tiny_Byte_Auto64K   | 1457 μs (10.7 MB/s) | 3796 μs (4.1 MB/s)  | 2.6×    | 0.56         | 1.39              |
  | Small_Byte_Auto64K  | 577 μs (42.4 MB/s)  | 1484 μs (16.6 MB/s) | 2.6×    | 0.09         | 0.21              |
  | Medium_Byte_Auto64K | 631 μs (622 MB/s)   | 1530 μs (257 MB/s)  | 2.4×    | 0.09         | 0.21              |
  | Large_Byte_Auto64K  | 189 μs (5.2 GB/s)   | 345 μs (2.8 GB/s)   | 1.8×    | 0.004        | 0.008             |

  I/O Latency Streams

  | Benchmark             | New                 | Existing            | Speedup | New WriteOps | Existing WriteOps |
  |-----------------------|---------------------|---------------------|---------|--------------|-------------------|
  | Small_IoLatencyValue  | 1025 μs (23.8 MB/s) | 1358 μs (18.0 MB/s) | 1.3×    | 0.15         | 0.20              |
  | Medium_IoLatencyValue | 1060 μs (370 MB/s)  | 1416 μs (277 MB/s)  | 1.3×    | 0.16         | 0.21              |
  | Large_IoLatencyValue  | 266 μs (3.7 GB/s)   | 299 μs (3.3 GB/s)   | 1.1×    | 0.006        | 0.007             |
  | Small_IoLatencyByte   | 1234 μs (19.9 MB/s) | 1569 μs (15.6 MB/s) | 1.3×    | 0.18         | 0.22              |
  | Medium_IoLatencyByte  | 1279 μs (308 MB/s)  | 1603 μs (245 MB/s)  | 1.3×    | 0.19         | 0.25              |
  | Large_IoLatencyByte   | 247 μs (4.0 GB/s)   | 2787 μs (362 MB/s)  | 11.3×   | 0.006        | 0.97              |

  Key Takeaways

  1. Massive improvement for large byte streams: 14× faster for Large_Byte due to spec-compliant 16KB DEFAULT reads vs legacy 4KB BYOB reads
  2. Roughly 2× or better for tiny/small/medium byte streams at the default highWaterMark (2.0× to 2.6×); the HWM16K variants range from 1.1× to 3.5×
  3. WriteOps dramatically reduced: New approach coalesces writes much better (e.g., Large_Byte: 0.004 vs 0.99 writes per iteration)
  4. HWM16K configurations: New approach benefits significantly from highWaterMark buffering
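The WriteOps point in the takeaways above can be illustrated with a small model that counts writes when chunks are forwarded one at a time versus in drained batches. This is a sketch only: `CountingSink`, `pumpPerChunk`, `pumpCoalesced`, and `batchBytes` are hypothetical, and concatenating into a single buffer is just one way to coalesce; the PR may well do it differently.

```typescript
// Hypothetical sink that only records how many writes it received.
class CountingSink {
  writes = 0;
  bytes = 0;
  write(data: Uint8Array): void {
    this.writes += 1;
    this.bytes += data.byteLength;
  }
}

// Join a batch of chunks into one contiguous buffer.
function concat(chunks: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const c of chunks) total += c.byteLength;
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.byteLength;
  }
  return out;
}

// One write per chunk.
function pumpPerChunk(chunks: Uint8Array[], sink: CountingSink): void {
  for (const c of chunks) sink.write(c);
}

// One write per batch; a batch is flushed before it would exceed batchBytes.
function pumpCoalesced(
  chunks: Uint8Array[],
  sink: CountingSink,
  batchBytes: number,
): void {
  let batch: Uint8Array[] = [];
  let size = 0;
  for (const c of chunks) {
    if (batch.length > 0 && size + c.byteLength > batchBytes) {
      sink.write(concat(batch));
      batch = [];
      size = 0;
    }
    batch.push(c);
    size += c.byteLength;
  }
  if (batch.length > 0) sink.write(concat(batch));
}
```

For 100 chunks of 256 bytes and a 16 KB batch cap, the per-chunk pump issues 100 writes and the coalescing pump issues 2, with the same total byte count either way.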

jasnell force-pushed the jasnell/streams-draining-read branch from ccf8ddb to 532de1d on January 6, 2026 21:28

harrishancock self-requested a review January 7, 2026 17:32

jasnell mentioned this pull request Jan 7, 2026

Remove the speculative perf strategies from new streams adapters #5834

Closed

jasnell added 4 commits January 7, 2026 14:49

Implement drainingRead mechanism for JS-backed streams

15ea45a

Implement DrainingReader

89a5ff5

Use drainingRead in new streams adapters

af97573

Add maxRead to draining read

a580f71

jasnell force-pushed the jasnell/streams-draining-read branch from 3e3b2fc to a580f71 on January 7, 2026 22:49

jasnell marked this pull request as ready for review January 7, 2026 22:56

jasnell requested review from a team as code owners January 7, 2026 22:56

Improve spec compliant handling of autoAllocateChunkSize

0ac80c4

jasnell requested review from guybedford and mikea January 8, 2026 00:32
