Skip to content

Conversation

@jrick
Copy link
Member

@jrick jrick commented Dec 10, 2025

This special cases writes to bytes.Buffer, which is always the writer type written to by WriteMessageN. There are several optimizations that can be implemented by special casing this type:

First, pulling temporary short buffers from binary freelist can be skipped entirely, and instead the binary encoding of integers can be appended directly to its existing capacity. This avoids the synchronization cost to add and remove buffers from the free list, and for applications which only ever write wire messages with WriteMessageN, the allocation and ongoing garbage collection scanning cost to for these buffers can be completely skipped.

Second, special casing the buffer type in WriteVarString saves us from creating a temporary heap copy of a string, as the buffer's WriteString method can be used instead.

Third, special casing the buffer allows WriteMessageN to calculate the serialize size and grow its buffer so all remaining appends for writing block and transactions will not have to reallocate the buffer's backing allocation. This same optimization can be applied to other messages in the future.

@jrick jrick marked this pull request as draft December 10, 2025 15:59
@jrick

This comment was marked as outdated.

@jrick jrick force-pushed the optimize_write_buffer branch from 84a65f2 to 230fb69 Compare December 10, 2025 17:10
@jrick jrick marked this pull request as ready for review December 10, 2025 17:11
@jrick

This comment was marked as outdated.

@jrick

This comment was marked as outdated.

@jrick jrick force-pushed the optimize_write_buffer branch from 158b29e to a2cbcdf Compare December 10, 2025 21:10
@jrick
Copy link
Member Author

jrick commented Dec 10, 2025

Now rebased over #3584. Below is the performance improvement with both PRs, relative to master:

$ benchstat old.txt new.txt                                                   
goos: openbsd
goarch: amd64
pkg: github.com/decred/dcrd/wire
cpu: AMD Ryzen 7 5800X3D 8-Core Processor           
                │   old.txt   │               new.txt               │
                │   sec/op    │   sec/op     vs base                │
WriteMessageN-8   2.784µ ± 2%   1.652µ ± 1%  -40.67% (p=0.000 n=10)

                │  old.txt   │              new.txt               │
                │    B/op    │    B/op     vs base                │
WriteMessageN-8   592.0 ± 0%   328.0 ± 0%  -44.59% (p=0.000 n=10)

                │  old.txt   │              new.txt               │
                │ allocs/op  │ allocs/op   vs base                │
WriteMessageN-8   7.000 ± 0%   3.000 ± 0%  -57.14% (p=0.000 n=10)

@jrick jrick force-pushed the optimize_write_buffer branch 6 times, most recently from 2cda86f to a182543 Compare December 11, 2025 21:24
@jrick jrick force-pushed the optimize_write_buffer branch 3 times, most recently from b4a27d0 to 0ea4d52 Compare December 12, 2025 08:25
This special cases writes to bytes.Buffer, which is always the writer type
written to by WriteMessageN.  There are several optimizations that can be
implemented by special casing this type:

First, pulling temporary short buffers from binary freelist can be skipped
entirely, and instead the binary encoding of integers can be appended directly
to its existing capacity.  This avoids the synchronization cost to add and
remove buffers from the free list, and for applications which only ever write
wire messages with WriteMessageN, the allocation and ongoing garbage
collection scanning cost to for these buffers can be completely skipped.

Second, special casing the buffer type in WriteVarString saves us from
creating a temporary heap copy of a string, as the buffer's WriteString method
can be used instead.

Third, special casing the buffer allows WriteMessageN to calculate the
serialize size and grow its buffer so all remaining appends for writing block
and transactions will not have to reallocate the buffer's backing allocation.
This same optimization can be applied to other messages in the future.
@jrick jrick force-pushed the optimize_write_buffer branch from 0ea4d52 to 396b860 Compare December 12, 2025 14:12
@davecgh davecgh added this to the 2.2.0 milestone Dec 29, 2025
Copy link
Member

@davecgh davecgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall. This help further reduce the number of allocations the GC has to deal with.

I've identified a few nits inline.

}

// writeUint16 writes the little endian encoding of value to the writer.
func writeUint16(w io.Writer, value uint16) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please name this writeUint16LE to make it clear it encodes it using little endian. This is consistent with other cases in the code such as putUint16LE in txscript and WriteUint16LE of the blake256 hashers.

The only reason the existing exported funcs on the binaryFreeList weren't named that way is because they took the byte order as a parameter.

I really prefer not having to dig further into the write funcs to clearly and ambiguously immediately see which byte order is being used at the call sites.

}

// writeUint32 writes the little endian encoding of value to the writer.
func writeUint32(w io.Writer, value uint32) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, but writeUint32LE.

}

// writeUint64 writes the little endian encoding of value to the writer.
func writeUint64(w io.Writer, value uint64) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeUint64LE

copy(elems.checksum[:], cksumHash[0:4])
buf.Reset()
writeElements(&buf, &elems.dcrnet, &elems.command, &elems.lenp, &elems.checksum)
writeUint32(buf, uint32(elems.dcrnet))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have any issue with changing these, but what's the motivation of not using the arguably slightly more readable version that it's replacing?

Also, if this is done, I believe the *[4]byte: and *[CommandSize]uint8: cases can be removed from writeElement since nothing else uses those sizes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeElements causes an additional allocation.

it's obviously handy to have it for the message serialization implementations, but this is quite low hanging fruit as we can knock out an additional allocation on every message written over the wire.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. I was thinking the elems struct would still end up on the heap due to the call to Write, but it won't because it's now calling bytes.Buffer.Write which takes raw bytes.

So, we don't really need the elems struct anymore either then. It was a nice optimization to reduce the individual allocs down to 1 for use with writeElements, but now that none of the individual elements escape, it wouldn't matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants