Skip to content

support for binlog compressed payload (zstd)#177

Closed
alexpacio wants to merge 5 commits into
osheroff:masterfrom
alexpacio:mysql-binlog-zstd-compression-support
Closed

support for binlog compressed payload (zstd)#177
alexpacio wants to merge 5 commits into
osheroff:masterfrom
alexpacio:mysql-binlog-zstd-compression-support

Conversation

@alexpacio

Copy link
Copy Markdown

Summary

MySQL 8.0.20 introduced binlog_transaction_compression=ON, which wraps groups of binlog events into a single TRANSACTION_PAYLOAD event (optionally ZSTD-compressed). Without this change, listeners only ever received the opaque wrapper — individual row-change events were invisible and GTID tracking stalled.

  • BinaryLogClient: new handleEvent() intercepts TRANSACTION_PAYLOAD events, decompresses/unpacks inner events, restamps each inner event's eventLength and nextPosition with the outer envelope's values (so getBinlogPosition() stays consistent), then dispatches them through the normal updateGtidSetnotifyEventListenersupdateClientBinlogFilenameAndPosition pipeline.
  • TransactionPayloadEventDataDeserializer: adds COMPRESSION_TYPE_NONE (255) alongside ZSTD (0); propagates the outer EventDeserializer's CompatibilityMode flags into the inner deserializer so row events (e.g. CHAR_AND_BINARY_AS_BYTE_ARRAY) behave consistently; throws a clear IOException for unknown compression types.
  • EventDeserializer: forwards setCompatibilityMode() calls to any registered TransactionPayloadEventDataDeserializer.
  • TransactionPayloadEventData: initialises uncompressedEvents to an empty list to prevent NPE in toString() before deserialization completes.

Test plan

  • BinaryLogClientTest#testTransactionPayloadNotifiesInnerEvents — inner events are notified with correct type, XID value, and restamped position headers
  • BinaryLogClientTest#testGtidSetAdvancesWhenCompressedTransactionCommitsInsidePayload — GTID set advances correctly when the GTID precedes a compressed transaction payload
  • TransactionPayloadEventDataDeserializerTest#deserializeUncompressedPayloadCOMPRESSION_TYPE_NONE payload deserializes successfully
  • TransactionPayloadEventDataDeserializerTest#deserializeUnsupportedCompressionType — unknown type throws IOException with a descriptive message
  • TransactionPayloadEventDataDeserializerTest#deserializeAppliesCompatibilityModesToInnerEventsCompatibilityMode propagates to inner row events (verifies byte[] column type)
  • Existing TransactionPayloadEventDataDeserializerTest#deserializeCompressedPayload — existing ZSTD path still passes

🤖 Generated with Claude Code

…PAYLOAD (zstd / none)

MySQL 8.0.20+ can compress groups of binlog events into a single
TRANSACTION_PAYLOAD event when binlog_transaction_compression=ON.
Previously, listeners received the opaque wrapper and never saw the
individual inner events, causing GTID tracking to stall and making
row-change events invisible.

Changes:
- BinaryLogClient: intercept TRANSACTION_PAYLOAD events, decompress
  the payload, restamp each inner event's position fields with those of
  the outer envelope, then notify listeners per inner event and advance
  the GTID set / binlog position as usual.
- TransactionPayloadEventDataDeserializer: add COMPRESSION_TYPE_NONE
  (255) alongside ZSTD (0), propagate the outer EventDeserializer's
  CompatibilityMode flags into the inner deserializer used for
  decompressed events, and throw a clear error for unknown types.
- EventDeserializer: forward CompatibilityMode changes to any registered
  TransactionPayloadEventDataDeserializer.
- TransactionPayloadEventData: initialise uncompressedEvents to an empty
  list to avoid NPE on toString() before deserialization runs.
- Tests: unit-test GTID advancement through a compressed transaction,
  transparent inner-event notification with correct header restamping,
  COMPRESSION_TYPE_NONE deserialization, unsupported-type error path, and
  CompatibilityMode propagation to inner row events.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@alexpacio

Copy link
Copy Markdown
Author

@osheroff as you asked, this is the same change done in the connector. Please review.

@alexpacio alexpacio closed this Jun 13, 2026
@alexpacio alexpacio reopened this Jun 13, 2026
@alexpacio

Copy link
Copy Markdown
Author

@osheroff i've added a further commit to iterate over decompressed payload chunks rather than just allocating an entire array buffer to temporarly store the result.

break;
}
restampInnerEventHeader(transactionPayloadEvent, innerEvent);
updateGtidSet(innerEvent);

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguable but i think this should happen after we finish processing all the compressed events -- ie after we break out of the while loop

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in other places in code we bump the gtid set first, mph

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by the restructure: inner events now flow through the exact same updateGtidSetnotifyEventListenersupdateClientBinlogFilenameAndPosition path as every other event, so the gtid set is bumped first, per-event, just like everywhere else — no special-case ordering to reason about. (4467565)

}
}

private void notifyTransactionPayloadEvent(Event transactionPayloadEvent) {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private void notifyTransactionPayloadEvent(Event transactionPayloadEvent) {
private void decompressTransaction(Event transactionPayloadEvent) {

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method no longer lives in the client — the decompress/stream loop moved into EventDeserializer.nextTransactionPayloadEvent() + fillTransactionPayloadBuffer(). So the rename is moot, but the 'this decompresses a transaction' intent carried over into the new names. (4467565)


private void notifyTransactionPayloadEvent(Event transactionPayloadEvent) {
TransactionPayloadEventData transactionPayloadEventData =
(TransactionPayloadEventData) EventDataWrapper.internal(transactionPayloadEvent.getData());

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line looks terrible, there must be a cleaner way to write this?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed from the client. The one remaining cast now sits in EventDeserializer.nextTransactionPayloadEvent as a single (TransactionPayloadEventData) EventDataWrapper.internal(eventData), matching the EventDataWrapper.internal(...) pattern already used throughout that class. (4467565)

void handleEvent(Event event) {
EventType eventType = event.getHeader().getEventType();
if (eventType == EventType.TRANSACTION_PAYLOAD &&
EventDataWrapper.internal(event.getData()) instanceof TransactionPayloadEventData) {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the AI is really fond of this pattern. I think just checking eventType is plenty enough, no idea wtf the second check is here

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gone — handleEvent no longer special-cases TRANSACTION_PAYLOAD at all. The unwrap moved down into EventDeserializer, so there's neither the eventType check nor the instanceof here anymore. (4467565)

@osheroff

Copy link
Copy Markdown
Owner

structurally, I think this could live at a lower level than the binlog client. maybe somewhere alongside TransactionPayloadEvent would be more suitable.

…r, not BinaryLogClient

Addresses osheroff's review on osheroff#177. The payload-unwrapping logic now lives
below the binlog client: EventDeserializer.nextEvent() transparently streams a
TRANSACTION_PAYLOAD's inner events (each restamped with the outer envelope's
event-length / next-position) one at a time, reading one ahead so a
packet-oriented reader knows when a payload is still being drained.

As a result BinaryLogClient no longer special-cases TRANSACTION_PAYLOAD:
handleEvent() is again a plain updateGtidSet -> notifyEventListeners ->
updateClientBinlogFilenameAndPosition, and the read loop simply keeps pulling
events until the packet (and any payload it carried) is drained. Inner events
flow through the exact same gtid/position path as standalone events.
BinaryLogFileReader gets the same transparent unpacking for free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexpacio

Copy link
Copy Markdown
Author

Good call — pushed the whole unwrap below the binlog client (4467565).

EventDeserializer.nextEvent() now transparently expands a TRANSACTION_PAYLOAD into its inner events: it decompresses/parses them one at a time, restamps each with the outer envelope's event-length + next-position, and reads one event ahead so a packet-oriented reader can tell (via hasBufferedTransactionPayloadEvent()) when a payload is still being drained vs. when to pull the next packet.

Net effect:

  • BinaryLogClient no longer knows TRANSACTION_PAYLOAD exists. handleEvent() is back to a plain updateGtidSetnotifyEventListenersupdateClientBinlogFilenameAndPosition, and the read loop just keeps pulling events until the packet (and any payload it carried) is drained. The instanceof, the ugly cast, the bespoke restamp/notify loop, and the per-payload error handling all went away — that addresses the inline comments too.
  • BinaryLogFileReader gets the same unwrapping for free, since it shares nextEvent(). Reading a compressed binlog file now surfaces the actual row events instead of the opaque wrapper (updated testNextEventCompressed from 5 → 8 events accordingly). Heads-up that this is a behavioural change for any direct BinaryLogFileReader consumer that was expecting to see the TRANSACTION_PAYLOAD event itself.

Streaming/backpressure is preserved (still one inner event at a time, no full materialization), and a mid-payload decompress/parse failure is surfaced as an ordinary deserialization failure (never an EOF/socket error) so it's reported-and-skipped rather than triggering a reconnect-and-refetch loop on the same payload.

@alexpacio

Copy link
Copy Markdown
Author

Pushed 873bde5 (Stream transaction payload events) with the full streaming payload path and README note.

What changed in this update:

  • TRANSACTION_PAYLOAD compressed payload size is tracked as long and can be stream-backed instead of forced into one byte[].
  • Live split-packet reads now use a de-chunking stream instead of reassembling all 16MB chunks into one growing array.
  • The outer transaction-payload event block stays open while inner events stream, then the checksum is skipped after the payload is drained.
  • README now documents transparent MySQL 8.0 binlog_transaction_compression support.

Validation run locally:

  • ./mvnw -q test passed.
  • ./mvnw -q -Dit.test=BinaryLogFileReaderIntegrationTest failsafe:integration-test failsafe:verify passed.
  • MYSQL_VERSION=8.0 ./mvnw -q -Dit.test=BinaryLogClientIntegrationTest#testWriteUpdateDeleteEvents failsafe:integration-test failsafe:verify passed against a live MySQL 8.0.32 harness.
  • I also tried TransactionPayloadIntegrationTest; it generated and committed the 800-row transaction (~2.25GB uncompressed) and the Java process stayed stable around ~594MB RSS. I stopped it while the temporary MySQL replica was still spending many minutes applying the giant row batch before emitting it back to the client, so that run did not complete, but it did not fail in connector code or with a Java OOM.

I cleaned the generated onetimeserver temp data afterward.

@alexpacio alexpacio changed the title feat: transparently unpack inner events from TRANSACTION_PAYLOAD (ZSTD + uncompressed) support for binlog compressed payload (zstd) Jun 14, 2026
@alexpacio

Copy link
Copy Markdown
Author

@osheroff I've done the refactor you asked, along with paying attention to properly use streams vs byte[] to store the payload. Please let me know if it looks good to you, i'm going to test it tomorrow on a real environment

if (eventBodyLength > Integer.MAX_VALUE) {
throw new IOException("Event data length " + eventBodyLength +
" exceeds the maximum supported size of " + Integer.MAX_VALUE + " bytes");
}

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, undo

blockLength--;
}
return inputStream.read();
return read;

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, undo all changes to this file

@osheroff

Copy link
Copy Markdown
Owner

sorry, you gotta start over at this point; the aI has just littered changes all over the place I don't want and it's impossible to tell what's relevant and what isn't. usually claude isn't quite so aggressive.

the structure is almost right except you need a new class, a TransactionPayloadEventBuffer or something, held by the EventDeserializer, that if non-empty will return the next uncompressed event -- this will move all that code out of being directly in the eventDeserializer.

@alexpacio

Copy link
Copy Markdown
Author

Superseded by #178 — please review there instead.

#178 is a refactor of this work with the same behavior and the same >2GB / cross-packet
streaming capability, but the per-payload streaming state machine has been extracted out
of EventDeserializer into a dedicated TransactionPayloadEventBuffer (held by
EventDeserializer), which shrinks EventDeserializer's diff from +211 to +38 lines and
makes the change far easier to review. It also fixes a latent bug where a custom
TRANSACTION_PAYLOAD deserializer would throw instead of falling back to whole-payload
materialization.

Continuing the review on #178.

@osheroff

Copy link
Copy Markdown
Owner

ok i kinda hacked this all into shape here #179, tested, works locally. I'm missing a gpg key from an old laptop but will release and do the maxwell dance later this week

@osheroff osheroff closed this Jun 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants