[feat][bk] Add per-read no-read-ahead hint #4772
@zymap @lhotari @merlimat @nodece @hangc0276
hangc0276
left a comment
Does the prefetch logic have any performance impact on BookKeeper? Reading delayed messages is not a common case, and I would prefer not to add complexity to the BookKeeper protocol to support a rare case.
@hangc0276 BookKeeper read-ahead is very useful for sequential catch-up reads, backlog consumption, and ledger scans. The problem is that some Pulsar/KoP read paths are not sequential streams. They use BookKeeper entries as lookup probes or sparse snapshot segments. In these cases, prefetching N+1/N+2 can have low hit rate and can pollute the read-ahead cache used by normal sequential readers. Examples:
So the intent is not to optimize a rare delayed-message path, but to let callers distinguish sequential data reads from point lookup reads while keeping read-ahead as the default behavior.
Descriptions of the changes in this PR:
Fix #xyz
Main Issue: #xyz
BP: #xyz
Motivation
Pulsar delayed delivery uses `BucketDelayedDeliveryTracker` when there are too many delayed messages to keep all delayed indexes in broker memory. In that mode, delayed indexes are persisted as bucket snapshot segments in BookKeeper, and the broker lazily loads the next snapshot segment entry only after the current segment has been drained by time-driven delivery.

This access pattern is a sparse point-read workload, not a normal sequential scan. A single lazy-load read for snapshot entry `N` can currently trigger bookie-side `dbStorage_readAheadCache` prefill for `N+1`, `N+2`, and later entries, but those entries may not be read until much later, or may be evicted before use. This can waste bookie IO and direct memory, and can pollute the read-ahead cache needed by ordinary sequential reads such as backlog consumption, catch-up reads, and ledger scans.

This change adds a per-read no-read-ahead hint so callers such as Pulsar can opt out only for these point-read paths while preserving the default read-ahead behavior for normal sequential workloads.
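To make the cost difference concrete, here is a toy model of a read path with a read-ahead prefill. It is only a sketch under stated assumptions, not the bookie storage code: the class `ReadAheadSketch`, the constant `READ_AHEAD_COUNT`, the single in-memory map, and the fake `readFromDisk` helper are all hypothetical, and eviction is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical and the real bookie
// storage layer is far more involved (batching, sizing, eviction).
final class ReadAheadSketch {
    // Assumed number of following entries prefetched after a cache miss.
    static final int READ_AHEAD_COUNT = 2;

    private final Map<Long, String> cache = new HashMap<>();
    int diskReads = 0;

    String readEntry(long entryId, boolean noReadAhead) {
        String cached = cache.get(entryId);
        if (cached != null) {
            return cached; // served from cache, no disk IO
        }
        // The target entry is always read and cached, with or without the hint.
        String data = readFromDisk(entryId);
        cache.put(entryId, data);
        if (!noReadAhead) {
            // Default behavior: also prefetch the entries that follow.
            for (long next = entryId + 1; next <= entryId + READ_AHEAD_COUNT; next++) {
                cache.put(next, readFromDisk(next));
            }
        }
        return data;
    }

    int cachedEntries() {
        return cache.size();
    }

    private String readFromDisk(long entryId) {
        diskReads++; // stands in for real disk IO
        return "entry-" + entryId;
    }

    public static void main(String[] args) {
        long[] sparse = {0, 100, 200}; // point reads far apart

        ReadAheadSketch defaults = new ReadAheadSketch();
        for (long id : sparse) {
            defaults.readEntry(id, false);
        }
        System.out.println("default: diskReads=" + defaults.diskReads
                + " cached=" + defaults.cachedEntries());

        ReadAheadSketch hinted = new ReadAheadSketch();
        for (long id : sparse) {
            hinted.readEntry(id, true);
        }
        System.out.println("hinted:  diskReads=" + hinted.diskReads
                + " cached=" + hinted.cachedEntries());
    }
}
```

In this model, three sparse reads with the default behavior cost nine disk reads and leave nine entries cached, six of which were never requested. With the hint they cost three reads and cache three entries. A sequential reader of entries 0, 1, 2 still benefits from the default: one miss followed by two cache hits.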
Changes
- Add `ReadOptions` with `ReadOptions.DEFAULT` and `ReadOptions.builder().disableReadAhead(true).build()` for request-scoped read options.
- Add `ReadOptions` overloads to `ReadHandle` and `LedgerHandle`, including legacy callback APIs and unconfirmed read APIs that share the same internal read path.
- Propagate the hint from `PendingReadOp` to bookies through a new `BookieProtocol.FLAG_NO_READ_AHEAD` bit.
- Add `ReadRequest.readFlags` so bitmask-style read flags can coexist with the existing single-value `ReadRequest.flag` enum.
- Pass the hint through `Bookie.readEntry(..., noReadAhead)`.
- Update `DbLedgerStorage`/`SingleDirectoryDbLedgerStorage` so `noReadAhead=true` still uses and populates the target entry cache, but skips the extra `fillReadAheadCache(...)` prefill for following entries.
- Add tests covering `PendingReadOp` flag construction.
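The shape of the options object and its mapping to a flag bit can be sketched as follows. This is a minimal standalone illustration, not the code in this PR: only the names `ReadOptions`, `DEFAULT`, `builder()`, `disableReadAhead(...)`, and `FLAG_NO_READ_AHEAD` come from the description above. The bit value `0x1`, the accessor `isReadAheadDisabled()`, and the helpers `toReadFlags` and `isNoReadAhead` are assumptions made for the example.

```java
// Illustrative sketch; the wrapper class and helper names are hypothetical.
final class ReadOptionsSketch {

    /** Immutable, request-scoped read options. */
    static final class ReadOptions {
        // Default options keep read-ahead enabled.
        static final ReadOptions DEFAULT = builder().build();

        private final boolean disableReadAhead;

        private ReadOptions(boolean disableReadAhead) {
            this.disableReadAhead = disableReadAhead;
        }

        // Assumed accessor name; not confirmed by the PR description.
        boolean isReadAheadDisabled() {
            return disableReadAhead;
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            private boolean disableReadAhead; // false unless the caller opts out

            Builder disableReadAhead(boolean value) {
                this.disableReadAhead = value;
                return this;
            }

            ReadOptions build() {
                return new ReadOptions(disableReadAhead);
            }
        }
    }

    // Assumed bit position; the actual value is defined in BookieProtocol.
    static final int FLAG_NO_READ_AHEAD = 0x1;

    /** Client side: turn options into a bitmask that other bits can share. */
    static int toReadFlags(ReadOptions options) {
        int flags = 0;
        if (options.isReadAheadDisabled()) {
            flags |= FLAG_NO_READ_AHEAD;
        }
        return flags;
    }

    /** Server side: test a single bit without disturbing the others. */
    static boolean isNoReadAhead(int readFlags) {
        return (readFlags & FLAG_NO_READ_AHEAD) != 0;
    }

    public static void main(String[] args) {
        ReadOptions pointRead = ReadOptions.builder().disableReadAhead(true).build();
        System.out.println("default flags:    " + toReadFlags(ReadOptions.DEFAULT));
        System.out.println("point-read flags: " + toReadFlags(pointRead));
    }
}
```

Using a bitmask rather than another value in a single-value enum means a future hint can be added as one more bit without colliding with this one, which matches the stated reason for `ReadRequest.readFlags`.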