[RNTuple] Implement I/O Performance Metrics: Sparseness, Randomness, and Transactions #20994
This Pull request:
improves the existing Performance Metrics in RNTuple.
Changes or fixes:
This PR addresses the issue requesting improved I/O metrics. It implements three metrics: Sparseness, Randomness, and Transactions.
Implementation Details:
These have been implemented through the introduction of transient, thread-safe members in RNTupleMetrics:
std::atomic<std::uint64_t> fSumSkip: To track total seek distance.
std::atomic<std::uint64_t> fExplicitBytesRead: To track payload bytes.
std::atomic<std::uint64_t> fTransactions: To track I/O operations.
I have changed RPageStorageFile to update these counters during ReadBuffer, ReadV, and LoadClusters operations. The use of std::atomic ensures thread safety with negligible performance overhead on the I/O path.
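For reviewers who want a picture of the bookkeeping, here is a minimal standalone sketch of how such counters could be updated on each read. It is not the code in this PR: `IoCounters`, `RecordRead`, and `fLastEnd` are hypothetical names, and measuring the seek distance as the gap between the end of the previous read and the start of the next one is an assumption.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the counters described above; not ROOT's actual class.
struct IoCounters {
   std::atomic<std::uint64_t> fSumSkip{0};           // total seek distance in bytes
   std::atomic<std::uint64_t> fExplicitBytesRead{0}; // payload bytes requested
   std::atomic<std::uint64_t> fTransactions{0};      // number of read operations
   std::atomic<std::uint64_t> fLastEnd{0};           // assumption: end offset of the previous read

   // Record one read of `nbytes` bytes starting at file offset `offset`.
   void RecordRead(std::uint64_t offset, std::uint64_t nbytes)
   {
      // Swap in the new end position and fetch the previous one in a single atomic step.
      const std::uint64_t prevEnd = fLastEnd.exchange(offset + nbytes, std::memory_order_relaxed);
      // Seek distance is the absolute gap between the previous end and the new start.
      const std::uint64_t skip = (offset > prevEnd) ? offset - prevEnd : prevEnd - offset;
      fSumSkip.fetch_add(skip, std::memory_order_relaxed);
      fExplicitBytesRead.fetch_add(nbytes, std::memory_order_relaxed);
      fTransactions.fetch_add(1, std::memory_order_relaxed);
   }
};
```

Relaxed ordering is used in the sketch because each counter is only a statistic and does not guard other data. Under concurrent reads the individual counters stay consistent, but the seek distance then depends on the order in which threads reach `RecordRead`.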
Reset Functionality & Accuracy:
I added a Reset() method to RNTupleMetrics. This is needed for obtaining accurate Randomness metrics for the analysis loop.
Without Reset(): The metric is dominated by the initial file seek (Header → Footer → Header), resulting in a Randomness score > 2.0.
With Reset(): Users can clear the counters after initialization, isolating the steady-state performance of their event loop.
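The effect of resetting can be illustrated with a small standalone sketch. It is not the PR's implementation: `MetricsSketch` and its methods are hypothetical, and the formulas are assumptions, since the description does not spell them out. Randomness is taken here as total seek distance divided by file size, and Sparseness as payload bytes divided by file size.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of the counters plus derived metrics; not ROOT's RNTupleMetrics API.
struct MetricsSketch {
   std::atomic<std::uint64_t> fSumSkip{0};
   std::atomic<std::uint64_t> fExplicitBytesRead{0};
   std::atomic<std::uint64_t> fTransactions{0};

   // Clear all counters. Each store is atomic, but the three stores together
   // are not one atomic step, so call this while no reads are in flight.
   void Reset()
   {
      fSumSkip.store(0);
      fExplicitBytesRead.store(0);
      fTransactions.store(0);
   }

   // Assumed definition: total seek distance relative to the file size.
   double Randomness(std::uint64_t fileSize) const
   {
      return fileSize ? static_cast<double>(fSumSkip.load()) / static_cast<double>(fileSize) : 0.0;
   }

   // Assumed definition: payload bytes read relative to the file size.
   double Sparseness(std::uint64_t fileSize) const
   {
      return fileSize ? static_cast<double>(fExplicitBytesRead.load()) / static_cast<double>(fileSize) : 0.0;
   }
};
```

Under these assumed definitions, a few long seeks across the file during initialization are enough to push Randomness above 2.0, which is why the counters are cleared before measuring the event loop.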
Verification:
Added a new unit test TEST(Metrics, IOMetrics) in tree/ntuple/test/ntuple_metrics.cxx to verify the logic and the Reset() behavior.
Also tested locally using this code
Click to view Manual Verification Code & Output
This PR fixes #20853