Skip to content

perf(metadata table) Metadata table archival should proceed until earliest commit to retain on data table#18215

Open
kbuci wants to merge 3 commits intoapache:masterfrom
kbuci:archival-mdt
Open

perf(metadata table) Metadata table archival should proceed until earliest commit to retain on data table#18215
kbuci wants to merge 3 commits intoapache:masterfrom
kbuci:archival-mdt

Conversation

@kbuci
Copy link
Copy Markdown
Contributor

@kbuci kbuci commented Feb 17, 2026

Summary and Changelog

Summary: Metadata table (MDT) archival should be able to archive instants that are before the earliest commit to retain (in latest clean) on the data table, provided that there are no savepoints. This is since

  • the metadata table anyway cannot be safely restored to a point before the earliest commit to retain (ECTR) on the data table
  • For datasets with infrequent ingestion writes and duration based compaction scheduling on MDT, there can be 1.5-2x the number of instant files in MDT active timeline compared to data table timeline. This is since each write to the data table will not only create a deltacommit on MDT, but most likely compaction commit as well.

Changelog:

  • TimelineUtils
    • getEarliestInstantForMetadataArchival now takes a third parameter: Option<String> earliestUncleanedDataTableInstantTimeOption (ECTR from the latest clean).
    • Logic uses “earliest possible restore commit” (ECTR if present and on timeline, else first commit), “smallest” of that and inflight, and when not archiving beyond savepoint, the minimum of first savepointed write, earliest inflight, and earliest possible restore commit.
    • Added getEarliestRetainedCommitFromLastClean(HoodieTableMetaClient) to read ECTR from the latest completed clean; throws HoodieIOException on read failure.
    • Added findSmallestInstant(Option<HoodieInstant>, Option<HoodieInstant>) and getEarliestPossibleRestoreCommit(HoodieActiveTimeline, Option<String>).
  • HoodieTimeline / BaseHoodieTimeline
    • New method findFirstSavepointedWrite(): returns the first write commit that has been savepointed (used when blocking MDT archival on savepoints).
  • TimelineArchiverV1 and TimelineArchiverV2
    • When computing the earliest instant to retain for the metadata table, both archivers now pass the data table ECTR: TimelineUtils.getEarliestRetainedCommitFromLastClean(dataMetaClient) is supplied as the third argument to getEarliestInstantForMetadataArchival.
  • Tests
    • TestTimelineUtils: All getEarliestInstantForMetadataArchival call sites updated to the 3-arg signature; added cases for ECTR present, ECTR not on timeline, and savepoints + ECTR (earlier than savepoints and between savepoints).
    • HoodieTestTable: Added addIncrementalClean(String instantTime, String earliestCommitToRetain) for tests that need a clean with a given ECTR.
    • TestHoodieTimelineArchiver: Added testArchivalInMetadataTableCanProceedUntilECTR to assert MDT archival proceeds up to the data table ECTR.

Impact

  • User-facing / API: No new public APIs. MDT archival may retain fewer instants (archive more)
  • Performance: Slightly less MDT timeline to scan when ECTR is present, due to more aggressive archival.

Risk Level

Low. It should be safe if instants on the data table before the ECTR don't have a corresponding deltacommit on MDT. This is since those instants must already have been committed, and cannot anyway be rolled back.

Documentation Update

None.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions Bot added the size:M PR with lines of changes in (100, 300] label Feb 17, 2026
@hudi-bot
Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@kbuci kbuci changed the title fix(metadata table) Metadata table archival should proceed past earliest commit to retain. perf(metadata table) Metadata table archival should proceed until earliest commit to retain on data table Feb 18, 2026
@kbuci kbuci marked this pull request as ready for review February 18, 2026 20:34
@danny0405 danny0405 self-assigned this Feb 23, 2026
@nsivabalan
Copy link
Copy Markdown
Contributor

hey @kbuci : but what does this buy us though. Its a minor optimization right. don't you think we can live w/ it and keep the code simple.

TimelineUtils.getEarliestInstantForMetadataArchival(
dataMetaClient.getActiveTimeline(), config.shouldArchiveBeyondSavepoint());
dataMetaClient.getActiveTimeline(), config.shouldArchiveBeyondSavepoint(),
TimelineUtils.getEarliestRetainedCommitFromLastClean(dataMetaClient));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataMetaClient.getActiveTimeline() is passed as a param here, we can inline the fetch of TimelineUtils.getEarliestRetainedCommitFromLastClean(dataMetaClient) inside the method.

if (earliestCommit.get().compareTo(earliestInflight.get()) < 0) {
return earliestCommit;
if (shouldArchiveBeyondSavepoint) {
return findSmallestInstant(earliestInflight, earliestPossibleRestoreCommit);
Copy link
Copy Markdown
Contributor

@danny0405 danny0405 Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the original logic, when shouldArchiveBeyondSavepoint is true, #getFirstNonSavepointCommit is invoked to filter out the savepointed instant(because for data table these instants are skipped during archiving, there is case that these savepoints are very old, the skip is intentional).

In the new logic, earliestPossibleRestoreCommit does not skip the savepointed instants when the last clean retained instant is not there, should we still skip by checking dataTableActiveTimeline.findFirstSavepointedWrite() in this case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Good catch. If shouldArchiveBeyondSavepoint is enabled, we need to ensure that old savepoints don't inadvertently become a hard floor for the archival watermark, as that would defeat the purpose of the config and cause the timeline to grow.

/**
* Returns the earliest commit to retain from the latest completed clean on the given table, if present.
*/
public static Option<String> getEarliestRetainedCommitFromLastClean(HoodieTableMetaClient metaClient) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can reuse the logic in SavepointActionExecutor#getLastCommitRetained.

return earliestUncleanedDataTableInstant;
}
}
return dataTableActiveTimeline.getCommitsTimeline().firstInstant();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when shouldArchiveBeyondSavepoint is true, should we skip the very old savepoints just like before here?

// - First save-pointed write
// - Earliest inflight
// - Earliest commit that dataset can potentially be restored to
return findSmallestInstant(dataTableActiveTimeline.findFirstSavepointedWrite(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if rolling back a savepoint is not a real case, maybe the comparison of dataTableActiveTimeline.findFirstSavepointedWrite() is not necessary.

Copy link
Copy Markdown
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Style & Readability Review — The code changes look good overall, with a minor suggestion for a test helper's object creation.

return addClean(instantTime, Option.empty(), cleanerPlan, metadata, false, false);
}

public HoodieTestTable addIncrementalClean(String instantTime, String earliestCommitToRetain) throws IOException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 nit: For these complex objects like HoodieCleanerPlan and HoodieCleanStat with many parameters, consider using a builder pattern if they are created frequently with varying attributes. It can improve readability by explicitly naming the parameters.

Copy link
Copy Markdown
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for contributing! The logic to utilize ECTR for MDT archival is a great optimization to keep the MDT timeline lean and performant. The implementation is clean, but I left one inline comment regarding a potential edge case with missing write commits for savepoints.

}
HoodieInstant firstSavePoint = firstSavePointInstantOption.get();
return getWriteTimeline()
.filter(instant -> instant.requestedTime().equals(firstSavePoint.requestedTime()))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 If the write commit corresponding to the savepoint is missing (e.g., due to a corrupted timeline or manual deletion), this will return Option.empty(). Consequently, TimelineUtils.getEarliestInstantForMetadataArchival will silently ignore the savepoint and might incorrectly allow archiving past it. Consider returning the savepoint instant itself directly (or as a fallback).

if (earliestCommit.get().compareTo(earliestInflight.get()) < 0) {
return earliestCommit;
if (shouldArchiveBeyondSavepoint) {
return findSmallestInstant(earliestInflight, earliestPossibleRestoreCommit);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Good catch. If shouldArchiveBeyondSavepoint is enabled, we need to ensure that old savepoints don't inadvertently become a hard floor for the archival watermark, as that would defeat the purpose of the config and cause the timeline to grow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M PR with lines of changes in (100, 300]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants