
introduce clear separation between proposed compaction, eligibility status, and compaction mode, along with improved task state tracking #18968

Open
cecemei wants to merge 20 commits into apache:master from cecemei:compact3

Conversation

@cecemei
Contributor

@cecemei cecemei commented Jan 30, 2026

This PR refactors compaction eligibility evaluation by introducing clear separation between proposed compaction (what segments to compact), eligibility status (whether compaction is needed), and compaction mode (how to compact), along with improved task state tracking.

Motivation

The previous CompactionStatus implementation mixed multiple concerns:

  1. Eligibility checks: Is compaction needed based on config?
  2. Policy decisions: Should we compact this interval now?
  3. Task state tracking: Is a compaction task already running?

This made it difficult to reason about compaction decisions and track the state of compaction candidates through the system.

Key Changes

1. Introduced CompactionMode Enum

New enum to represent different compaction strategies:

  • FULL_COMPACTION: Compact all segments in the interval
  • NOT_APPLICABLE: No compaction should be performed

The mode determines whether a candidate will be queued for compaction or skipped.
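
A minimal sketch of the enum, using the names above (Javadoc wording is illustrative):

public enum CompactionMode
{
  /** Compact all the segments in the interval. */
  FULL_COMPACTION,

  /** No compaction should be performed for this candidate. */
  NOT_APPLICABLE
}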

2. Refactored CompactionCandidate

New structure:

  • ProposedCompaction: Nested class containing the segment list, intervals, and statistics
    • Represents the raw "what should we compact?" without policy decisions
  • CompactionStatus: Eligibility evaluation result (COMPLETE, ELIGIBLE, NOT_ELIGIBLE)
    • Tracks compacted vs uncompacted segments and statistics
    • Contains reason for ineligibility when applicable
  • CompactionMode: How to perform compaction (FULL_COMPACTION or NOT_APPLICABLE)
  • policyNote: Optional note from the policy explaining why this candidate was chosen/skipped
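
A rough sketch of how these pieces fit together, using the CompactionMode and CompactionStatus types described in this PR; field types and constructor shape are assumptions:

import java.util.List;
import javax.annotation.Nullable;
import org.apache.druid.timeline.DataSegment;
import org.joda.time.Interval;

public class CompactionCandidate
{
  /** Non-empty list of segments being proposed for compaction, typically one time chunk. */
  public static class ProposedCompaction
  {
    private final List<DataSegment> segments;
    private final Interval compactionInterval;

    public ProposedCompaction(List<DataSegment> segments, Interval compactionInterval)
    {
      this.segments = segments;
      this.compactionInterval = compactionInterval;
    }
  }

  private final ProposedCompaction proposedCompaction; // what to compact
  @Nullable
  private final CompactionStatus currentStatus;        // COMPLETE / ELIGIBLE / NOT_ELIGIBLE
  private final CompactionMode mode;                   // FULL_COMPACTION / NOT_APPLICABLE
  @Nullable
  private final String policyNote;                     // why the policy chose or skipped this candidate

  public CompactionCandidate(
      ProposedCompaction proposedCompaction,
      @Nullable CompactionStatus currentStatus,
      CompactionMode mode,
      @Nullable String policyNote
  )
  {
    this.proposedCompaction = proposedCompaction;
    this.currentStatus = currentStatus;
    this.mode = mode;
    this.policyNote = policyNote;
  }
}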

Benefits:

  • Clear separation between the proposed segments and the decision to compact them
  • Easier to track why candidates are skipped or accepted
  • More maintainable and testable code structure

3. Updated CompactionStatus

New responsibilities:

  • Evaluates if segments need compaction based on config settings
  • Returns state: COMPLETE, ELIGIBLE, or NOT_ELIGIBLE
  • For eligible intervals, tracks:
    • Compacted segment statistics
    • Uncompacted segment statistics
  • Provides detailed reason when not eligible

Separation from policy:

  • CompactionStatus checks config-based requirements (granularity, rollup, etc.)
  • Policy then decides whether to actually compact eligible intervals
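
A condensed sketch of the resulting shape; the statistics fields and factory method are assumptions:

import javax.annotation.Nullable;

public class CompactionStatus
{
  public enum State
  {
    COMPLETE,     // segments already comply with the compaction config
    ELIGIBLE,     // segments need compaction per the config
    NOT_ELIGIBLE  // compaction should not be applied to this interval
  }

  private final State state;
  private final long compactedSegmentCount;    // tracked for eligible intervals
  private final long uncompactedSegmentCount;  // tracked for eligible intervals
  @Nullable
  private final String reason;                 // populated when NOT_ELIGIBLE

  private CompactionStatus(State state, long compacted, long uncompacted, @Nullable String reason)
  {
    this.state = state;
    this.compactedSegmentCount = compacted;
    this.uncompactedSegmentCount = uncompacted;
    this.reason = reason;
  }

  public static CompactionStatus notEligible(String reason)
  {
    return new CompactionStatus(State.NOT_ELIGIBLE, 0, 0, reason);
  }
}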

4. Simplified CompactionCandidateSearchPolicy

Changed interface:

  • Removed: checkEligibilityForCompaction(candidate, taskStatus) → Eligibility
  • Added: createCandidate(proposedCompaction, eligibility) → CompactionCandidate
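
In code, the interface change looks roughly like this sketch (other members omitted); the new signature matches the one added in this PR:

public interface CompactionCandidateSearchPolicy
{
  // Removed:
  // Eligibility checkEligibilityForCompaction(CompactionCandidate candidate, CompactionTaskStatus taskStatus);

  // Added: the policy assembles the final candidate, selecting the
  // CompactionMode and attaching an optional policyNote.
  CompactionCandidate createCandidate(
      CompactionCandidate.ProposedCompaction proposedCompaction,
      CompactionStatus eligibility
  );
}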

Benefits:

  • Policies now create the final candidate after receiving eligibility info
  • Can add policy-specific notes and select compaction mode
  • Cleaner separation: status checks config compliance, policy makes scheduling decisions

5. Improved Task State Tracking

New TaskState enum in CompactionCandidate:

  • READY: No task running, can start compaction
  • TASK_IN_PROGRESS: Compaction task already running
  • RECENTLY_COMPLETED: Task recently finished, segments not yet updated
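
As a sketch:

public enum TaskState
{
  /** No task running; compaction can start for this candidate. */
  READY,

  /** A compaction task is already running for this interval. */
  TASK_IN_PROGRESS,

  /** A task recently finished; its segments are not yet updated. */
  RECENTLY_COMPLETED
}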

Benefits:

  • Clear distinction between eligibility and task state
  • Prevents launching duplicate tasks for the same interval
  • Easier to understand candidate lifecycle

Code Flow Example

1. DataSourceCompactibleSegmentIterator finds segments in a time chunk
   → Creates ProposedCompaction

2. CompactionStatus evaluates config compliance
   → Returns ELIGIBLE with compacted/uncompacted stats

3. Policy receives ProposedCompaction + CompactionStatus
   → Calls createCandidate() to produce final CompactionCandidate
   → Sets mode = FULL_COMPACTION if should compact, NOT_APPLICABLE if skip
   → Adds policyNote explaining decision

4. CompactionStatusTracker checks task state
   → Returns READY/TASK_IN_PROGRESS/RECENTLY_COMPLETED

5. Scheduler decides whether to launch task based on mode + task state
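
Step 5 then reduces to a check like the following sketch; getMode() and computeCompactionTaskState() appear elsewhere in this PR, while the helper itself is illustrative:

private boolean shouldLaunchTask(CompactionCandidate candidate, CompactionStatusTracker statusTracker)
{
  // The policy decided this candidate should be skipped.
  if (candidate.getMode() == CompactionMode.NOT_APPLICABLE) {
    return false;
  }
  // Launch only when no task is running or recently completed for this interval,
  // which prevents duplicate tasks for the same interval.
  return statusTracker.computeCompactionTaskState(candidate) == CompactionCandidate.TaskState.READY;
}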

Testing

  • Added CompactionCandidateTest for new candidate structure
  • Added CompactionStatusBuilderTest for eligibility evaluation
  • Updated existing tests to work with new architecture

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@cecemei cecemei marked this pull request as ready for review February 2, 2026 06:11
@cecemei cecemei changed the title wip Add compaction eligibility concept and incremental compaction support Feb 2, 2026
Contributor

@kfaraz kfaraz left a comment

Thanks for the PR, @cecemei !

I like the idea of separating the CompactionStatus (i.e. the current degree of compaction of an interval) from the Eligibility (i.e. whether an interval should be picked for compaction or not).

I have left some suggestions to aid with the separation. Let me know if they make sense.

FULL_COMPACTION,
INCREMENTAL_COMPACTION,
NOT_ELIGIBLE,
NOT_APPLICABLE
Contributor

How is "not applicable" different from "not eligible"?
I think eligibility should have only 3 possible states - not eligible, incremental and full.

Contributor Author

They are tracked differently in AutoCompactionSnapshot. My original implementation had only three states and it failed: https://github.com/apache/druid/actions/runs/21509820801/job/61973864065. Eventually I'd want evaluate() to return only eligibility and not status, so the complete/pending statuses should map to different eligibilities.

@JsonCreator
public ClientCompactionIntervalSpec(
@JsonProperty("interval") Interval interval,
@JsonProperty("uncompactedSegments") @Nullable List<SegmentDescriptor> uncompactedSegments,
Contributor

@kfaraz kfaraz Feb 4, 2026

It is a little confusing to add segments here since there is already a SpecificSegmentsSpec which specifies the set of segments that should be compacted.

I feel we should add a new type of CompactionInputSpec which will be used for incremental compaction only. It would have both a non-null interval as well as a non-empty set of segment IDs to compact incrementally. The CompactionTask would handle the different input specs accordingly.

Contributor Author

SpecificSegmentsSpec is not supported by MSQ, and it felt somewhat deprecated to me, maybe because of the segment-lock stuff. I kind of like specifying an interval since it gives some certainty. I also wonder whether we should check non-null for the interval here? I didn't see any instance with a null interval.

Contributor

i kinda like to specify an interval since it gives some certainty

Yes, I agree, in the new scheme of things, it is always nice to specify the interval.
That's why I advise adding a new implementation of CompactionInputSpec which will be used only for incremental compaction so that it is easy to distinguish the two on the task side as well. We need not piggyback on the existing CompactionIntervalSpec.

@@ -64,18 +62,27 @@ Eligibility checkEligibilityForCompaction(
*/
class Eligibility
Contributor

Since this class is going to have a wider usage now, let's move it out into a separate file of its own.

Contributor Author

This is done.

* @param searchPolicy the policy used to determine compaction eligibility
* @return a CompactionCandidate with updated status and potentially filtered segments
*/
public CompactionCandidate evaluate(
Contributor

This method should not live here. The computation of the eligibility should be done in CompactionStatusTracker.

The overall workflow would be like this:

  • identify a CompactionCandidate
  • determine its CompactionStatus
  • determine the eligibility via CompactionStatusTracker.computeEligibility()

Contributor Author

Synced offline: DataSourceCompactibleSegmentIterator would be responsible for filtering out ineligible candidates.

Contributor

+1

But this method should still not live here.
We should just have a withEligibility() method so that this class remains a bean for the most part.

This method can be moved to CompactionEligibility.compute() similar to CompactionStatus.compute().

*
* @return Pair of eligibility status and compaction status with reason for first failed check
*/
Pair<CompactionCandidateSearchPolicy.Eligibility, CompactionStatus> evaluate()
Contributor

Since the CompactionStatus and eligibility are going to be two separate things now, this class should not deal with eligibility at all. We can retain the old code in this class.

Once the CompactionStatus has been computed, it should be passed to CompactionStatusTracker.computeEligibility() to determine the eligibility.

if (inputBytesCheck.isSkipped()) {
return inputBytesCheck;
final CompactionCandidateSearchPolicy.Eligibility inputBytesCheck = inputBytesAreWithinLimit();
if (inputBytesCheck != null) {
Contributor

I think the input bytes check should now be moved out of this class and used in CompactionStatusTracker.computeEligibility() instead.

Contributor

@cecemei , I think this class should remain untouched so that CompactionStatus can remain completely agnostic of CompactionEligibility.

Contributor

I guess this conversation is out of date after latest refactorings?

final CompactionCandidate candidates = CompactionCandidate.from(segments, config.getSegmentGranularity());
final CompactionStatus compactionStatus = CompactionStatus.compute(candidates, config, fingerprintMapper);
final CompactionCandidate candidatesWithStatus = candidates.withCurrentStatus(compactionStatus);
final CompactionCandidate candidatesWithStatus =
Contributor

@kfaraz kfaraz Feb 4, 2026

I feel this class should remain untouched since we are only computing the CompactionStatus here. The eligibility computation should happen later, in the CompactionJobQueue only.

Contributor Author

Why should it be the jobQueue only, and not both the jobQueue and the coordinator duty?

Contributor

Coordinator-based compaction will be deprecated soon.
We want to support incremental compaction on Overlord-based compaction only so that it becomes an incentive for users to migrate to compaction supervisors. We followed the same approach with indexing fingerprints as well.

Contributor Author

Hmm, still thinking about this.

Contributor

For reference, Cece: the changes I have been working on have also only added functionality to the supervisor. The IndexingState fingerprinting that compaction uses is only available for supervisors, and rule-based compaction/reindexing will also be supervisor-only. Maybe it is time we officially do a separate PR to mark this duty as deprecated and update the docs accordingly, since so much is happening in this space these days.

case FULL_COMPACTION:
clientCompactionIntervalSpec = new ClientCompactionIntervalSpec(entry.getCompactionInterval(), null, null);
break;
case INCREMENTAL_COMPACTION:
Contributor

Let's update this part only when incremental compaction has been implemented on the task side. Also, we should support incremental compaction with CompactionJobQueue (i.e. Overlord-based compaction) only.

Contributor

Please revert this change since we need not support incremental compaction with Coordinator-based compaction duty.

@kfaraz
Contributor

kfaraz commented Feb 4, 2026

Currently, CompactionStatus conflates two concerns:
Status: What's the task status of a CompactionCandidate?
Eligibility: Should we compact this interval now, and how?

@cecemei , just so that we are on the same page, the CompactionStatus does not exactly contain information of the compaction "task" (except for the RUNNING state, which we can now remove in this PR). It contains info of the current level of compaction in the interval, i.e. number of compacted segments, number of uncompacted segments and overall state (COMPLETE/PENDING/SKIPPED).

It would be nice to get rid of the SKIPPED state too since it is more of an eligibility thing, but I guess we can leave it for now.

Contributor

@kfaraz kfaraz left a comment

Thanks for the update, @cecemei !
I have left some follow-up suggestions to clean up the PR.




/**
* Describes the eligibility of an interval for compaction.
*/
public class CompactionEligibility
Contributor

We should add a method to this class:

public static CompactionEligibility compute(CompactionStatus status, CompactionCandidateSearchPolicy policy) {
   // determine the eligibility here
}

Contributor Author

Added a new static evaluate method.

Comment on lines 334 to 336
final CompactionCandidate candidatesWithStatus =
CompactionCandidate.from(segments, config.getSegmentGranularity())
.evaluate(config, searchPolicy, fingerprintMapper);
Contributor

This part should probably look more like the following:

Suggested change:

- final CompactionCandidate candidatesWithStatus =
-     CompactionCandidate.from(segments, config.getSegmentGranularity())
-                        .evaluate(config, searchPolicy, fingerprintMapper);
+ final CompactionCandidate candidates = CompactionCandidate.from(segments, config.getSegmentGranularity());
+ final CompactionStatus compactionStatus = CompactionStatus.compute(candidates, config, fingerprintMapper);
+ final CompactionEligibility eligibility = CompactionEligibility.compute(compactionStatus, searchPolicy);
+ final CompactionCandidate candidatesWithStatus = candidates.withCurrentStatus(compactionStatus).withEligibility(eligibility);

);
}

public CompactionCandidate withPolicyEligibility(CompactionEligibility eligibility)
Contributor

Nit: Thinking about this, I think it would be best to have a single method (e.g. withStatusAndEligibility()) that creates a copy of the CompactionCandidate with both the status and eligibility populated.
Otherwise, we would always create an intermediate copy unnecessarily.

CompactionCandidate candidate,
CompactionCandidateSearchPolicy searchPolicy
)
public CompactionStatus computeCompactionStatus(CompactionCandidate candidate)
Contributor

I suppose we can now remove this method altogether, since we do not want to use the lastTaskStatus anyway.

Contributor Author

Not sure I understand; don't we still track pending/running/completed status?

@cecemei
Contributor Author

cecemei commented Feb 6, 2026

Thanks for the PR, @cecemei !

I like the idea of separating the CompactionStatus (i.e. the current degree of compaction of an interval) from the Eligibility (i.e. whether an interval should be picked for compaction or not).

I have left some suggestions to aid with the separation. Let me know if they make sense.

I addressed most comments except for the coordinator based compaction and incremental compaction feature change.

The major change is that I moved a lot of the CompactionStatus logic to CompactionEligibility and updated each CHECK to return a String (null where it previously returned CompactionStatus.COMPLETE, and non-null where it previously returned a pending status). It seems simpler, since after all we're just looking for a reason to do compaction.

* <p>
* This method performs a two-stage evaluation:
* <ol>
* <li>First, uses {@link Evaluator} to check if the candidate needs compaction
Contributor

@capistrant capistrant Feb 6, 2026

Did we make any changes to the Evaluator logic in moving it to this file, or is it just a lift-and-place into a new location?


@kfaraz
Contributor

kfaraz commented Feb 6, 2026

Had a brief discussion with @cecemei on this. I think there has been some confusion regarding the roles of CompactionStatus, CompactionEligibility and the task status. It is in part due to the fact that CompactionStatus was originally serving all of these roles but not perfectly.

In short, we shouldn't need to make any major modification to any class in this PR except CompactionEligibility.

I have shared some details offline. We should be able to get this resolved by Monday.

public void onCompactionStatusComputed(
public void onSkippedCandidate(
CompactionCandidate candidateSegments,
DataSourceCompactionConfig config

Check notice from Code scanning / CodeQL: Useless parameter. The parameter 'config' is never used.
@cecemei cecemei changed the title Add compaction eligibility concept and incremental compaction support introduce clear separation between proposed compaction (what segments to compact), eligibility status (whether compaction is needed), and compaction mode (how to compact), along with improved task state tracking Feb 8, 2026
@cecemei cecemei changed the title introduce clear separation between proposed compaction (what segments to compact), eligibility status (whether compaction is needed), and compaction mode (how to compact), along with improved task state tracking introduce clear separation between proposed compaction, eligibility status, and compaction mode, along with improved task state tracking Feb 8, 2026
@cecemei
Contributor Author

cecemei commented Feb 8, 2026

Hi @kfaraz @capistrant, I made some major changes to this PR and hopefully it's in a much better state. Please take a look when you're available, thanks! The incremental-compaction change has been removed from this PR; I'm going to make another one for that.

@cecemei cecemei mentioned this pull request Feb 9, 2026
Contributor

I think some javadocs in this class would be helpful: a class-level one, plus at least a brief sentence for each mode, even if they are a bit self-explanatory.


if (candidate.getCurrentStatus() != null) {
autoCompactionContext.put(COMPACTION_REASON_KEY, candidate.getCurrentStatus().getReason());
if (CompactionMode.NOT_APPLICABLE.equals(candidate.getMode())) {
Contributor

Is this supposed to be if (!CompactionMode.NOT_APPLICABLE.equals(candidate.getMode()))?

Seems to me like the reason will now be missing for compaction jobs, and for NOT_APPLICABLE a task wouldn't even be started anyway, as an exception would get thrown by compactSegments.

@@ -346,8 +361,10 @@ static DimensionRangePartitionsSpec getEffectiveRangePartitionsSpec(DimensionRan
*/
private static class Evaluator
Contributor

I'm not sure how much I like moving away from a more descriptive return type for the individual checks. It feels like a nitpick since functionally everything is the same, and I understand why you wanted to avoid using CompactionStatus all over the place in the Evaluator like we were. My only alternative suggestion would be yet another simple record-like object signifying a check, with a result and a reason (populated if the check failed). The "return null if the check passed" convention just feels a little unintuitive. Maybe a compromise would be explicitly stating how the checks work in the javadoc for the CHECKS and FINGERPRINT_CHECKS lists (or in the evaluate method javadoc).

Contributor

@kfaraz kfaraz Feb 11, 2026

Yeah, I don't like the null returns much myself.
We should update each CHECK to return a CheckResult object instead of a CompactionStatus.
The rest of this class should probably remain unchanged as it seems to be bloating up the PR without much value added.


Assert.assertEquals(expectedSegmentCountAwaitingCompaction, snapshot.getSegmentCountAwaitingCompaction());
Assert.assertEquals(expectedSegmentCountCompressed, snapshot.getSegmentCountCompacted());
Assert.assertEquals(expectedSegmentCountSkipped, snapshot.getSegmentCountSkipped());
Assert.assertEquals(new AutoCompactionSnapshot(
Contributor

Will this make it less clear what is causing the difference than having all the individual assertions? Your way looks better on the eyes, but I wonder if it makes it more annoying for a dev to identify exactly what they broke if it starts failing.

Contributor

@kfaraz kfaraz left a comment

@cecemei , I feel a lot of changes here are really not needed and probably make the definitions of various compaction entities confusing.

  • The only change that is really needed here is adding a CompactionMode enum to Eligibility.
  • The existing terminology CompactionCandidate, CompactionStatus, Eligibility, CompactionTaskStatus captures the respective purposes nicely and also provides a clear distinction. Let's stick with the existing terminology unless there is some real gap in the current nomenclature/functionality.
  • If you feel it needs some explanation, we may add/update javadocs.
  • You may either remove all the refactors from this PR (so that we can quickly unblock incremental compaction in #18996) OR update this PR to only make suggested modifications that do not break the existing definitions.

P.S. Existing definitions

  • CompactionCandidate: a potential candidate for compaction, identified by an interval and some segment IDs
  • CompactionStatus: current status of a CompactionCandidate, indicated by the number of compacted/uncompacted segments/bytes in the interval
  • Eligibility: given a CompactionCandidate and CompactionStatus, does the policy think it is eligible for compaction?
  • CompactionTaskStatus: status of past/ongoing compaction tasks for a CompactionCandidate

All of these are distinct from each other and should preferably remain so.
There is room for improvement in these classes, but we should do them in a manner that does not completely change the existing semantics.

final CompactionStatus compactionStatus = getCurrentStatusForJob(job, policy);
switch (compactionStatus.getState()) {
case RUNNING:
final CompactionCandidate.TaskState candidateState = getCurrentTaskStateForJob(job);
Contributor

Please do not add a new task state. Use the existing CompactionTaskStatus class.

return createCandidate(proposedCompaction, eligibility, null);
}

public CompactionCandidate createCandidate(
Contributor

We shouldn't really have this method here. The CompactionMode is an attribute that we associate to a CompactionCandidate. One should not be responsible for creating the other.

'}';
}
}
CompactionCandidate createCandidate(CompactionCandidate.ProposedCompaction candidate, CompactionStatus eligibility);
Contributor

It doesn't make sense for the policy to be creating candidates since the candidates are identified much earlier by the DatasourceCompactibleSegmentIterator itself. The policy can only check the eligibility of a candidate for compaction. (I suppose the name CompactionCandidateSearchPolicy might be misleading here, since it is really a CompactionCandidatePickAndPrioritizePolicy, but that is a mouthful, so we can stick with what we have.)

We should retain the original method here. The caller (CompactionJobQueue) should just check the eligibility and then decide whether to launch a job or not. If yes, then whether full or incremental.

* Used by {@link CompactionStatusTracker#computeCompactionTaskState(CompactionCandidate)}.
* The callsite then determines whether to launch compaction task or not.
*/
public enum TaskState
Contributor

Please do not add a new TaskState, use CompactionTaskStatus instead. If it doesn't contain any info, let's add it there.

* Non-empty list of segments of a datasource being proposed for compaction.
* A proposed compaction typically contains all the segments of a single time chunk.
*/
public static class ProposedCompaction
Contributor

@kfaraz kfaraz Feb 11, 2026

What we need is a way to distinguish between a candidate before and after the status has been computed, so that we can avoid the null checks on currentStatus. I don't think this change fully achieves that.

Also, the name CompactionCandidate captures the intent better as it defines a potential "candidate" for compaction. So let's stick with it.

Instead of this change, you may consider adding a new class CompactionCandidateAndStatus which will be used by both the methods of the search policy. The new class will contain only a candidate field and a status field.

I don't think we will need this class to contain the Eligibility (or the CompactionMode) since it will be computed and used only by the CompactionJobQueue.

* @param eligibility initial eligibility from compaction config checks
* @return final compaction candidate
*/
class Eligibility
Contributor

Let's retain the Eligibility class since it answers the question:

"Does the search policy consider this CompactionCandidate eligible for compaction?"

The Eligibility class should now also contain an enum field for CompactionMode. The rest of the class can remain unchanged.
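
Roughly, as a sketch (the reason field is a placeholder for the existing members):

import javax.annotation.Nullable;

class Eligibility
{
  private final CompactionMode mode;  // new field: e.g. NOT_ELIGIBLE, INCREMENTAL_COMPACTION, FULL_COMPACTION
  @Nullable
  private final String reason;        // existing members remain unchanged

  Eligibility(CompactionMode mode, @Nullable String reason)
  {
    this.mode = mode;
    this.reason = reason;
  }
}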

CompactionCandidate candidatesWithStatus = CompactionCandidate
.from(partialEternitySegments, null)
.withCurrentStatus(CompactionStatus.skipped("Segments have partial-eternity intervals"));
CompactionCandidate candidatesWithStatus =
Contributor

@kfaraz kfaraz Feb 11, 2026

This class probably doesn't need to change in this PR.
The Eligibility should be computed and used only by the CompactionJobQueue (and maybe CompactSegments for backward compatibility).

* This method assumes that the given candidate is eligible for compaction
* based on the current compaction config/supervisor of the datasource.
*/
public CompactionStatus computeCompactionStatus(
Contributor

The preferable approach here is to completely remove this method and use the existing method getLatestTaskStatus().

@GWphua GWphua mentioned this pull request Feb 13, 2026
