Add support for handling scenarios where end time is invalid during RetentionManager run #18148

Open
9aman wants to merge 4 commits into apache:master from 9aman:retention_manager_improvement_in_case_of_missing_start_end_time

9aman commented Apr 9, 2026

Summary

  • When segment end time is invalid, the RetentionManager currently skips the segment entirely — it is never deleted regardless of the retention policy. This adds an optional fallback to use segmentZKMetadata.getCreationTime() instead, so segments with missing/invalid end times can still be cleaned up.
  • Gated behind cluster config controller.retentionManager.enableCreationTimeFallback (default false) — no behavior change unless explicitly opted in.
  • Supports dynamic config updates via the existing cluster config change listener — no controller restart needed.
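The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `TimeRetentionStrategy` code: the class name, the method names, and the "valid means greater than zero" check are all assumptions made for the example.

```java
/** Hypothetical sketch of the creation-time fallback; all names are illustrative. */
public final class RetentionFallbackSketch {
  private RetentionFallbackSketch() {
  }

  // Assumption for this sketch: a timestamp counts as valid when it is positive.
  // The real validity check in the code base may be stricter.
  public static boolean isValidTimeMs(long timeMs) {
    return timeMs > 0;
  }

  /**
   * Returns true when the segment is older than the retention period.
   * A valid end time always wins. The creation time is consulted only when
   * the end time is invalid and the fallback flag is enabled.
   */
  public static boolean isPurgeable(long endTimeMs, long creationTimeMs, long retentionMs, long nowMs,
      boolean enableCreationTimeFallback) {
    long referenceTimeMs;
    if (isValidTimeMs(endTimeMs)) {
      referenceTimeMs = endTimeMs;
    } else if (enableCreationTimeFallback && isValidTimeMs(creationTimeMs)) {
      referenceTimeMs = creationTimeMs;
    } else {
      // No usable timestamp: skip the segment, matching the pre-existing behavior.
      return false;
    }
    return nowMs - referenceTimeMs > retentionMs;
  }
}
```

With the flag off, a segment with an invalid end time is never purgeable, which matches the "no behavior change unless explicitly opted in" guarantee.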

Test plan

  • TimeRetentionStrategyTest#testCreationTimeFallback — unit tests covering: fallback disabled (existing behavior preserved), fallback enabled with valid/recent/invalid/zero creation time, valid end time takes priority over fallback
  • RetentionManagerTest#testCreationTimeFallbackOnChange — verifies dynamic config toggle via onChange()
  • RetentionManagerTest#testRetentionWithInvalidEndTimeAndCreationTimeFallback — end-to-end: segment with invalid end time is deleted when fallback is enabled and creation time exceeds retention

codecov-commenter commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 50.61728% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.47%. Comparing base (e4a18a0) to head (bba038a).
⚠️ Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...troller/helix/core/retention/RetentionManager.java | 33.33% | 38 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18148      +/-   ##
============================================
+ Coverage     63.44%   63.47%   +0.03%     
  Complexity     1627     1627              
============================================
  Files          3244     3244              
  Lines        197250   197320      +70     
  Branches      30514    30531      +17     
============================================
+ Hits         125136   125254     +118     
+ Misses        62082    62035      -47     
+ Partials      10032    10031       -1     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.44% <50.61%> (+0.04%) ⬆️ |
| java-21 | 63.41% <50.61%> (-0.01%) ⬇️ |
| temurin | 63.47% <50.61%> (+0.03%) ⬆️ |
| unittests | 63.47% <50.61%> (+0.03%) ⬆️ |
| unittests1 | 55.41% <ø> (+<0.01%) ⬆️ |
| unittests2 | 35.01% <50.61%> (+0.04%) ⬆️ |

noob-se7en added the configuration (Config changes: addition/deletion/change in behavior), enhancement (Improvement to existing functionality), and documentation (Improvements or additions to documentation) labels Apr 10, 2026
@xiangfu0 xiangfu0 left a comment

Found a few high-signal issues; see inline comments.

boolean oldValue = _useCreationTimeFallbackForRetention;

// Validate that the value is a proper boolean string
if (!"true".equalsIgnoreCase(newValue) && !"false".equalsIgnoreCase(newValue)) {

Contributor

When this cluster config key is deleted, changedConfigs will still contain it but clusterConfigs.get(...) will be null (DefaultClusterConfigChangeHandler explicitly reports deleted keys that way). This branch treats null as invalid and keeps the old value, so removing the override never reverts to the default false until restart. Because this flag gates destructive retention deletion, the current leader can keep purging segments after an operator thinks they disabled the feature. Please handle null explicitly and reset _useCreationTimeFallbackForRetention to the default.
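One way to handle the deleted-key case could look like the sketch below. It is an assumption-laden illustration rather than the PR's actual fix: the class name is hypothetical, and the `onChange(Set, Map)` signature is modeled on the `changedConfigs` and `clusterConfigs` names mentioned in the comment above.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical listener sketch; not the actual RetentionManager implementation. */
public final class FallbackConfigListenerSketch {
  public static final String FALLBACK_KEY = "controller.retentionManager.enableCreationTimeFallback";
  public static final boolean DEFAULT_FALLBACK_ENABLED = false;

  private volatile boolean _useCreationTimeFallbackForRetention = DEFAULT_FALLBACK_ENABLED;

  public void onChange(Set<String> changedConfigs, Map<String, String> clusterConfigs) {
    if (!changedConfigs.contains(FALLBACK_KEY)) {
      return;
    }
    String newValue = clusterConfigs.get(FALLBACK_KEY);
    if (newValue == null) {
      // Key was deleted: revert to the default instead of keeping the old value.
      _useCreationTimeFallbackForRetention = DEFAULT_FALLBACK_ENABLED;
      return;
    }
    if (!"true".equalsIgnoreCase(newValue) && !"false".equalsIgnoreCase(newValue)) {
      // Malformed value: keep the current setting.
      return;
    }
    _useCreationTimeFallbackForRetention = Boolean.parseBoolean(newValue);
  }

  public boolean isFallbackEnabled() {
    return _useCreationTimeFallbackForRetention;
  }
}
```

The key design point is that a `null` value is treated differently from a malformed one: deletion resets to the default, while garbage input leaves the current setting untouched.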

Contributor Author

Fixed. I made the same change for the other configs in this class as well.
Thanks for pointing this out.

Contributor Author

^^ @xiangfu0, is this okay? If so, I will merge.

Comment on lines +633 to +634
// Validate that the value is a proper boolean string
if (!"true".equalsIgnoreCase(newValue) && !"false".equalsIgnoreCase(newValue)) {

Contributor

nit: let's extract this into a util method and use it wherever this check is duplicated.
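A possible shape for such a helper, as a sketch only; the class and method names here are hypothetical and not taken from the PR:

```java
/** Hypothetical utility sketch for strict boolean-string validation. */
public final class BooleanConfigSketch {
  private BooleanConfigSketch() {
  }

  /** Returns true only for "true" or "false" (case-insensitive); null and anything else are rejected. */
  public static boolean isStrictBoolean(String value) {
    return "true".equalsIgnoreCase(value) || "false".equalsIgnoreCase(value);
  }
}
```

Call sites could then read `if (!BooleanConfigSketch.isStrictBoolean(newValue)) { ... }` instead of repeating the two `equalsIgnoreCase` comparisons.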

Labels

configuration Config changes (addition/deletion/change in behavior) documentation Improvements or additions to documentation enhancement Improvement to existing functionality

4 participants