Skip to content

Conversation

@ZanCorDX
Copy link
Collaborator

πŸ“ Summary

Sleeps where repleaced by JoinHandles.

βœ… I have completed the following steps:

  • Run make lint
  • Run make test
  • Added tests (if applicable)

@claude
Copy link

claude bot commented Jan 22, 2026

Claude finished @ZanCorDX's task in 3m 28s β€”β€” View job


πŸ” PR Review: Better Shutdown

This PR improves the shutdown mechanism by replacing sleep-based waiting with JoinHandle-based task tracking. This is a significant improvement for graceful shutdown handling.

Summary

The PR introduces a two-phase shutdown mechanism:

  1. Graceful phase: Wait for all tasks (inner jobs + critical tasks) with a timeout (GRACEFUL_SHUTDOWN_TIMEOUT = 12s)
  2. Critical phase: If timeout reached, signal abort and wait only for critical tasks with a second timeout (CRITICAL_SHUTDOWN_TIMEOUT = 10s)

This replaces the previous approach that used arbitrary sleeps to wait for tasks to complete.


βœ… Strengths

  1. Proper task tracking: Using JoinHandles instead of sleeps ensures tasks actually complete rather than just hoping they finish within a sleep duration

  2. Two-phase shutdown: The separation between graceful and critical shutdown phases is a well-designed pattern for handling different priorities during shutdown

  3. Abort token: The new global_abort token allows critical tasks (like clickhouse) to know when upstream is hung and perform cleanup immediately

  4. FuturesUnordered: Efficient concurrent waiting on multiple handles using biased select for deterministic behavior

  5. Clear timeout constants: Well-documented constants in process_killer.rs explaining each timeout purpose

  6. Clean JoinHandle wrapping: The pattern in clickhouse/mod.rs of wrapping both inserter and backup handles in a single task is elegant


⚠️ Minor Issues Identified (see inline comments)

  1. Unused constant: RUN_SUBMIT_TO_RELAYS_JOB_CANCEL_TIME_SECONDS is now unused after removing RUN_SUBMIT_TO_RELAYS_JOB_CANCEL_TIME

  2. Log message clarity: The log at line 605 says "No critical tasks remaining" but logs non_critical_task_remaining, which is slightly confusing

  3. Timeout logging: Consider logging the count of remaining tasks when graceful timeout is reached to aid debugging


πŸ“‹ Checklist Verification

Item Status
Code compiles βœ… Appears correct
No security issues βœ… No concerns
Error handling βœ… Properly handles JoinHandle errors
Documentation βœ… Good doc comments on new structs/functions
Tests ⚠️ PR notes "Added tests (if applicable)" is unchecked - consider adding tests for the shutdown logic

🎯 Overall Assessment

Approve with minor suggestions. This is a well-designed improvement to the shutdown mechanism. The two-phase approach with proper task tracking is much more robust than the previous sleep-based approach. The inline comments contain minor suggestions that can be addressed in a follow-up if desired.

tokio::time::sleep(RUN_SUBMIT_TO_RELAYS_JOB_CANCEL_TIME).await;
task_manager.graceful_shutdown_with_timeout(Duration::from_secs(5));
abort_token.cancelled().await;
task_manager.graceful_shutdown();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good design pattern: This task forwards the abort signal to the task_manager, which is cleaner than the previous approach that used a hardcoded sleep. However, ensure that task_manager.graceful_shutdown() properly signals all child tasks to complete their cleanup work before the clickhouse_shutdown_handle completes.

let backup_handle = backup.spawn(task_executor, backup_table_name, tracing_target);

// Spawn a wrapper task that waits for both to complete
tokio::spawn(async move {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good pattern: Wrapping both handles in a single task that waits for both to complete is a clean solution. This ensures the returned JoinHandle only completes when both the inserter and backup tasks have finished their cleanup.

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
@claude
Copy link

claude bot commented Jan 22, 2026

Claude finished @ZanCorDX's task in 1m 41s β€”β€” View job


πŸ” PR Review: Better Shutdown (Re-review after sync)

Summary

Reviewed the latest sync which added logging of remaining tasks when graceful shutdown timeout is reached. Found one issue that should be addressed.


⚠️ Issue Found: Duplicate Warning Message

The latest commit introduced a duplicate warning message at crates/rbuilder/src/live_builder/mod.rs:577-582. The first warn! at line 577 is now redundant with the more informative version below it. See inline comment for suggested fix.


⚠️ Minor Issue from Previous Review

Log message at line 610: Says "No critical tasks remaining" but logs non_critical_task_remaining = inner_jobs.len(). The field name is slightly confusing since inner_jobs are the non-critical tasks that may still be running.


βœ… Overall Assessment

The PR is in good shape. The two-phase shutdown mechanism with proper task tracking is a significant improvement over the previous sleep-based approach. Just needs the duplicate warning removed.

}
}
_ = tokio::time::sleep_until(graceful_deadline) => {
warn!("Graceful shutdown timeout reached, signaling abort");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line appears to be redundant now that the more informative warning with task counts was added below. Consider removing it:

Suggested change
warn!("Graceful shutdown timeout reached, signaling abort");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant