Improve ConfigNode leader warm-up before serving#17821
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17821 +/- ##
============================================
+ Coverage 40.90% 40.99% +0.09%
- Complexity 2610 2611 +1
============================================
Files 5186 5188 +2
Lines 351388 352111 +723
Branches 44991 45086 +95
============================================
+ Hits 143722 144345 +623
- Misses 207666 207766 +100 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Caideyipi
left a comment
There was a problem hiding this comment.
I reviewed the warm-up changes on a57680d2542. I think there are a few issues that should be fixed before merge:
-
AINode treats the new warm-up status as a hard failure.
ConfigManager.registerAINode()now returnsCONFIG_NODE_LEADER_WARMING_UPwhileconfirmLeader()is warming up, but the Python AINode client only treatsREDIRECTION_RECOMMENDas retryable in_update_config_node_leader().node_register()/node_restart()then callverify_success()and raise on status1014, so an AINode can fail startup if it hits the leader during warm-up. Please add the new code to the AINode constants and retry handling paths. -
Non-seed ConfigNode registration has the same gap.
registerConfigNode()can now returnCONFIG_NODE_LEADER_WARMING_UP, butConfigNode.sendRegisterConfigNodeRequest()only retries success/redirection/internal-retry statuses and throwsStartupExceptionfor anything else. A ConfigNode joining during leader warm-up can fail immediately instead of waiting and retrying. -
The async leader-service startup has a stepdown race.
notifyLeaderReady()now submitsstartLeaderServicesAfterLoadReady()asynchronously. That task checksisLeaderReady()only once before starting leader-only services and settingleaderServicesReady=true. IfnotifyNotLeader()runs after that check but before/during service startup, the old task can re-enable services after cleanup. Please guard this with a leader epoch/cancellation token, and re-check before settingleaderServicesReady. -
The DataNode register retry budget is too tight for the 30s warm-up tolerance. On
CONFIG_NODE_LEADER_WARMING_UP,updateConfigNodeLeader()sleeps 2s and returns retryable, whileregisterDataNode()has 15 attempts. The final request can still happen before the 30s tolerance expires, then sleep and exit without one post-tolerance attempt. A deadline-based retry or a larger retry budget would avoid this edge case.
There was a problem hiding this comment.
Pull request overview
This PR improves ConfigNode leader “warm-up” semantics so DataNodes avoid premature redirection during leader transitions, and ConfigNode serving is gated on initial heartbeat sampling readiness.
Changes:
- Add a dedicated
CONFIG_NODE_LEADER_WARMING_UPstatus and have DataNodes wait/retry the current leader during warm-up. - Introduce
LoadManager.isLoadReady()and a 30s tolerance window to require first heartbeat coverage (ConfigNode/DataNode) before considering load services ready. - Track consensus-group heartbeat sampling coverage and add tests for warm-up readiness behavior.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/protocol/client/ConfigNodeClient.java | Treat CONFIG_NODE_LEADER_WARMING_UP as a wait-and-retry instead of redirection. |
| iotdb-core/confignode/src/test/java/org/apache/iotdb/confignode/manager/load/LoadManagerTest.java | Add tests validating load warm-up readiness criteria and 30s tolerance behavior. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/LoadManager.java | Add load readiness state machine (isLoadReady, reason strings, tolerance window). |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/cache/LoadCache.java | Track consensus-group sampled nodes; add node-heartbeat unready reasons; cache “unreported” samples. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/cache/consensus/ConsensusGroupCache.java | Always update consensus stats from last sample (including “unready leader”). |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/cache/AbstractLoadCache.java | Add hasHeartbeatSample() helper. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/consensus/ConsensusManager.java | Gate leader confirmation on consensus-ready + leader-services-ready + load-ready; return warming-up status. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/consensus/statemachine/ConfigRegionStateMachine.java | Track leaderServicesReady and start load services before leader services. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/client/async/handlers/heartbeat/DataNodeHeartbeatHandler.java | Improve null-safety and cache consensus/region samples; add missing-region sampling on partial reports. |
| iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/client/async/handlers/heartbeat/ConfigNodeHeartbeatHandler.java | On error, force-update node cache to Unknown (no connection-broken check). |
| iotdb-client/service-rpc/src/main/java/org/apache/iotdb/rpc/TSStatusCode.java | Add CONFIG_NODE_LEADER_WARMING_UP(1014). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| boolean shouldCacheConsensusSample = | ||
| (TConsensusGroupType.SchemaRegion.equals(regionGroupId.getType()) | ||
| && SCHEMA_REGION_SHOULD_CACHE_CONSENSUS_SAMPLE) | ||
| || (TConsensusGroupType.DataRegion.equals(regionGroupId.getType()) | ||
| && DATA_REGION_SHOULD_CACHE_CONSENSUS_SAMPLE); | ||
| long logicalTimestamp = | ||
| heartbeatResp.isSetConsensusLogicalTimeMap() | ||
| && heartbeatResp.getConsensusLogicalTimeMap().containsKey(regionGroupId) | ||
| ? heartbeatResp.getConsensusLogicalTimeMap().get(regionGroupId) | ||
| : heartbeatResp.getHeartbeatTimestamp(); | ||
| loadManager | ||
| .getLoadCache() | ||
| .cacheConsensusGroupHeartbeatSample( | ||
| regionGroupId, | ||
| nodeId, | ||
| Boolean.TRUE.equals(isLeader), | ||
| logicalTimestamp, | ||
| shouldCacheConsensusSample); |
| if (cacheLeader && isLeader) { | ||
| cacheConsensusSample( | ||
| regionGroupId, new ConsensusGroupHeartbeatSample(logicalTimestamp, nodeId)); | ||
| } else if (isConsensusGroupHeartbeatFullySampled(regionGroupId) | ||
| && !Optional.ofNullable(consensusGroupCacheMap.get(regionGroupId)) | ||
| .map(AbstractLoadCache::hasHeartbeatSample) | ||
| .orElse(false)) { | ||
| cacheConsensusSample( | ||
| regionGroupId, | ||
| new ConsensusGroupHeartbeatSample( | ||
| logicalTimestamp, ConsensusGroupCache.UN_READY_LEADER_ID)); | ||
| } |
|



Summary
Tests