feat(failover): Implement Graceful Failover Feature#3295
feat(failover): Implement Graceful Failover Feature#3295yuzegao wants to merge 3 commits intoapache:unstablefrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements a graceful failover mechanism for Kvrocks cluster, enabling controlled master-to-slave role transitions while maintaining data consistency. The implementation follows a state machine pattern with dedicated background thread execution, similar to the existing SlotMigrator module architecture.
Key Changes:
- Introduces a new
ClusterFailovermodule with an 8-state state machine for controlled failover execution - Implements write blocking during critical failover phases (
pause_write,wait_sync,switching) to ensure data consistency - Adds
CLUSTERX FAILOVERandCLUSTERX TAKEOVERcommands for failover initiation and slave promotion
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
tests/gocase/go.mod |
Updated Go version directive (contains critical issue - see comments) |
tests/gocase/integration/failover/failover_test.go |
Comprehensive test suite with 20 test cases covering normal flow, failure scenarios, concurrency, write blocking, authentication, and data consistency |
tests/gocase/integration/failover/TEST_CASES.md |
Chinese-language test documentation describing test coverage goals |
src/cluster/cluster_failover.h |
Header defining ClusterFailover class with state machine, failover control methods, and thread management |
src/cluster/cluster_failover.cc |
Implementation of failover logic including slave validation, lag checking, sync waiting, and takeover command sending |
src/cluster/cluster.h |
Added methods for failover support: OnTakeOver(), GetNodeIPPort(), SetMySlotsMigrated(), IsSlotImported() |
src/cluster/cluster.cc |
Implements cluster-level failover operations, write blocking checks, failover state reset, and slot redirection |
src/commands/cmd_cluster.cc |
Command handlers for FAILOVER and TAKEOVER subcommands with timeout parsing |
src/server/server.h |
Declares ClusterFailover member and GetSlaveReplicationOffset() method |
src/server/server.cc |
Initializes ClusterFailover and implements slave offset retrieval for sync verification |
src/server/redis_connection.cc |
Allows writes to imported slots during failover (before topology update) |
src/storage/scripting.cc |
Allows Lua script writes to imported slots during failover |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (s_resp.GetValue().substr(0, 3) != "+OK") { | ||
| return {Status::NotOK, "TAKEOVER failed: " + s_resp.GetValue()}; | ||
| } |
There was a problem hiding this comment.
Potential resource leak: if SockReadLine fails or returns an unexpected response, the socket fd is closed. However, if GetValue().substr() throws an exception (e.g., if the response is less than 3 characters), the socket will not be closed. Consider using RAII or ensuring close(fd) is called in all paths, including exception paths.
There was a problem hiding this comment.
auto s_resp = util::SockReadLine(fd);
close(fd);
if (!s_resp.IsOK()) {
return {Status::NotOK, "Failed to read TAKEOVER response: " + s_resp.Msg()};
}
if (s_resp.GetValue().substr(0, 3) != "+OK") {
return {Status::NotOK, "TAKEOVER failed: " + s_resp.GetValue()};
}
The fd allways closed before any if condition.
| } | ||
|
|
||
| return {Status::RedisParseErr, | ||
| "CLUSTERX command, CLUSTERX VERSION|MYID|SETNODEID|SETNODES|SETSLOT|MIGRATE|FAILOVER"}; |
There was a problem hiding this comment.
The help message in the error does not include "TAKEOVER" which is a valid subcommand added in this PR. The message should be updated to include TAKEOVER in the list of valid subcommands.
| "CLUSTERX command, CLUSTERX VERSION|MYID|SETNODEID|SETNODES|SETSLOT|MIGRATE|FAILOVER"}; | |
| "CLUSTERX command, CLUSTERX VERSION|MYID|SETNODEID|SETNODES|SETSLOT|MIGRATE|FAILOVER|TAKEOVER"}; |
There was a problem hiding this comment.
The TAKEOVER is inner command, don't need to declare here.
| auto s_line = util::SockReadLine(fd); | ||
| if (!s_line.IsOK() || s_line.GetValue().substr(0, 3) != "+OK") { | ||
| close(fd); | ||
| return {Status::NotOK, "AUTH failed"}; | ||
| } |
There was a problem hiding this comment.
Potential resource leak: if SockReadLine fails or returns an unexpected response, the socket fd is closed. However, if GetValue().substr() throws an exception (e.g., if the response is less than 3 characters), the socket will not be closed. Consider using RAII or ensuring close(fd) is called in all paths, including exception paths.
There was a problem hiding this comment.
Without resource leak risk, beacuse all path have close(fd).
if (!pass.empty()) {
std::string auth_cmd = redis::ArrayOfBulkStrings({"AUTH", pass});
auto s_auth = util::SockSend(fd, auth_cmd);
if (!s_auth.IsOK()) {
close(fd);
return {Status::NotOK, "Failed to send AUTH: " + s_auth.Msg()};
}
auto s_line = util::SockReadLine(fd);
if (!s_line.IsOK() || s_line.GetValue().substr(0, 3) != "+OK") {
close(fd);
return {Status::NotOK, "AUTH failed"};
}
}
std::string cmd = redis::ArrayOfBulkStrings({"CLUSTERX", "TAKEOVER"});
auto s_send = util::SockSend(fd, cmd);
if (!s_send.IsOK()) {
close(fd);
return {Status::NotOK, "Failed to send TAKEOVER: " + s_send.Msg()};
}
auto s_resp = util::SockReadLine(fd);
close(fd);
git-hulk
left a comment
There was a problem hiding this comment.
@yuzegao, I'm not sure if you expected to use the migrated/imported slot to do the redirection while doing the failover. If yes, it's too tricky to do that. You can use the failover state to allow/disallow writing in the new/master node instead of mixing the migration behavior with the failover.
Another question is: What if the migration is ongoing?
Hi, @git-hulk , Thank you very much for your feedback. There is indeed a serious risk here. Please evaluate whether the following modifications are reasonable?
|
Summary
This PR implements graceful failover for Kvrocks cluster, allowing a master node to safely transfer control to a slave node while ensuring data consistency and minimizing service disruption.
Background
Based on GitHub Discussion #3218, this feature enables controlled master-to-slave failover with:
Implementation
Architecture
ClusterFailoverclass, parallel toSlotMigratornone→started→check_slave→pause_write→wait_sync→switching→success/failed)Key Features
pause_write,wait_sync, andswitchingstates (returnsTRYAGAIN)CLUSTERX TAKEOVERcommand to slave with authentication supportMOVEDerrorsCommands
CLUSTERX FAILOVER <slave-node-id> [timeout]- Initiate failover (default timeout: 1000ms)CLUSTER INFO- Now includescluster_failover_state:<state>Files Changed
New Files:
src/cluster/cluster_failover.h/cluster_failover.cc- Core implementation (325 lines)tests/gocase/integration/failover/failover_test.go- Test suite (926 lines)GRACEFUL_FAILOVER_DESIGN.md- Design documentModified Files:
src/server/server.{h,cc}- AddedClusterFailovermember andGetSlaveReplicationOffset()src/cluster/cluster.{h,cc}- Write blocking check,SetMySlotsMigrated(),OnTakeOver(), state resetsrc/commands/cmd_cluster.cc-FAILOVERandTAKEOVERcommand handlersTesting
Comprehensive test suite with 20 sub-test cases (100% pass rate):
Compatibility
✅ Backward compatible: New feature, no breaking changes. Only active when
cluster-enabled=yes. Existing clusters unaffected.