Building a Redis Cluster to Check How a Replica Is Selected During Failover

Introduction

I didn't really understand how the next master is chosen from among the replicas when a failover happens in Redis Cluster, so while reading 《实践Redis入门》 I tried running a cluster locally to confirm the behavior.

My own understanding of many of these topics is still limited, so if anything here is explained incorrectly, I would be grateful for your corrections 🙇‍♂️

Building the Redis Cluster

Starting the containers

docker-compose is used to start the nodes.
In redis.conf, only the DEBUG command and cluster mode are enabled.

version: '3'

services:
  node:
    image: redis:latest
    ports:
      - 6379
    volumes:
      - $PWD/redis.conf:/usr/local/etc/redis/redis.conf
    command: redis-server /usr/local/etc/redis/redis.conf
    networks:
      - redis_network

networks:
  redis_network:

redis.conf:

enable-debug-command yes
cluster-enabled yes

Now let's start the containers.
Since I want a configuration of 3 masters and 3 replicas, I start 6 containers using the --scale option.
I also want to watch the logs in real time, so I start them in attached mode and do the rest of the work in another terminal.

$ docker-compose up --scale node=6
Log
Creating network "test-redis-cluster_redis_network" with the default driver
Creating test-redis-cluster_node_1 ... done
Creating test-redis-cluster_node_2 ... done
Creating test-redis-cluster_node_3 ... done
Creating test-redis-cluster_node_4 ... done
Creating test-redis-cluster_node_5 ... done
Creating test-redis-cluster_node_6 ... done
Attaching to test-redis-cluster_node_6, test-redis-cluster_node_5, test-redis-cluster_node_3, test-redis-cluster_node_4, test-redis-cluster_node_1, test-redis-cluster_node_2
node_1 | 1:C 06 Sep 2023 08:37:17.267 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
node_1 | 1:C 06 Sep 2023 08:37:17.267 # Redis version=7.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
node_1 | 1:C 06 Sep 2023 08:37:17.267 # Configuration loaded
node_3 | 1:C 06 Sep 2023 08:37:17.235 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
node_3 | 1:C 06 Sep 2023 08:37:17.235 # Redis version=7.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
node_3 | 1:C 06 Sep 2023 08:37:17.235 # Configuration loaded
node_3 | 1:M 06 Sep 2023 08:37:17.237 * monotonic clock: POSIX clock_gettime
node_1 | 1:M 06 Sep 2023 08:37:17.269 * monotonic clock: POSIX clock_gettime
node_1 | 1:M 06 Sep 2023 08:37:17.269 * No cluster configuration found, I’m b969fd526f09d3ca5d1c9d6dbc7fd2d08ed21602
node_5 | 1:C 06 Sep 2023 08:37:17.243 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
node_5 | 1:C 06 Sep 2023 08:37:17.243 # Redis version=7.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
node_5 | 1:C 06 Sep 2023 08:37:17.243 # Configuration loaded
node_5 | 1:M 06 Sep 2023 08:37:17.243 * monotonic clock: POSIX clock_gettime
node_5 | 1:M 06 Sep 2023 08:37:17.243 * No cluster configuration found, I’m cc97431d2343a3ca7a57e6d6b4da6d37bb569198
node_4 | 1:C 06 Sep 2023 08:37:17.275 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
node_1 | 1:M 06 Sep 2023 08:37:17.274 * Running mode=cluster, port=6379.
node_1 | 1:M 06 Sep 2023 08:37:17.274 # Server initialized
node_3 | 1:M 06 Sep 2023 08:37:17.239 * No cluster configuration found, I’m a6049751e651642f261095950d469a6b0cb8e611
node_3 | 1:M 06 Sep 2023 08:37:17.244 * Running mode=cluster, port=6379.
node_3 | 1:M 06 Sep 2023 08:37:17.244 # Server initialized
node_3 | 1:M 06 Sep 2023 08:37:17.246 * Ready to accept connections
node_6 | 1:C 06 Sep 2023 08:37:17.189 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
node_6 | 1:C 06 Sep 2023 08:37:17.189 # Redis version=7.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
node_6 | 1:C 06 Sep 2023 08:37:17.189 # Configuration loaded
node_4 | 1:C 06 Sep 2023 08:37:17.275 # Redis version=7.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
node_4 | 1:C 06 Sep 2023 08:37:17.275 # Configuration loaded
node_6 | 1:M 06 Sep 2023 08:37:17.190 * monotonic clock: POSIX clock_gettime
node_6 | 1:M 06 Sep 2023 08:37:17.191 * No cluster configuration found, I’m 8fe04f39df4e6b453a395da23f25bdc060559847
node_6 | 1:M 06 Sep 2023 08:37:17.195 * Running mode=cluster, port=6379.
node_4 | 1:M 06 Sep 2023 08:37:17.276 * monotonic clock: POSIX clock_gettime
node_4 | 1:M 06 Sep 2023 08:37:17.277 * No cluster configuration found, I’m 382223b947b401c45495a01c254e466630750c80
node_5 | 1:M 06 Sep 2023 08:37:17.247 * Running mode=cluster, port=6379.
node_5 | 1:M 06 Sep 2023 08:37:17.247 # Server initialized
node_6 | 1:M 06 Sep 2023 08:37:17.195 # Server initialized
node_6 | 1:M 06 Sep 2023 08:37:17.197 * Ready to accept connections
node_1 | 1:M 06 Sep 2023 08:37:17.277 * Ready to accept connections
node_5 | 1:M 06 Sep 2023 08:37:17.248 * Ready to accept connections
node_4 | 1:M 06 Sep 2023 08:37:17.279 * Running mode=cluster, port=6379.
node_4 | 1:M 06 Sep 2023 08:37:17.280 # Server initialized
node_4 | 1:M 06 Sep 2023 08:37:17.282 * Ready to accept connections
node_2 | 1:C 06 Sep 2023 08:37:17.283 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
node_2 | 1:C 06 Sep 2023 08:37:17.283 # Redis version=7.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
node_2 | 1:C 06 Sep 2023 08:37:17.283 # Configuration loaded
node_2 | 1:M 06 Sep 2023 08:37:17.284 * monotonic clock: POSIX clock_gettime
node_2 | 1:M 06 Sep 2023 08:37:17.285 * No cluster configuration found, I’m bf59ab9a08f7c39bceaaec562acfc2ca90f84621
node_2 | 1:M 06 Sep 2023 08:37:17.288 * Running mode=cluster, port=6379.
node_2 | 1:M 06 Sep 2023 08:37:17.288 # Server initialized
node_2 | 1:M 06 Sep 2023 08:37:17.292 * Ready to accept connections

Creating the Redis Cluster

Now let's create the Redis Cluster.
As a preliminary step, store each node's IP address in a variable.

$ NODES=`docker network inspect test-redis-cluster_redis_network | jq -r '.[0].Containers | .[].IPv4Address' | sed -e 's/\/16/:6379 /g' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g'`

$ echo $NODES
172.27.0.7:6379 172.27.0.4:6379 172.27.0.5:6379 172.27.0.6:6379 172.27.0.2:6379 172.27.0.3:6379 

$ docker-compose exec node bash -c "redis-cli --cluster create ${NODES} --cluster-replicas 1"
Log
>>> Performing hash slots allocation on 6 nodes…
Master[0] -> Slots 0 – 5460
Master[1] -> Slots 5461 – 10922
Master[2] -> Slots 10923 – 16383
Adding replica 172.27.0.2:6379 to 172.27.0.7:6379
Adding replica 172.27.0.3:6379 to 172.27.0.4:6379
Adding replica 172.27.0.6:6379 to 172.27.0.5:6379
M: b969fd526f09d3ca5d1c9d6dbc7fd2d08ed21602 172.27.0.7:6379
slots:[0-5460] (5461 slots) master
M: a6049751e651642f261095950d469a6b0cb8e611 172.27.0.4:6379
slots:[5461-10922] (5462 slots) master
M: 382223b947b401c45495a01c254e466630750c80 172.27.0.5:6379
slots:[10923-16383] (5461 slots) master
S: bf59ab9a08f7c39bceaaec562acfc2ca90f84621 172.27.0.6:6379
replicates 382223b947b401c45495a01c254e466630750c80
S: 8fe04f39df4e6b453a395da23f25bdc060559847 172.27.0.2:6379
replicates b969fd526f09d3ca5d1c9d6dbc7fd2d08ed21602
S: cc97431d2343a3ca7a57e6d6b4da6d37bb569198 172.27.0.3:6379
replicates a6049751e651642f261095950d469a6b0cb8e611
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
>>> Performing Cluster Check (using node 172.27.0.7:6379)
M: b969fd526f09d3ca5d1c9d6dbc7fd2d08ed21602 172.27.0.7:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
S: 8fe04f39df4e6b453a395da23f25bdc060559847 172.27.0.2:6379
slots: (0 slots) slave
replicates b969fd526f09d3ca5d1c9d6dbc7fd2d08ed21602
S: bf59ab9a08f7c39bceaaec562acfc2ca90f84621 172.27.0.6:6379
slots: (0 slots) slave
replicates 382223b947b401c45495a01c254e466630750c80
S: cc97431d2343a3ca7a57e6d6b4da6d37bb569198 172.27.0.3:6379
slots: (0 slots) slave
replicates a6049751e651642f261095950d469a6b0cb8e611
M: 382223b947b401c45495a01c254e466630750c80 172.27.0.5:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
M: a6049751e651642f261095950d469a6b0cb8e611 172.27.0.4:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots…
>>> Check slots coverage…
[OK] All 16384 slots covered.

From this part of the output, we can see that node_1 (172.27.0.7) became a master and node_6 (172.27.0.2) became its replica.

M: b969fd526f09d3ca5d1c9d6dbc7fd2d08ed21602 172.27.0.7:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 8fe04f39df4e6b453a395da23f25bdc060559847 172.27.0.2:6379
   slots: (0 slots) slave
   replicates b969fd526f09d3ca5d1c9d6dbc7fd2d08ed21602

In the attached terminal, logs like the following are output.

Log
node_1 | 1:M 06 Sep 2023 08:39:06.156 # configEpoch set to 1 via CLUSTER SET-CONFIG-EPOCH
node_3 | 1:M 06 Sep 2023 08:39:06.156 # configEpoch set to 2 via CLUSTER SET-CONFIG-EPOCH
node_4 | 1:M 06 Sep 2023 08:39:06.157 # configEpoch set to 3 via CLUSTER SET-CONFIG-EPOCH
node_2 | 1:M 06 Sep 2023 08:39:06.157 # configEpoch set to 4 via CLUSTER SET-CONFIG-EPOCH
node_6 | 1:M 06 Sep 2023 08:39:06.158 # configEpoch set to 5 via CLUSTER SET-CONFIG-EPOCH
node_5 | 1:M 06 Sep 2023 08:39:06.158 # configEpoch set to 6 via CLUSTER SET-CONFIG-EPOCH
node_1 | 1:M 06 Sep 2023 08:39:06.195 # IP address for this node updated to 172.27.0.7
node_6 | 1:M 06 Sep 2023 08:39:06.297 # IP address for this node updated to 172.27.0.2
node_2 | 1:M 06 Sep 2023 08:39:06.298 # IP address for this node updated to 172.27.0.6
node_3 | 1:M 06 Sep 2023 08:39:06.298 # IP address for this node updated to 172.27.0.4
node_5 | 1:M 06 Sep 2023 08:39:06.298 # IP address for this node updated to 172.27.0.3
node_4 | 1:M 06 Sep 2023 08:39:06.298 # IP address for this node updated to 172.27.0.5
node_2 | 1:S 06 Sep 2023 08:39:07.162 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
node_2 | 1:S 06 Sep 2023 08:39:07.162 * Connecting to MASTER 172.27.0.5:6379
node_2 | 1:S 06 Sep 2023 08:39:07.162 * MASTER <-> REPLICA sync started
node_2 | 1:S 06 Sep 2023 08:39:07.162 # Cluster state changed: ok
node_2 | 1:S 06 Sep 2023 08:39:07.162 * Non blocking connect for SYNC fired the event.
node_6 | 1:S 06 Sep 2023 08:39:07.163 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
node_2 | 1:S 06 Sep 2023 08:39:07.163 * Master replied to PING, replication can continue…
node_2 | 1:S 06 Sep 2023 08:39:07.163 * Trying a partial resynchronization (request 412d504d4a874b33efa641ca6da7b420db87c1dc:1).
node_6 | 1:S 06 Sep 2023 08:39:07.163 * Connecting to MASTER 172.27.0.7:6379
node_6 | 1:S 06 Sep 2023 08:39:07.163 * MASTER <-> REPLICA sync started
node_6 | 1:S 06 Sep 2023 08:39:07.163 # Cluster state changed: ok
node_6 | 1:S 06 Sep 2023 08:39:07.163 * Non blocking connect for SYNC fired the event.
node_5 | 1:S 06 Sep 2023 08:39:07.164 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
node_5 | 1:S 06 Sep 2023 08:39:07.164 * Connecting to MASTER 172.27.0.4:6379
node_5 | 1:S 06 Sep 2023 08:39:07.164 * MASTER <-> REPLICA sync started
node_5 | 1:S 06 Sep 2023 08:39:07.164 # Cluster state changed: ok
node_4 | 1:M 06 Sep 2023 08:39:07.164 * Replica 172.27.0.6:6379 asks for synchronization
node_6 | 1:S 06 Sep 2023 08:39:07.166 * Master replied to PING, replication can continue…
node_4 | 1:M 06 Sep 2023 08:39:07.164 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '412d504d4a874b33efa641ca6da7b420db87c1dc', my replication IDs are 'a0a4beddfcaf11abea7b7b5a05c8f6e8ce426d48' and '0000000000000000000000000000000000000000')
node_4 | 1:M 06 Sep 2023 08:39:07.164 * Replication backlog created, my new replication IDs are '40c42a17e28ff6e039c38f565c54f3ad4bb76881' and '0000000000000000000000000000000000000000'
node_4 | 1:M 06 Sep 2023 08:39:07.164 * Delay next BGSAVE for diskless SYNC
node_5 | 1:S 06 Sep 2023 08:39:07.164 * Non blocking connect for SYNC fired the event.
node_1 | 1:M 06 Sep 2023 08:39:07.166 * Replica 172.27.0.2:6379 asks for synchronization
node_1 | 1:M 06 Sep 2023 08:39:07.166 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '9df651b1729a067c6cb6ef216e93ac3cbd3b23a1', my replication IDs are 'f9f16c0a4046bff27d18a0b7a78e959715e44180' and '0000000000000000000000000000000000000000')
node_1 | 1:M 06 Sep 2023 08:39:07.166 * Replication backlog created, my new replication IDs are '1d3c4b9e83fe1771fed2b03e6ce385f12ec9b614' and '0000000000000000000000000000000000000000'
node_1 | 1:M 06 Sep 2023 08:39:07.166 * Delay next BGSAVE for diskless SYNC
node_5 | 1:S 06 Sep 2023 08:39:07.166 * Master replied to PING, replication can continue…
node_6 | 1:S 06 Sep 2023 08:39:07.166 * Trying a partial resynchronization (request 9df651b1729a067c6cb6ef216e93ac3cbd3b23a1:1).
node_3 | 1:M 06 Sep 2023 08:39:07.167 * Replica 172.27.0.3:6379 asks for synchronization
node_3 | 1:M 06 Sep 2023 08:39:07.167 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'a185e6b510870587a61eb2d9d3d262c2c68e6623', my replication IDs are 'b2728e79267eedddf20198120262edee523fcab7' and '0000000000000000000000000000000000000000')
node_3 | 1:M 06 Sep 2023 08:39:07.167 * Replication backlog created, my new replication IDs are '50f8e3341e3ca47f153233cbc402cdb5cb31bf81' and '0000000000000000000000000000000000000000'
node_3 | 1:M 06 Sep 2023 08:39:07.167 * Delay next BGSAVE for diskless SYNC
node_5 | 1:S 06 Sep 2023 08:39:07.166 * Trying a partial resynchronization (request a185e6b510870587a61eb2d9d3d262c2c68e6623:1).
node_4 | 1:M 06 Sep 2023 08:39:11.133 # Cluster state changed: ok
node_3 | 1:M 06 Sep 2023 08:39:11.133 # Cluster state changed: ok
node_1 | 1:M 06 Sep 2023 08:39:11.134 # Cluster state changed: ok
node_3 | 1:M 06 Sep 2023 08:39:12.042 * Starting BGSAVE for SYNC with target: replicas sockets
node_4 | 1:M 06 Sep 2023 08:39:12.042 * Starting BGSAVE for SYNC with target: replicas sockets
node_5 | 1:S 06 Sep 2023 08:39:12.042 * Full resync from master: 50f8e3341e3ca47f153233cbc402cdb5cb31bf81:14
node_1 | 1:M 06 Sep 2023 08:39:12.042 * Starting BGSAVE for SYNC with target: replicas sockets
node_2 | 1:S 06 Sep 2023 08:39:12.042 * Full resync from master: 40c42a17e28ff6e039c38f565c54f3ad4bb76881:14
node_6 | 1:S 06 Sep 2023 08:39:12.043 * Full resync from master: 1d3c4b9e83fe1771fed2b03e6ce385f12ec9b614:14
node_4 | 1:M 06 Sep 2023 08:39:12.045 * Background RDB transfer started by pid 21
node_1 | 1:M 06 Sep 2023 08:39:12.045 * Background RDB transfer started by pid 27
node_3 | 1:M 06 Sep 2023 08:39:12.045 * Background RDB transfer started by pid 21
node_2 | 1:S 06 Sep 2023 08:39:12.047 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
node_2 | 1:S 06 Sep 2023 08:39:12.047 * Discarding previously cached master state.
node_2 | 1:S 06 Sep 2023 08:39:12.047 * MASTER <-> REPLICA sync: Flushing old data
node_3 | 21:C 06 Sep 2023 08:39:12.047 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
node_6 | 1:S 06 Sep 2023 08:39:12.047 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
node_6 | 1:S 06 Sep 2023 08:39:12.047 * Discarding previously cached master state.
node_5 | 1:S 06 Sep 2023 08:39:12.047 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
node_1 | 27:C 06 Sep 2023 08:39:12.047 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
node_4 | 21:C 06 Sep 2023 08:39:12.047 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
node_4 | 1:M 06 Sep 2023 08:39:12.047 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
node_5 | 1:S 06 Sep 2023 08:39:12.048 * Discarding previously cached master state.
node_5 | 1:S 06 Sep 2023 08:39:12.048 * MASTER <-> REPLICA sync: Flushing old data
node_6 | 1:S 06 Sep 2023 08:39:12.047 * MASTER <-> REPLICA sync: Flushing old data
node_3 | 1:M 06 Sep 2023 08:39:12.047 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
node_2 | 1:S 06 Sep 2023 08:39:12.047 * MASTER <-> REPLICA sync: Loading DB in memory
node_6 | 1:S 06 Sep 2023 08:39:12.047 * MASTER <-> REPLICA sync: Loading DB in memory
node_1 | 1:M 06 Sep 2023 08:39:12.047 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
node_5 | 1:S 06 Sep 2023 08:39:12.048 * MASTER <-> REPLICA sync: Loading DB in memory
node_2 | 1:S 06 Sep 2023 08:39:12.052 * Loading RDB produced by version 7.0.11
node_2 | 1:S 06 Sep 2023 08:39:12.052 * RDB age 0 seconds
node_2 | 1:S 06 Sep 2023 08:39:12.052 * RDB memory usage when created 1.82 Mb
node_2 | 1:S 06 Sep 2023 08:39:12.052 * Done loading RDB, keys loaded: 0, keys expired: 0.
node_1 | 1:M 06 Sep 2023 08:39:12.052 * Background RDB transfer terminated with success
node_1 | 1:M 06 Sep 2023 08:39:12.052 * Streamed RDB transfer with replica 172.27.0.2:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
node_1 | 1:M 06 Sep 2023 08:39:12.052 * Synchronization with replica 172.27.0.2:6379 succeeded
node_6 | 1:S 06 Sep 2023 08:39:12.051 * Loading RDB produced by version 7.0.11
node_6 | 1:S 06 Sep 2023 08:39:12.051 * RDB age 0 seconds
node_6 | 1:S 06 Sep 2023 08:39:12.051 * RDB memory usage when created 1.78 Mb
node_6 | 1:S 06 Sep 2023 08:39:12.051 * Done loading RDB, keys loaded: 0, keys expired: 0.
node_6 | 1:S 06 Sep 2023 08:39:12.051 * MASTER <-> REPLICA sync: Finished with success
node_3 | 1:M 06 Sep 2023 08:39:12.053 * Background RDB transfer terminated with success
node_3 | 1:M 06 Sep 2023 08:39:12.053 * Streamed RDB transfer with replica 172.27.0.3:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
node_3 | 1:M 06 Sep 2023 08:39:12.053 * Synchronization with replica 172.27.0.3:6379 succeeded
node_2 | 1:S 06 Sep 2023 08:39:12.052 * MASTER <-> REPLICA sync: Finished with success
node_5 | 1:S 06 Sep 2023 08:39:12.053 * Loading RDB produced by version 7.0.11
node_5 | 1:S 06 Sep 2023 08:39:12.053 * RDB age 0 seconds
node_5 | 1:S 06 Sep 2023 08:39:12.053 * RDB memory usage when created 1.82 Mb
node_5 | 1:S 06 Sep 2023 08:39:12.053 * Done loading RDB, keys loaded: 0, keys expired: 0.
node_5 | 1:S 06 Sep 2023 08:39:12.053 * MASTER <-> REPLICA sync: Finished with success
node_4 | 1:M 06 Sep 2023 08:39:12.053 * Background RDB transfer terminated with success
node_4 | 1:M 06 Sep 2023 08:39:12.053 * Streamed RDB transfer with replica 172.27.0.6:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
node_4 | 1:M 06 Sep 2023 08:39:12.053 * Synchronization with replica 172.27.0.6:6379 succeeded

Master-replica synchronization

There are two ways a replica synchronizes with its master: full synchronization and partial synchronization.

Full synchronization dumps and transfers all of the master's data, so it can be performed at any time, but you need to watch out for performance concerns such as the load it puts on network bandwidth.

Partial synchronization only transfers the writes that accumulated while replication was disconnected, so it moves far less data than a full sync; however, if the replication backlog (a buffer on the master) no longer contains the replica's offset (for example, because more data than the backlog size was written while replication was down), partial synchronization cannot be performed.
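
As a side note, the backlog size and the current replication offsets can be inspected with redis-cli. A minimal sketch (docker-compose exec targets the first node container by default; repl-backlog-size defaults to 1 MB):

# size of the replication backlog that partial resync depends on
$ docker-compose exec node redis-cli CONFIG GET repl-backlog-size

# master_repl_offset / slave_repl_offset show how far the master and the replica have progressed
$ docker-compose exec node redis-cli INFO replication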

Looking at the logs in the attached terminal, we can see that node_6 (the replica) first asks node_1 (the master) for a partial resynchronization.

node_6  | 1:S 06 Sep 2023 08:39:07.163 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
node_6  | 1:S 06 Sep 2023 08:39:07.163 * Connecting to MASTER 172.27.0.7:6379
node_6  | 1:S 06 Sep 2023 08:39:07.163 * MASTER <-> REPLICA sync started
node_6  | 1:S 06 Sep 2023 08:39:07.163 # Cluster state changed: ok
node_6  | 1:S 06 Sep 2023 08:39:07.163 * Non blocking connect for SYNC fired the event.
node_6  | 1:S 06 Sep 2023 08:39:07.166 * Master replied to PING, replication can continue...
node_6  | 1:S 06 Sep 2023 08:39:07.166 * Trying a partial resynchronization (request 9df651b1729a067c6cb6ef216e93ac3cbd3b23a1:1).

However, since node_1 (the master) has only just been created, it naturally has no record of the replica's offset, so the partial resynchronization is rejected and a full synchronization is performed instead.

The line "Streamed RDB transfer with replica 172.27.0.2:6379 succeeded (socket)" confirms that the full synchronization succeeded (172.27.0.2 is node_6's IP).

node_1  | 1:M 06 Sep 2023 08:39:07.166 * Replica 172.27.0.2:6379 asks for synchronization
node_1  | 1:M 06 Sep 2023 08:39:07.166 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '9df651b1729a067c6cb6ef216e93ac3cbd3b23a1', my replication IDs are 'f9f16c0a4046bff27d18a0b7a78e959715e44180' and '0000000000000000000000000000000000000000')
node_1  | 1:M 06 Sep 2023 08:39:07.166 * Replication backlog created, my new replication IDs are '1d3c4b9e83fe1771fed2b03e6ce385f12ec9b614' and '0000000000000000000000000000000000000000'
node_1  | 1:M 06 Sep 2023 08:39:07.166 * Delay next BGSAVE for diskless SYNC
node_1  | 1:M 06 Sep 2023 08:39:12.042 * Starting BGSAVE for SYNC with target: replicas sockets
node_1  | 1:M 06 Sep 2023 08:39:12.045 * Background RDB transfer started by pid 27
node_1  | 27:C 06 Sep 2023 08:39:12.047 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
node_1  | 1:M 06 Sep 2023 08:39:12.047 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
node_1  | 1:M 06 Sep 2023 08:39:12.052 * Background RDB transfer terminated with success
node_1  | 1:M 06 Sep 2023 08:39:12.052 * Streamed RDB transfer with replica 172.27.0.2:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
node_1  | 1:M 06 Sep 2023 08:39:12.052 * Synchronization with replica 172.27.0.2:6379 succeeded

Checking node status

First, check the container IDs.

$ docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED         STATUS         PORTS                     NAMES
c95717cf74af   redis:latest   "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes   0.0.0.0:60434->6379/tcp   test-redis-cluster_node_2
c1facc6773b9   redis:latest   "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes   0.0.0.0:60432->6379/tcp   test-redis-cluster_node_4
666bb02b9458   redis:latest   "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes   0.0.0.0:60431->6379/tcp   test-redis-cluster_node_3
02897e83f47f   redis:latest   "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes   0.0.0.0:60433->6379/tcp   test-redis-cluster_node_1
dfafdb0e2309   redis:latest   "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes   0.0.0.0:60430->6379/tcp   test-redis-cluster_node_5
da59e140c108   redis:latest   "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes   0.0.0.0:60429->6379/tcp   test-redis-cluster_node_6

Each node records its cluster state in a file called nodes.conf at startup, and by looking at this file you can check the master/replica relationships (you can also check them with redis-cli commands such as CLUSTER NODES).

$ docker exec 02897e83f47f bash -c "cat nodes.conf"
8fe04f39df4e6b453a395da23f25bdc060559847 172.27.0.2:6379@16379 slave b969fd526f09d3ca5d1c9d6dbc7fd2d08ed21602 0 1693989548918 1 connected
bf59ab9a08f7c39bceaaec562acfc2ca90f84621 172.27.0.6:6379@16379 slave 382223b947b401c45495a01c254e466630750c80 0 1693989547000 3 connected
cc97431d2343a3ca7a57e6d6b4da6d37bb569198 172.27.0.3:6379@16379 slave a6049751e651642f261095950d469a6b0cb8e611 0 1693989547000 2 connected
382223b947b401c45495a01c254e466630750c80 172.27.0.5:6379@16379 master - 0 1693989547908 3 connected 10923-16383
a6049751e651642f261095950d469a6b0cb8e611 172.27.0.4:6379@16379 master - 0 1693989548000 2 connected 5461-10922
b969fd526f09d3ca5d1c9d6dbc7fd2d08ed21602 172.27.0.7:6379@16379 myself,master - 0 1693989546000 1 connected 0-5460
vars currentEpoch 6 lastVoteEpoch 0

The fields appear in the following order.

Field           Description
id              Node ID
ip:port@cport   The node's address and ports. cport is the cluster bus port, used for node-to-node communication; by default it is the normal port plus 10000.
flags           The node's state
master          For a replica, the node ID of its master; for a master, "-"
ping-sent       UNIX time at which the last active ping was sent
pong-recv       UNIX time at which the last pong (ping reply) was received
config-epoch    The configEpoch value (described later)
link-state      State of the connection over the inter-node cluster bus
slot            Range of hash slots assigned to the node

We can see that node_6 (172.27.0.2) points to node_1 (172.27.0.7) as its master.
We can also confirm that hash slots 0 through 16383 are distributed evenly across the masters.
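
The same view can also be pulled live instead of reading nodes.conf. A minimal sketch (reusing node_1's container ID 02897e83f47f from the docker ps output above; output omitted):

$ docker exec 02897e83f47f redis-cli CLUSTER NODES
$ docker exec 02897e83f47f redis-cli CLUSTER SLOTS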

Failover

Epochs

In Redis Cluster, values called epochs are used to version the state of the nodes, and failover is carried out while the latest information is shared across the cluster.
(This is similar to the concept of a term in the Raft distributed consensus algorithm.)
There are three kinds of epochs (a quick way to inspect them on a running node is sketched after the list below).

currentEpoch

• Manages the current state of the cluster as a whole
• An increment of currentEpoch means the state of the cluster has changed
• All nodes must agree on currentEpoch, which is how consistency is maintained

configEpoch

• A unique value is assigned to each shard
• It is incremented whenever a new configuration is applied
• It is used to resolve conflicts when different nodes claim different configurations

lastVoteEpoch

• Used when a vote request arrives from a replica
• A master rejects vote requests from replicas whose currentEpoch is older than the master's own lastVoteEpoch
• Updated after the master has voted for a replica
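
If you want to see these values on a running node, CLUSTER INFO reports cluster_current_epoch and cluster_my_epoch, and the vars line at the bottom of nodes.conf records currentEpoch and lastVoteEpoch. A minimal sketch (reusing node_1's container ID from above):

$ docker exec 02897e83f47f redis-cli CLUSTER INFO | grep epoch
$ docker exec 02897e83f47f bash -c "grep vars nodes.conf"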

Steps for promoting a replica to master

When a master goes down, a new master is elected through the following steps.

1. The replica increments its currentEpoch and sends FAILOVER_AUTH_REQUEST (a vote request) to the other masters, then waits for votes for cluster-node-timeout * 2 seconds.
2. A master that receives a FAILOVER_AUTH_REQUEST checks the sending replica's currentEpoch; if it is smaller than the master's own currentEpoch or lastVoteEpoch, the vote is refused.
3. If the master accepts, it replies with FAILOVER_AUTH_ACK (its vote) and updates its lastVoteEpoch to the replica's currentEpoch.
4. Once the replica has received votes from a majority of the masters, it is promoted to master and updates its configEpoch to the latest currentEpoch.

After it has voted, a master will not respond to vote requests from the failed master's other replicas for cluster-node-timeout * 2 seconds, and it can vote only once per epoch. So when the failed master has several replicas, the replica that sends its vote request first has an advantage.

However, when a replica sees that its master has entered the FAIL state, it does not send the vote request immediately; it first waits for
500 ms + 0-500 ms (random) + REPLICA_RANK * 1000 ms.
Each term is explained below.

• 500 ms

Time to wait for the information that the master is in the FAIL state to propagate through the cluster

• 0-500 ms (random)

A random delay to avoid multiple replicas being elected as master at the same time

• REPLICA_RANK * 1000 ms

REPLICA_RANK is assigned sequentially from 0, in order of how up to date each replica's replication offset is

In short, the replica whose offset is closest to the master's latest data can send its vote request earlier and is therefore the most likely to be promoted.
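
Just to make the arithmetic concrete, here is a tiny bash sketch of that delay formula (REPLICA_RANK=0 is the most up-to-date replica; this only illustrates the calculation, it is not how Redis implements it):

$ REPLICA_RANK=0
$ echo "$((500 + RANDOM % 501 + REPLICA_RANK * 1000)) ms"   # between 500 and 1000 ms for the best-ranked replica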

Stopping the master and triggering a failover

Now, let's take node_1 (a master) down and observe the failover behavior.

$ docker ps
CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS         PORTS                                                                                                                        NAMES
b7f1da8d42ff   redis:latest                          "docker-entrypoint.s…"   7 minutes ago   Up 7 minutes   0.0.0.0:55686->6379/tcp                                                                                                      test-redis-cluster_node_1
18ada60b2206   redis:latest                          "docker-entrypoint.s…"   7 minutes ago   Up 7 minutes   0.0.0.0:55687->6379/tcp                                                                                                      test-redis-cluster_node_2
...

$ docker exec -it b7f1da8d42ff bash
# redis-cli
> DEBUG SEGFAULT
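
DEBUG SEGFAULT is available here because enable-debug-command yes was set in redis.conf. If you would rather not crash the process, stopping the container should serve the same purpose for this experiment (a sketch, using node_1's container ID from the docker ps output just above):

$ docker stop b7f1da8d42ff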

node_1 (the master) is now down. Looking at nodes.conf, we can confirm that node_1 (172.27.0.7) has been marked as failed and that node_6 (172.27.0.2) has been promoted to master.

$ docker exec da59e140c108 bash -c "cat nodes.conf"
b969fd526f09d3ca5d1c9d6dbc7fd2d08ed21602 172.27.0.7:6379@16379 master,fail - 1693993459028 1693993454000 1 connected
8fe04f39df4e6b453a395da23f25bdc060559847 172.27.0.2:6379@16379 myself,master - 0 1693993471000 7 connected 0-5460
bf59ab9a08f7c39bceaaec562acfc2ca90f84621 172.27.0.6:6379@16379 slave 382223b947b401c45495a01c254e466630750c80 0 1693993471116 3 connected
cc97431d2343a3ca7a57e6d6b4da6d37bb569198 172.27.0.3:6379@16379 slave a6049751e651642f261095950d469a6b0cb8e611 0 1693993470000 2 connected
382223b947b401c45495a01c254e466630750c80 172.27.0.5:6379@16379 master - 0 1693993471000 3 connected 10923-16383
a6049751e651642f261095950d469a6b0cb8e611 172.27.0.4:6379@16379 master - 0 1693993472125 2 connected 5461-10922
vars currentEpoch 7 lastVoteEpoch 0

We stopped node_1.
After waiting 500 ms + 0-500 ms (random) + REPLICA_RANK * 1000 ms, node_6 increments its own currentEpoch and sends FAILOVER_AUTH_REQUEST to node_3 and node_4.


After node_3 and node_4 confirm that the requesting replica's currentEpoch is greater than their own currentEpoch and lastVoteEpoch, they update their currentEpoch to the same value and send FAILOVER_AUTH_ACK back to node_6.


Having received votes from a majority of the masters, node_6 updates its configEpoch to the same value as the currentEpoch and is promoted to master!
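
To double-check the promotion from the node itself rather than from nodes.conf, you can ask node_6 for its role and grep the attached logs for election-related messages. A minimal sketch (da59e140c108 is the node_6 container ID used above; the exact log wording may differ between Redis versions):

$ docker exec da59e140c108 redis-cli ROLE              # first element should now be "master"
$ docker-compose logs node | grep -i election          # election messages emitted during the failover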

Closing

We built a Redis Cluster locally with docker-compose and walked through, step by step, how a replica is elected during a failover.
You probably won't often need to think about epochs while operating a Redis Cluster, but I'd be happy if this gave you a feel for how a replica gets promoted to master through this process.

See you next time.
