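Overview
- Sentinel: Redis's concrete implementation of master-slave failover
- Distributed: several sentinels run together and reach decisions jointly
- A sentinel is itself a Redis server, but it does not serve data
- Usually deployed in an odd number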
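Three phases
Sentinel works in three phases: monitoring, notification, and failover. Seen end to end, a failover goes through these steps:
- Detect the problem
- Elect a sentinel to take charge
- Select the best candidate as the new master
- The new master takes over, the other slaves switch to it, and the original master reconnects as a slave once it recovers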
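Monitoring phase
Synchronizes state information across all nodes (the sentinels) and collects information about the master and slaves
- Get the status of each sentinel (whether it is online); when a new sentinel comes online, every sentinel refreshes its SentinelState:sentinels
- Get the status of the master
- Get the status of all slaves (based on the slave information held by the master)
  - runid
  - role:slave
  - master_host, master_port
  - offset
  - ……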
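As a rough illustration of what the monitoring phase collects, the sketch below parses INFO-style text into a dictionary and pulls the slave list out of a master's reply. The function names `parse_info` and `slaves_from_master_info` are made up for these notes, and the sample reply is illustrative (only the field layout matters). It is not how Sentinel itself is implemented.

```python
def parse_info(text):
    """Turn 'key:value' lines from INFO output into a dict, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def slaves_from_master_info(fields):
    """Extract (ip, port) pairs from the slaveN:ip=...,port=... entries of a master."""
    slaves = []
    for key, value in fields.items():
        # Only keys such as slave0, slave1 qualify; slave_priority and friends do not.
        if key.startswith("slave") and key[5:].isdigit():
            attrs = dict(item.split("=", 1) for item in value.split(","))
            slaves.append((attrs["ip"], int(attrs["port"])))
    return slaves


# Illustrative master reply (values are assumptions, not captured from the lab).
master_reply = """# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6379,state=online,offset=20931,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=20931,lag=1
"""

master_fields = parse_info(master_reply)
print(master_fields["role"])
print(slaves_from_master_info(master_fields))
```

The same `parse_info` helper works on a slave's reply, where the interesting keys are role, master_host, master_port, and the offset.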
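Notification phase
- Each sentinel monitors the status of the master and slaves
- The sentinels synchronize what they observe with one another
- The exchange uses publish/subscribe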
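To make the publish/subscribe exchange a little more concrete, here is a minimal sketch that decodes a hello announcement and keeps a table of known sentinels. The channel name `__sentinel__:hello` and the eight comma-separated fields reflect my understanding of Sentinel's hello message; treat the exact layout as an assumption. `parse_hello` and `remember_sentinel` are hypothetical helpers.

```python
HELLO_CHANNEL = "__sentinel__:hello"

# Assumed field order of a hello payload.
HELLO_FIELDS = (
    "ip", "port", "runid", "current_epoch",
    "master_name", "master_ip", "master_port", "master_config_epoch",
)


def parse_hello(payload):
    """Split a hello payload into named fields; return None if the shape is unexpected."""
    parts = payload.split(",")
    if len(parts) != len(HELLO_FIELDS):
        return None
    msg = dict(zip(HELLO_FIELDS, parts))
    for key in ("port", "current_epoch", "master_port", "master_config_epoch"):
        msg[key] = int(msg[key])
    return msg


def remember_sentinel(known, msg):
    """Record or refresh a peer sentinel, keyed by its runid."""
    known[msg["runid"]] = (msg["ip"], msg["port"])
    return known


known_sentinels = {}
hello = parse_hello(
    "127.0.0.1,26380,16218e9672c6712459abb8ca9c5b221d50efc713,1,mymaster,127.0.0.1,6380,0"
)
remember_sentinel(known_sentinels, hello)
print(known_sentinels)
```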
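Failover phase
1. Detect that the master is down
- A single sentinel finds the master down and sets SRI_S_DOWN in SentinelRedisInstance:master:flags; this is called subjective down (sdown)
- That sentinel then asks the other sentinels about the master's status (SENTINEL is-master-down-by-addr); once the number of sentinels that consider the master down reaches the configured quorum, it sets SRI_O_DOWN in SentinelRedisInstance:master:flags; this is called objective down (odown)
2. Elect the sentinel that will carry out the failover
- Voting
- Each sentinel broadcasts its ip, port, election epoch, and runid
- A receiver votes for the first request it receives
- Several rounds may be needed until one sentinel is elected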
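The two steps above can be sketched as a toy simulation. This is a simplification under stated assumptions: `is_objectively_down` only counts reports against a quorum, and `elect_leader` treats a strict majority of all sentinels as the winning condition. Both function names and the sentinel ids are made up for illustration.

```python
from collections import Counter


def is_objectively_down(reports, quorum):
    """reports maps sentinel id -> True if that sentinel sees the master as down."""
    return sum(1 for down in reports.values() if down) >= quorum


def elect_leader(first_request_seen, total_sentinels):
    """first_request_seen maps voter id -> candidate whose request reached it first.

    Returns the candidate holding a strict majority of all sentinels, or None
    when the round is inconclusive and another round is needed.
    """
    if not first_request_seen:
        return None
    candidate, votes = Counter(first_request_seen.values()).most_common(1)[0]
    return candidate if votes > total_sentinels // 2 else None


# Two of three sentinels report the master down; quorum is 2.
reports = {"sentinel01": True, "sentinel02": True, "sentinel03": False}
print(is_objectively_down(reports, quorum=2))

# sentinel02's request reached two voters first, so it wins this round.
round_one = {
    "sentinel01": "sentinel02",
    "sentinel02": "sentinel02",
    "sentinel03": "sentinel03",
}
print(elect_leader(round_one, total_sentinels=3))
```

A three-way split, where every sentinel votes for a different candidate, returns None, which corresponds to running another round.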
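The elected sentinel picks the new master
- Keep only slaves that are online
- Exclude slaves that respond slowly
- Exclude slaves that have been disconnected from the original master for a long time
- Apply the priority rules to the remaining candidates (replica priority, then replication offset, then runid)
- Send slaveof no one to the new master
- Send slaveof <new master IP> <port> to the other slaves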
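The selection rules above amount to a filter followed by a sort. A minimal sketch, assuming a lower priority value is preferred, a larger offset is preferred, and the runid breaks ties; the `Replica` class, the `pick_new_master` function, and both thresholds are hypothetical placeholders rather than Sentinel's real limits.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    runid: str
    online: bool
    last_reply_ms: int  # time since the last valid reply
    link_down_ms: int   # time disconnected from the old master
    priority: int       # lower value is preferred
    offset: int         # replication offset; higher means more data received


def pick_new_master(replicas, max_reply_ms=5000, max_link_down_ms=60000):
    """Filter out unsuitable replicas, then rank the rest. Thresholds are made up."""
    candidates = [
        r for r in replicas
        if r.online
        and r.last_reply_ms <= max_reply_ms
        and r.link_down_ms <= max_link_down_ms
    ]
    if not candidates:
        return None
    # Lowest priority value first, then largest offset, then smallest runid.
    return min(candidates, key=lambda r: (r.priority, -r.offset, r.runid))


replicas = [
    Replica("aaa", True, 100, 1000, 100, 20931),
    Replica("bbb", True, 100, 1000, 100, 20000),
    Replica("ccc", False, 100, 1000, 1, 99999),  # offline, filtered out
    Replica("ddd", True, 9000, 1000, 1, 99999),  # too slow, filtered out
]
chosen = pick_new_master(replicas)
print(chosen.runid if chosen else None)
```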
Lab
Environment setup
Role | Parameter | Port |
---|---|---|
master | master:6380 | 6380 |
slave 1 | slave1:6379 | 6379 |
slave 2 | slave2:6381 | 6381 |
sentinel 1 | sentinel01:26379 | 26379 |
sentinel 2 | sentinel02:26380 | 26380 |
sentinel 3 | sentinel03:26381 | 26381 |
Configuration files
master

    port 6380
    daemonize no
    dir /root/redis-6.0.6/data

slave

    daemonize no
    dir /root/redis-6.0.6/data
    #logfile "6380.log"
    slaveof 127.0.0.1 6380

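sentinel
- Reference for the configuration: the sentinel.conf that ships in the extracted archive
- To strip comments and blank lines: cat sentinel.conf | grep -v "#" | grep -v "^$" > sentinel-26379.conf

    port 26379
    daemonize no
    pidfile "/var/run/redis-sentinel.pid"
    #logfile ""
    dir "/root/redis-6.0.6/data"
    # The trailing 2 is the quorum: two sentinels judging the master down is enough for objective down.
    # Declared first so that the per-master options below can refer to mymaster.
    sentinel monitor mymaster 127.0.0.1 6380 2
    sentinel down-after-milliseconds mymaster 30000
    sentinel failover-timeout mymaster 180000
    sentinel deny-scripts-reconfig yes
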
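Since the lab runs three sentinels that differ only in their port, the three files can be produced from one template. A small sketch, reusing the values from the configuration above; `render_sentinel_conf` is a hypothetical helper, and the pidfile line is left out on purpose because each sentinel would need its own path.

```python
SENTINEL_TEMPLATE = """port {port}
daemonize no
dir "{data_dir}"
sentinel monitor {name} {master_host} {master_port} {quorum}
sentinel down-after-milliseconds {name} 30000
sentinel failover-timeout {name} 180000
sentinel deny-scripts-reconfig yes
"""


def render_sentinel_conf(port, name="mymaster", master_host="127.0.0.1",
                         master_port=6380, quorum=2,
                         data_dir="/root/redis-6.0.6/data"):
    """Return the text of a sentinel configuration file for one sentinel port."""
    return SENTINEL_TEMPLATE.format(
        port=port, name=name, master_host=master_host,
        master_port=master_port, quorum=quorum, data_dir=data_dir,
    )


# One file per sentinel, following the sentinel-<port>.conf naming used in these notes.
configs = {
    f"sentinel-{port}.conf": render_sentinel_conf(port)
    for port in (26379, 26380, 26381)
}
print(configs["sentinel-26380.conf"])
```

The sketch only builds the text in memory; writing the files to disk is left to the reader.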
Start Redis
master
slave
Start sentinel
Once a sentinel has finished starting, it rewrites the contents of its conf file

    port 26379
    daemonize no
    pidfile "/var/run/redis-sentinel.pid"
    logfile ""
    dir "/tmp"
    sentinel myid 522d51d7ebd6c58a881da63226e8bc1b16f0917e
    sentinel deny-scripts-reconfig yes
    sentinel monitor mymaster 127.0.0.1 6379 2
    sentinel config-epoch mymaster 1
    sentinel leader-epoch mymaster 1
    #Generated by CONFIG REWRITE
    protected-mode no
    user default on nopass ~* +@all
    sentinel known-replica mymaster 127.0.0.1 6380
    sentinel known-replica mymaster 127.0.0.1 6381
    sentinel known-sentinel mymaster 127.0.0.1 26380 16218e9672c6712459abb8ca9c5b221d50efc713
    sentinel known-sentinel mymaster 127.0.0.1 26381 9a9f3d1ee0289eb65bd72f775b13b5ad3fcad7bd

The rewritten configuration now includes the slaves and the other sentinels (the sentinel known-replica and sentinel known-sentinel lines)
System state on sentinel 01, as reported by info sentinel
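To check what a sentinel has learned without reading the file by eye, the rewritten configuration can be summarized with a few lines of Python. This is only a sketch: `summarize_sentinel_conf` is a made-up helper, it handles just the three directives shown here, and the sample text is an excerpt of the file above.

```python
import shlex


def summarize_sentinel_conf(text):
    """Collect the monitored master, known replicas, and known sentinels."""
    summary = {"monitor": None, "replicas": [], "sentinels": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # handles quoted values such as "/tmp"
        if len(parts) < 2 or parts[0] != "sentinel":
            continue
        directive, args = parts[1], parts[2:]
        if directive == "monitor" and len(args) == 4:
            name, host, port, quorum = args
            summary["monitor"] = (name, host, int(port), int(quorum))
        elif directive == "known-replica" and len(args) == 3:
            summary["replicas"].append((args[1], int(args[2])))
        elif directive == "known-sentinel" and len(args) == 4:
            summary["sentinels"].append((args[1], int(args[2]), args[3]))
    return summary


conf_text = """port 26379
dir "/tmp"
sentinel monitor mymaster 127.0.0.1 6379 2
#Generated by CONFIG REWRITE
user default on nopass ~* +@all
sentinel known-replica mymaster 127.0.0.1 6380
sentinel known-replica mymaster 127.0.0.1 6381
sentinel known-sentinel mymaster 127.0.0.1 26380 16218e9672c6712459abb8ca9c5b221d50efc713
sentinel known-sentinel mymaster 127.0.0.1 26381 9a9f3d1ee0289eb65bd72f775b13b5ad3fcad7bd
"""

summary = summarize_sentinel_conf(conf_text)
print(summary["monitor"])
print(summary["replicas"])
print(len(summary["sentinels"]))
```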
master -> down
Seen from the sentinel:

    32108:X 21 Nov 2021 05:52:45.808 # +sdown master mymaster 127.0.0.1 6380    # subjective down
    32108:X 21 Nov 2021 05:52:45.893 # +new-epoch 1
    32108:X 21 Nov 2021 05:52:45.893 # +vote-for-leader 16218e9672c6712459abb8ca9c5b221d50efc713 1
    32108:X 21 Nov 2021 05:52:46.913 # +odown master mymaster 127.0.0.1 6380 #quorum 3/2    # objective down
    32108:X 21 Nov 2021 05:52:46.913 # Next failover delay: I will not start a failover before Sun Nov 21 05:58:46 2021
    32108:X 21 Nov 2021 05:52:47.173 # +config-update-from sentinel 16218e9672c6712459abb8ca9c5b221d50efc713 127.0.0.1 26380 @ mymaster 127.0.0.1 6380
    32108:X 21 Nov 2021 05:52:47.173 # +switch-master mymaster 127.0.0.1 6380 127.0.0.1 6379    # 6379 is chosen as the new master
    32108:X 21 Nov 2021 05:52:47.173 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
    32108:X 21 Nov 2021 05:52:47.173 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    32108:X 21 Nov 2021 05:53:17.196 # +sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379

State change complete:
- The new master is 6379
- 6380 and 6381 are registered to 6379 as slaves
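The switch can also be picked out of the sentinel log mechanically. A minimal sketch, assuming log lines shaped like the ones in this lab (pid:X, timestamp, a # or * marker, then a +event and its details); `find_switch_master` and the regular expression are my own and may need adjusting for other log formats.

```python
import re

# pid:X <day> <month> <year> <time> <#|*> +event details...
LOG_LINE = re.compile(
    r"^\d+:X (?P<ts>\d+ \w+ \d+ [\d:.]+) [#*] (?P<event>\+[\w-]+) (?P<details>.*)$"
)


def find_switch_master(log_text):
    """Return the old and new master address from the first +switch-master event."""
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match or match.group("event") != "+switch-master":
            continue
        tokens = match.group("details").split()
        if len(tokens) < 5:
            continue
        name, old_ip, old_port, new_ip, new_port = tokens[:5]
        return {
            "master": name,
            "old": (old_ip, int(old_port)),
            "new": (new_ip, int(new_port)),
            "at": match.group("ts"),
        }
    return None


log_text = """32108:X 21 Nov 2021 05:52:45.808 # +sdown master mymaster 127.0.0.1 6380
32108:X 21 Nov 2021 05:52:46.913 # Next failover delay: I will not start a failover before Sun Nov 21 05:58:46 2021
32108:X 21 Nov 2021 05:52:47.173 # +switch-master mymaster 127.0.0.1 6380 127.0.0.1 6379
32108:X 21 Nov 2021 05:52:47.173 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
"""

switch = find_switch_master(log_text)
print(switch)
```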
6380 down -> running
Sentinel updates the status of 6380. On a cli client connected to 6380:

    # Replication
    role:slave
    master_host:127.0.0.1
    master_port:6379
    master_link_status:up
    master_last_io_seconds_ago:0
    master_sync_in_progress:0
    slave_repl_offset:20931
    slave_priority:100
    slave_read_only:1
    connected_slaves:0
    master_replid:3317b9be56ad63dcff8a416e9c485b477b1bf176
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:20931
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:18767
    repl_backlog_histlen:2165

It has become a slave