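What is Sentinel?
Sentinel is a running mode of Redis dedicated to monitoring the state of Redis instances (the master and its replicas). When the master fails, Sentinel elects a new master and performs the master-replica switch, so that failover happens automatically and the Redis deployment as a whole stays available.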
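What Sentinel does
- Monitoring: Sentinel continuously checks whether the Redis master and replicas are working as expected.
- Notification: Sentinel can report failures of Redis instances to a monitoring system or to other applications through an API.
- Automatic failover: when the master fails, Sentinel starts the failover process. One replica is promoted to master, the other replicas are reconfigured to replicate from the new master, and clients are told to use the new master.
- Configuration provider: Sentinel acts as the authoritative source for client service discovery. A client connects to Sentinel and asks for the address of the master of a given service; after a failover, Sentinel reports the new address. Note that Sentinel is not a Redis proxy. It only gives clients the addresses of the master and replicas.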
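The configuration-provider role can be sketched as follows. This is a minimal illustration, assuming a Sentinel reachable at some host and port and a master registered under a name such as masters79 (the name used in the setup later in this article). It uses only the Python standard library and speaks the Redis wire protocol (RESP) directly; the helper names are hypothetical, and a real application would normally rely on a client library with Sentinel support instead.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_master_addr(reply: bytes):
    """Parse the reply of SENTINEL get-master-addr-by-name.

    Returns (host, port), or None when the reply is not a
    two-element array (for example a null or error reply).
    """
    lines = reply.split(b"\r\n")
    if not lines or lines[0] != b"*2":
        return None
    # Layout: *2, $<len>, host, $<len>, port, ""
    return lines[2].decode(), int(lines[4])


def get_master_addr(sentinel_host: str, sentinel_port: int, master_name: str):
    """Ask one Sentinel for the current master address (hypothetical helper)."""
    with socket.create_connection((sentinel_host, sentinel_port), timeout=2) as s:
        s.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        return parse_master_addr(s.recv(4096))
```

For example, get_master_addr("127.0.0.1", 16379, "masters79") would return the address the client should then connect to directly; the data traffic itself never goes through Sentinel.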
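How Sentinel learns about the replicas
Sentinel periodically (every 10 seconds by default) sends an INFO request to the master. The master's reply includes information about its replicas, so Sentinel learns their IPs and ports, which it needs in order to perform a master-replica switch. Once Sentinel has a replica's address, it also sends INFO requests to that replica to obtain further information.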
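- If the INFO target is a replica: Sentinel reads the IP and port of the replica's current master from the reply and updates its record if they differ from what it has stored. It also reads and updates the replica's priority, replication offset, and master link status.
- If the INFO target is the master: Sentinel reads the replica list from the reply and adds any newly seen replica to its monitoring list.
- In both cases, Sentinel records the node's runId.
- If the node's role has changed, Sentinel records the new role and the time it was reported. If Sentinel is running in TILT mode at that point, it does nothing further. Otherwise it runs the logic related to master-replica switching, which is covered in more detail later.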
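To make the INFO handling above more concrete, here is a small sketch of how the replication section of an INFO reply could be parsed to discover replicas. It is an illustration in Python, not Sentinel's actual implementation (Sentinel is written in C); the function name is hypothetical, and the sample text is the info replication output from the setup shown later in this article.

```python
def parse_info_replication(text: str) -> dict:
    """Parse the text of INFO replication into a dict.

    Plain fields are kept as strings; slaveN entries are
    collected as dicts under the "slaves" key.
    """
    info = {"slaves": []}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, section headers, and anything without a key.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key.startswith("slave") and key[5:].isdigit():
            # e.g. slave0:ip=172.17.0.5,port=6379,state=online,offset=...,lag=0
            fields = dict(item.split("=", 1) for item in value.split(","))
            info["slaves"].append(fields)
        else:
            info[key] = value
    return info


sample = """\
# Replication
role:master
connected_slaves:2
slave0:ip=172.17.0.5,port=6379,state=online,offset=101706,lag=0
slave1:ip=172.17.0.6,port=6379,state=online,offset=101706,lag=1
"""
parsed = parse_info_replication(sample)
print(parsed["role"], [(s["ip"], s["port"]) for s in parsed["slaves"]])
```

Fields such as slave_priority are not mistaken for replica entries, because only keys of the form slave followed by digits are treated as replicas.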
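Now for the setup:
The experiment uses one master, two replicas, and three Sentinels.
Step 1: configure the master and the replicas
Master: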
docker run \
  --name redis \
  -p 6379:6379 \
  -v /home/redis/data:/data \
  -v /home/redis/conf/redis.conf:/etc/redis/redis.conf \
  -d redis:6.0 \
  redis-server /etc/redis/redis.conf
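Before running the command, download redis.conf from the official site into the mounted conf directory and change the following setting:
bind 0.0.0.0
Replicas: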
docker run \
  --name redis-slave2 \
  -p 6381:6379 \
  -v /home/redis-slave2/data:/data \
  -v /home/redis-slave2/conf/redis.conf:/etc/redis/redis.conf \
  -d redis:6.0 \
  redis-server /etc/redis/redis.conf
docker run \
  --name redis-slave1 \
  -p 6380:6379 \
  -v /home/redis-slave1/data:/data \
  -v /home/redis-slave1/conf/redis.conf:/etc/redis/redis.conf \
  -d redis:6.0 \
  redis-server /etc/redis/redis.conf
In the redis.conf of both replicas, change the following settings:
replicaof <master-ip> <master-port>
bind 0.0.0.0
If you do not know the master's IP, you can look it up with: docker inspect <container-name>
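The docker inspect output is a large JSON document, so it can be easier to extract just the IP address. The sketch below assumes the usual layout in which addresses live under NetworkSettings.Networks.<network>.IPAddress; the helper names are hypothetical.

```python
import json
import subprocess


def container_ips(inspect_json: str) -> dict:
    """Map network name -> IP address from `docker inspect <name>` output."""
    data = json.loads(inspect_json)  # docker inspect prints a JSON array
    networks = data[0]["NetworkSettings"]["Networks"]
    return {name: net["IPAddress"] for name, net in networks.items()}


def inspect_container(name: str) -> dict:
    """Run docker inspect for one container and return its IPs per network."""
    out = subprocess.run(
        ["docker", "inspect", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return container_ips(out)
```

Calling inspect_container("redis") on the Docker host would then give the address to put into the replicaof line, for instance an entry for the default bridge network.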
Final result:
[root@VM-8-12-centos conf]# docker exec -it redis /bin/bash
root@111:/data# redis-cli
127.0.0.1:6379> info replication
The output includes the following, showing two connected replicas:
role:master
connected_slaves:2
slave0:ip=172.17.0.5,port=6379,state=online,offset=101706,lag=0
slave1:ip=172.17.0.6,port=6379,state=online,offset=101706,lag=1
Step 2: configure the three Sentinels
Run the following command to create a Sentinel:
docker run --name redis-sentinel-16379 \
  -p 16379:16379 \
  --restart=always \
  -v /home/redis-sentinel-16379/data:/data \
  -v /home/redis-sentinel-16379/conf/sentinel.conf:/etc/sentinel.conf \
  -d redis:6.0 redis-sentinel /etc/sentinel.conf
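Create sentinel.conf under /home/redis-sentinel-16379/conf/ (do this before starting the container, otherwise Docker creates a directory at the mount path) with the following content: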
sentinel monitor masters79 172.17.0.4 6379 2
# Sentinel port: 16379, 16380, or 16381 depending on the instance
port 16379
logfile "sentinel.log"
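Explanation: sentinel monitor <master-name> <host> <port> <quorum>. The quorum is the number of Sentinels that must agree the master is down before it is marked as objectively down and a failover is attempted. It is usually set to n/2 + 1, where n is the number of Sentinels (not the number of Redis nodes); with three Sentinels that gives 2.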
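The quorum arithmetic can be written out as a short sketch. Note one simplification that is stated here as an assumption of the model: the quorum only governs failure detection, while actually carrying out a failover also requires a majority of all Sentinels to authorize a leader, so the sketch treats the larger of the two numbers as the requirement. The function names are hypothetical.

```python
def majority(sentinel_count: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    if sentinel_count < 1:
        raise ValueError("need at least one Sentinel")
    return sentinel_count // 2 + 1


def can_failover(sentinel_count: int, quorum: int, reachable: int) -> bool:
    """True when enough Sentinels are reachable both to flag the master
    as objectively down (quorum) and to authorize a failover (majority)."""
    return reachable >= max(quorum, majority(sentinel_count))


print(majority(3))            # quorum suggested for three Sentinels
print(can_failover(3, 2, 2))  # two of three Sentinels still reachable
print(can_failover(3, 2, 1))  # only one Sentinel left
```

With three Sentinels and a quorum of 2, the deployment tolerates the loss of one Sentinel; with only one Sentinel left, no failover can be started.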
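Then repeat the steps above to create two more Sentinels, redis-sentinel-16380 and redis-sentinel-16381, each with its own sentinel.conf and the matching port: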
docker run --name redis-sentinel-16380 -p 16380:16380 --restart=always -v /home/redis-sentinel-16380/data:/data -v /home/redis-sentinel-16380/conf/sentinel.conf:/etc/sentinel.conf -d redis:6.0 redis-sentinel /etc/sentinel.conf
docker run --name redis-sentinel-16381 -p 16381:16381 --restart=always -v /home/redis-sentinel-16381/data:/data -v /home/redis-sentinel-16381/conf/sentinel.conf:/etc/sentinel.conf -d redis:6.0 redis-sentinel /etc/sentinel.conf
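Since the three sentinel.conf files differ only in the port, they can be generated rather than edited by hand. The following is a minimal sketch; the function name is hypothetical and the default values are simply the ones used in this article.

```python
def render_sentinel_conf(port, master_name="masters79",
                         master_host="172.17.0.4", master_port=6379, quorum=2):
    """Return the text of a sentinel.conf for one Sentinel instance."""
    lines = [
        f"sentinel monitor {master_name} {master_host} {master_port} {quorum}",
        f"port {port}",
        'logfile "sentinel.log"',
    ]
    return "\n".join(lines) + "\n"


# Print one config per Sentinel; write each to the path in the comment line.
for port in (16379, 16380, 16381):
    print(f"# /home/redis-sentinel-{port}/conf/sentinel.conf")
    print(render_sentinel_conf(port))
```

Each rendered text would be saved to the conf directory of the corresponding Sentinel before its container is started.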
Enter any one of the Sentinels and check its monitoring status:
[root@VM-8-12-centos conf]# docker exec -it redis-sentinel-16379 /bin/bash
root@24ff99a7736f:/data# redis-cli -p 16379
127.0.0.1:16379> info
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=sentinel79,status=ok,address=172.17.0.4:6379,slaves=2,sentinels=1
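You can also look at the Sentinel log with vim sentinel.log. The repeated "Could not rename tmp config file" warnings most likely appear because sentinel.conf is bind-mounted as a single file, which Sentinel cannot replace by renaming; mounting the containing directory instead should let it persist its state.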
1:X 19 Apr 2022 16:06:14.127 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 19 Apr 2022 16:06:14.127 # Redis version=6.0.16, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 19 Apr 2022 16:06:14.127 # Configuration loaded
1:X 19 Apr 2022 16:06:14.128 * Running mode=sentinel, port=16381.
1:X 19 Apr 2022 16:06:14.129 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 19 Apr 2022 16:06:14.135 # Could not rename tmp config file (Device or resource busy)
1:X 19 Apr 2022 16:06:14.135 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
1:X 19 Apr 2022 16:06:14.135 # Sentinel ID is 68879b62a5565a7ce44d035c8c9e4c91705e935f
1:X 19 Apr 2022 16:06:14.135 # +monitor master masters79 172.17.0.4 6379 quorum 2
1:X 19 Apr 2022 16:06:14.136 * +slave slave 172.17.0.5:6379 172.17.0.5 6379 @ masters79 172.17.0.4 6379
1:X 19 Apr 2022 16:06:14.139 # Could not rename tmp config file (Device or resource busy)
1:X 19 Apr 2022 16:06:14.139 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
1:X 19 Apr 2022 16:06:14.139 * +slave slave 172.17.0.6:6379 172.17.0.6 6379 @ masters79 172.17.0.4 6379
1:X 19 Apr 2022 16:06:14.142 # Could not rename tmp config file (Device or resource busy)
1:X 19 Apr 2022 16:06:14.142 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
1:X 19 Apr 2022 16:06:15.031 * +sentinel sentinel 112ccde5732ace56de4ebc2911dcac97604d710a 172.17.0.9 16380 @ masters79 172.17.0.4 6379
1:X 19 Apr 2022 16:06:15.035 # Could not rename tmp config file (Device or resource busy)
1:X 19 Apr 2022 16:06:15.035 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
Step 3: verification
Take the master down:
127.0.0.1:6379> shutdown
Running info replication on a replica afterwards shows:
# Replication
role:slave
master_host:172.17.0.6
master_port:6379
master_link_status:up
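This shows that 172.17.0.6 has been promoted from replica to master. When the original master (172.17.0.4) is started again, it no longer acts as master and rejoins as a replica.
Sentinel log: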
1:X 19 Apr 2022 16:13:36.511 # +sdown master masters79 172.17.0.4 6379
1:X 19 Apr 2022 16:13:36.663 # Could not rename tmp config file (Device or resource busy)
1:X 19 Apr 2022 16:13:36.663 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
1:X 19 Apr 2022 16:13:36.663 # +new-epoch 1
1:X 19 Apr 2022 16:13:36.665 # Could not rename tmp config file (Device or resource busy)
1:X 19 Apr 2022 16:13:36.665 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
1:X 19 Apr 2022 16:13:36.665 # +vote-for-leader 68879b62a5565a7ce44d035c8c9e4c91705e935f 1
1:X 19 Apr 2022 16:13:37.629 # +odown master masters79 172.17.0.4 6379 #quorum 2/2
1:X 19 Apr 2022 16:13:37.629 # Next failover delay: I will not start a failover before Tue Apr 19 16:19:37 2022
1:X 19 Apr 2022 16:13:37.670 # +config-update-from sentinel 68879b62a5565a7ce44d035c8c9e4c91705e935f 172.17.0.10 16381 @ masters79 172.17.0.4 6379
1:X 19 Apr 2022 16:13:37.670 # +switch-master masters79 172.17.0.4 6379 172.17.0.6 6379
1:X 19 Apr 2022 16:13:37.670 * +slave slave 172.17.0.5:6379 172.17.0.5 6379 @ masters79 172.17.0.6 6379
1:X 19 Apr 2022 16:13:37.670 * +slave slave 172.17.0.4:6379 172.17.0.4 6379 @ masters79 172.17.0.6 6379
1:X 19 Apr 2022 16:13:37.673 # Could not rename tmp config file (Device or resource busy)
1:X 19 Apr 2022 16:13:37.673 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
1:X 19 Apr 2022 16:14:07.696 # +sdown slave 172.17.0.4:6379 172.17.0.4 6379 @ masters79 172.17.0.6 6379
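The log above follows the usual failover sequence: +sdown (this Sentinel considers the master subjectively down), +odown (the quorum is reached and the master is objectively down), then +switch-master (the master address changes). A small sketch for pulling these events out of a log file is shown below; the regular expression and helper names are assumptions based on the log lines in this article, not an official log format specification.

```python
import re

# pid:role, timestamp, a "#" or "*" level marker, then "+event details".
EVENT_RE = re.compile(
    r"^\d+:X (?P<time>.+?) [#*] (?P<event>[+-][\w-]+) (?P<details>.*)$"
)


def parse_events(log_text: str):
    """Return (event, details) pairs for Sentinel event lines only."""
    events = []
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            events.append((m.group("event"), m.group("details")))
    return events


def switch_master(details: str):
    """Split '+switch-master' details: name, old ip/port, new ip/port."""
    name, old_ip, old_port, new_ip, new_port = details.split()
    return name, (old_ip, int(old_port)), (new_ip, int(new_port))
```

Warning lines such as "Could not rename tmp config file" are skipped because they do not start with a + or - event name.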