Simple and Clear: Setting Up a Redis Sentinel Cluster v2
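Preface
I previously wrote "Setting Up Redis Sentinel Mode, Simply and Clearly". Reviewing it recently, I found that the approach could be improved. Many caveats were also not recorded clearly, which left quite a few pitfalls and made it unsuitable as a quick production reference. So here is a remake.
This time I keep up with the times and use the latest Redis 7.
- Redis image: redis:7.0.8-alpine3.17
- Docker: Docker version 20.10.3, build b455053
- docker-compose: docker-compose version 1.28.5, build 324b023a
Again, in keeping with the simple-and-clear principle, installing docker/docker-compose and pulling the image are not covered.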
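Plan
As before, the layout is the classic 1 master, 2 replicas and 3 sentinels. Everything is defined in a single docker-compose.yml and deployed on the NAS at my home, which is reachable at nas.me.
Container | Container port | Host address |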
---|---|---|
redis-master | 6379 | nas.me:6379 |
redis-slave-1 | 6379 | nas.me:6380 |
redis-slave-2 | 6379 | nas.me:6381 |
redis-sentinel-1 | 26379 | nas.me:26379 |
redis-sentinel-2 | 26379 | nas.me:26380 |
redis-sentinel-3 | 26379 | nas.me:26381 |
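Once the stack is up, you can confirm the port mapping in this table from another machine by trying a TCP connect to each host port. The following is a minimal Java sketch. The class name `PortCheck` is hypothetical, and the 2-second timeout is an assumption. A successful connect only shows that the port is open, not that authentication or replication works.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: probe each host port from the table with a plain TCP connect.
// Assumptions: host nas.me (from this article) and a 2000 ms connect timeout.
final class PortCheck {

    /** Returns true if a TCP connection to host:port is accepted within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or a host name that does not resolve
            return false;
        }
    }

    public static void main(String[] args) {
        // Container name -> host port, copied from the table
        Map<String, Integer> hostPorts = new LinkedHashMap<>();
        hostPorts.put("redis-master", 6379);
        hostPorts.put("redis-slave-1", 6380);
        hostPorts.put("redis-slave-2", 6381);
        hostPorts.put("redis-sentinel-1", 26379);
        hostPorts.put("redis-sentinel-2", 26380);
        hostPorts.put("redis-sentinel-3", 26381);

        for (Map.Entry<String, Integer> entry : hostPorts.entrySet()) {
            boolean ok = isReachable("nas.me", entry.getValue(), 2000);
            System.out.printf("%-17s nas.me:%-5d %s%n", entry.getKey(), entry.getValue(), ok ? "open" : "NOT reachable");
        }
    }
}
```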
Implementation
Directory layout
```
redis-sentinel
├── docker-compose.yml
├── master
│   └── data
├── sentinel1
│   └── conf
│       └── sentinel.conf
├── sentinel2
│   └── conf
│       └── sentinel.conf
├── sentinel3
│   └── conf
│       └── sentinel.conf
├── slave1
│   └── data
└── slave2
    └── data
```
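The layout above can be prepared in one step. Below is a minimal Java sketch, where `LayoutSetup` is a hypothetical name and the default base path `redis-sentinel` is an assumption. It creates the data and conf directories and seeds each sentinel with the initial `sentinel.conf` from this article. It never overwrites an existing file, because a running Sentinel rewrites its own copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: create the directory tree above and seed sentinelN/conf/sentinel.conf.
// docker-compose.yml itself is still written by hand.
final class LayoutSetup {

    static final List<String> DIRS = List.of(
            "master/data", "slave1/data", "slave2/data",
            "sentinel1/conf", "sentinel2/conf", "sentinel3/conf");

    // Initial sentinel.conf, identical for all three sentinels (same lines as in the Sentinel.conf section)
    static final String SENTINEL_CONF = String.join("\n",
            "port 26379",
            "dir \"/tmp\"",
            "sentinel monitor mymaster nas.me 6379 2",
            "sentinel auth-pass mymaster root",
            "sentinel deny-scripts-reconfig yes",
            "sentinel resolve-hostnames yes",
            "");

    static void createLayout(Path base) throws IOException {
        for (String dir : DIRS) {
            Files.createDirectories(base.resolve(dir)); // no-op if the directory already exists
        }
        for (int i = 1; i <= 3; i++) {
            Path conf = base.resolve("sentinel" + i + "/conf/sentinel.conf");
            // Do not clobber a file that a running Sentinel has already rewritten
            if (Files.notExists(conf)) {
                Files.writeString(conf, SENTINEL_CONF);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        createLayout(Paths.get(args.length > 0 ? args[0] : "redis-sentinel"));
    }
}
```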
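docker-compose.yml
The core file.
The Redis instances here are not started from a mounted `redis.conf`. Instead, the main parameters are passed directly in `command`. Passing `--name value` on the command line is equivalent to writing `name value` in the configuration file.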
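The sketch below makes this equivalence concrete. It is a simplified Java illustration, not Redis's own argument parser, and the class name `ArgsToConf` is hypothetical. It turns `--name value ...` arguments into the lines you would otherwise write in redis.conf. It assumes the arguments have already been split the way a shell or docker-compose splits them, so quotes such as `"nas.me"` are already gone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: map redis-server "--name value ..." arguments to redis.conf lines.
final class ArgsToConf {

    static List<String> toConfLines(String[] args) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = null;
        for (String arg : args) {
            if (arg.startsWith("--")) {
                // A new directive starts: flush the previous one
                if (current != null) lines.add(current.toString());
                current = new StringBuilder(arg.substring(2));
            } else if (current != null) {
                // Plain values belong to the latest directive, e.g. "slaveof redis-master 6379"
                current.append(' ').append(arg);
            }
        }
        if (current != null) lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) {
        String[] replicaArgs = {"--slaveof", "redis-master", "6379", "--requirepass", "root", "--appendonly", "yes"};
        toConfLines(replicaArgs).forEach(System.out::println);
    }
}
```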
```yaml
version: '3'
services:
  redis-master:
    container_name: redis-master
    image: docker.io/redis:7.0.8-alpine3.17
    environment:
      - TZ=Asia/Shanghai
    restart: always
    command: redis-server --requirepass root --masterauth root --replica-announce-ip "nas.me" --replica-announce-port 6379 --appendonly yes
    ports:
      - 6379:6379
    volumes:
      - ./master/data:/data
    networks:
      - redis_sentinel_network
  redis-slave-1:
    container_name: redis-slave-1
    image: docker.io/redis:7.0.8-alpine3.17
    environment:
      - TZ=Asia/Shanghai
    restart: always
    command: redis-server --slaveof redis-master 6379 --requirepass root --masterauth root --replica-announce-ip "nas.me" --replica-announce-port 6380 --appendonly yes
    ports:
      - 6380:6379
    volumes:
      # Each Redis node needs its own data directory; sharing one would mix their AOF/RDB files
      - ./slave1/data:/data
    networks:
      - redis_sentinel_network
    depends_on:
      - redis-master
  redis-slave-2:
    container_name: redis-slave-2
    image: docker.io/redis:7.0.8-alpine3.17
    environment:
      - TZ=Asia/Shanghai
    restart: always
    command: redis-server --slaveof redis-master 6379 --requirepass root --masterauth root --replica-announce-ip "nas.me" --replica-announce-port 6381 --appendonly yes
    ports:
      - 6381:6379
    volumes:
      - ./slave2/data:/data
    networks:
      - redis_sentinel_network
    depends_on:
      - redis-slave-1
  redis-sentinel-1:
    container_name: redis-sentinel-1
    image: docker.io/redis:7.0.8-alpine3.17
    environment:
      - TZ=Asia/Shanghai
    restart: always
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    ports:
      - 26379:26379
    volumes:
      - ./sentinel1/conf:/usr/local/etc/redis/conf
    networks:
      - redis_sentinel_network
    depends_on:
      - redis-slave-2
  redis-sentinel-2:
    container_name: redis-sentinel-2
    image: docker.io/redis:7.0.8-alpine3.17
    environment:
      - TZ=Asia/Shanghai
    restart: always
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    ports:
      - 26380:26379
    volumes:
      - ./sentinel2/conf:/usr/local/etc/redis/conf
    networks:
      - redis_sentinel_network
    depends_on:
      - redis-sentinel-1
  redis-sentinel-3:
    container_name: redis-sentinel-3
    image: docker.io/redis:7.0.8-alpine3.17
    environment:
      - TZ=Asia/Shanghai
    restart: always
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    ports:
      - 26381:26379
    volumes:
      - ./sentinel3/conf:/usr/local/etc/redis/conf
    networks:
      - redis_sentinel_network
    depends_on:
      - redis-sentinel-2
networks:
  redis_sentinel_network:
    name: redis_sentinel_network
```
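Key points in the yml
Breakdown of `command`
```shell
redis-server --slaveof redis-master 6379 --requirepass root --masterauth root --replica-announce-ip "nas.me" --replica-announce-port 6380 --appendonly yes
```
- `--slaveof redis-master 6379`: a replica must be told its initial master. The container name can be used here only because both containers are on the same docker network.
- `--requirepass root`: the password for connecting to this node is root.
- `--masterauth root`: the password required by the master is root. The master sets it as well, because after a failover it may itself become a replica.
- `--replica-announce-ip "nas.me"`: important. Whenever a Docker network or any other NAT sits in between, set an IP or host name that is reachable from outside. Otherwise the replica reports its container-internal address. The containers will start, but external clients cannot reach the replicas at the addresses they report.
- `--replica-announce-port 6380`: important, for the same reason. It must be the host port mapped to this container.
- `--appendonly yes`: enables AOF persistence.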
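Across the three data nodes, the announce settings differ only in the port. That port must be the host port mapped to the container (the third column of the table), not the container-internal 6379. The following hypothetical Java helper, `NodeCommand`, generates the three `command` strings to make this rule explicit. The password `root` and the host `nas.me` are this article's values, hard-coded for illustration.

```java
import java.util.List;

// Hypothetical generator for the redis-server commands used in docker-compose.yml.
// Rule encoded here: --replica-announce-port = host port mapped to that container.
final class NodeCommand {

    static String command(String masterContainer, int hostPort, boolean isReplica) {
        StringBuilder cmd = new StringBuilder("redis-server");
        if (isReplica) {
            // Container names resolve only inside the shared docker network
            cmd.append(" --slaveof ").append(masterContainer).append(" 6379");
        }
        cmd.append(" --requirepass root --masterauth root")
           .append(" --replica-announce-ip \"nas.me\"")
           .append(" --replica-announce-port ").append(hostPort)
           .append(" --appendonly yes");
        return cmd.toString();
    }

    public static void main(String[] args) {
        System.out.println(command("redis-master", 6379, false));
        for (int port : List.of(6380, 6381)) {
            System.out.println(command("redis-master", port, true));
        }
    }
}
```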
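Sentinel.conf
The second key point.
The file can start out identical in all three directories. Each sentinel still needs its own copy, because Sentinel rewrites its configuration file at runtime (the "Sentinel new configuration saved on disk" lines in the logs). It does this, for example, to record its own ID and the replicas and other sentinels it discovers.
```
port 26379
dir "/tmp"
sentinel monitor mymaster nas.me 6379 2
sentinel auth-pass mymaster root
sentinel deny-scripts-reconfig yes
sentinel resolve-hostnames yes
```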
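Key points in the conf
- `sentinel monitor mymaster nas.me 6379 2`: `mymaster` is the name of the master group. `nas.me` is the externally reachable address. Do not use the container name here, or it resolves to the container's internal network address and external clients cannot connect to the cluster. The final `2` is the quorum: once 2 sentinels agree that the master is unreachable, it is marked objectively down and a failover is attempted. The failover itself still needs a leader sentinel elected by a majority of the sentinels (2 of 3 here).
- `sentinel resolve-hostnames yes`: required so that the host name `nas.me` can be used instead of an IP address.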
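Two thresholds are involved here, and they are easy to conflate. The quorum only decides when the master counts as objectively down. To actually run the failover, a sentinel must be elected leader by a majority of all sentinels, and by at least `quorum` of them if the quorum is larger than that majority. The following Java sketch (hypothetical class `SentinelQuorum`) captures this rule as described in the Redis Sentinel documentation.

```java
// Hypothetical illustration of the two Sentinel thresholds:
// 1) quorum -> how many sentinels must agree the master is unreachable (ODOWN);
// 2) leader election -> votes needed = max(majority of all sentinels, quorum).
final class SentinelQuorum {

    static boolean isObjectivelyDown(int sentinelsAgreeing, int quorum) {
        return sentinelsAgreeing >= quorum;
    }

    static int votesNeededForLeader(int totalSentinels, int quorum) {
        int majority = totalSentinels / 2 + 1;
        return Math.max(majority, quorum);
    }

    public static void main(String[] args) {
        // This setup: 3 sentinels, quorum 2
        System.out.println(isObjectivelyDown(2, 2));    // true
        System.out.println(votesNeededForLeader(3, 2)); // 2
        // A low quorum does not lower the election bar: 5 sentinels still need 3 votes
        System.out.println(votesNeededForLeader(5, 2)); // 3
    }
}
```

This is also why a 3-sentinel setup survives the loss of one sentinel but not two: with only 1 sentinel left, it can still see the master as down when the quorum is 1, but it can never collect the 2 votes needed to lead a failover.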
Deployment
Starting the containers
In the directory containing docker-compose.yml, run:
```shell
docker-compose up -d
```
If nothing goes wrong, the containers start up as follows:
```
$ docker-compose up -d
Creating network "redis_sentinel_network" with the default driver
Creating redis-master ... done
Creating redis-slave-1 ... done
Creating redis-slave-2 ... done
Creating redis-sentinel-1 ... done
Creating redis-sentinel-2 ... done
Creating redis-sentinel-3 ... done
```
Listing the containers
There are two ways. The first is to run the following in the directory containing docker-compose.yml:
```shell
docker-compose ps
```
Output:
```
$ docker-compose ps
Name Command State Ports
----------------------------------------------------------------------------------------------
redis-master docker-entrypoint.sh redis ... Up 0.0.0.0:6379->6379/tcp
redis-sentinel-1 docker-entrypoint.sh redis ... Up 0.0.0.0:26379->26379/tcp, 6379/tcp
redis-sentinel-2 docker-entrypoint.sh redis ... Up 0.0.0.0:26380->26379/tcp, 6379/tcp
redis-sentinel-3 docker-entrypoint.sh redis ... Up 0.0.0.0:26381->26379/tcp, 6379/tcp
redis-slave-1 docker-entrypoint.sh redis ... Up 0.0.0.0:6380->6379/tcp
redis-slave-2 docker-entrypoint.sh redis ... Up 0.0.0.0:6381->6379/tcp
```
The second is Docker's native ps:
```shell
docker ps | grep redis
```
Output:
```
$ docker ps | grep redis
6ada99fc5fe0 redis:7.0.8-alpine3.17 "docker-entrypoint.s…" 10 hours ago Up 10 hours 6379/tcp, 0.0.0.0:26381->26379/tcp redis-sentinel-3
d4de1c47e3cf redis:7.0.8-alpine3.17 "docker-entrypoint.s…" 10 hours ago Up 10 hours 6379/tcp, 0.0.0.0:26380->26379/tcp redis-sentinel-2
48a45ad344f4 redis:7.0.8-alpine3.17 "docker-entrypoint.s…" 10 hours ago Up 10 hours 6379/tcp, 0.0.0.0:26379->26379/tcp redis-sentinel-1
04e367a07372 redis:7.0.8-alpine3.17 "docker-entrypoint.s…" 10 hours ago Up 10 hours 0.0.0.0:6381->6379/tcp redis-slave-2
b5103b805756 redis:7.0.8-alpine3.17 "docker-entrypoint.s…" 10 hours ago Up 10 hours 0.0.0.0:6380->6379/tcp redis-slave-1
fc0a92fa784e redis:7.0.8-alpine3.17 "docker-entrypoint.s…" 10 hours ago Up 10 hours 0.0.0.0:6379->6379/tcp redis-master
```
Viewing container logs
The `-f` flag follows the log in real time, like `tail -f`.
```shell
docker logs redis-sentinel-1 -f
```
Output:
```
$ docker logs redis-sentinel-1 -f
1:X 26 Feb 2023 22:32:28.256 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 26 Feb 2023 22:32:28.256 # Redis version=7.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 26 Feb 2023 22:32:28.256 # Configuration loaded
1:X 26 Feb 2023 22:32:28.257 * monotonic clock: POSIX clock_gettime
1:X 26 Feb 2023 22:32:28.258 * Running mode=sentinel, port=26379.
1:X 26 Feb 2023 22:32:28.258 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 26 Feb 2023 22:32:28.272 * Sentinel new configuration saved on disk
1:X 26 Feb 2023 22:32:28.273 # Sentinel ID is 67601f6dd92f539fa4549ada69d141f9483a5254
1:X 26 Feb 2023 22:32:28.273 # +monitor master mymaster 192.168.80.2 6379 quorum 2
1:X 26 Feb 2023 22:32:28.277 * +slave slave 192.168.50.105:6380 192.168.50.105 6380 @ mymaster 192.168.80.2 6379
1:X 26 Feb 2023 22:32:28.288 * Sentinel new configuration saved on disk
1:X 26 Feb 2023 22:32:28.291 * +slave slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.80.2 6379
1:X 26 Feb 2023 22:32:28.326 * Sentinel new configuration saved on disk
1:X 26 Feb 2023 22:32:31.007 * +sentinel sentinel 877c03d29cc514c5b06ff3137c52a8848e6e6692 192.168.80.6 26379 @ mymaster 192.168.80.2 6379
1:X 26 Feb 2023 22:32:31.020 * Sentinel new configuration saved on disk
1:X 26 Feb 2023 22:32:31.736 * +sentinel sentinel 57acf9abe43e019c403d75ea990d9ec7a462956d 192.168.80.7 26379 @ mymaster 192.168.80.2 6379
1:X 26 Feb 2023 22:32:31.750 * Sentinel new configuration saved on disk
```
Checking the master/replica relationship
There is no need to enter the container. Run:
```shell
docker exec -it redis-master redis-cli -a root info replication
```
`-a root` supplies the password. The output is:
```
$ docker exec -it redis-master redis-cli -a root info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:2
slave0:ip=nas.me,port=6380,state=online,offset=7616333,lag=1
slave1:ip=nas.me,port=6381,state=online,offset=7616333,lag=0
master_failover_state:no-failover
master_replid:5ad8990b0e92008ef5c38829f48acc148fb10039
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:7616474
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:6561241
repl_backlog_histlen:1055234
```
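To check this output automatically, for example in a health-check script, you can parse it. Here `role` should be `master` and every `slaveN` entry should report `state=online`. The following is a minimal Java sketch. The class name `ReplicationInfo` is hypothetical. It handles only the `key:value` lines and the comma-separated `slaveN` entries shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the text printed by `redis-cli info replication`.
final class ReplicationInfo {

    final Map<String, String> fields = new LinkedHashMap<>();
    final List<Map<String, String>> replicas = new ArrayList<>();

    static ReplicationInfo parse(String text) {
        ReplicationInfo info = new ReplicationInfo();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            int colon = line.indexOf(':');
            // Skip section headers ("# Replication"), blank lines and lines without key:value
            if (line.isEmpty() || line.startsWith("#") || colon < 0) continue;
            String key = line.substring(0, colon);
            String value = line.substring(colon + 1);
            info.fields.put(key, value);
            // slave0:ip=nas.me,port=6380,state=online,... describes one connected replica
            if (key.matches("slave\\d+")) {
                Map<String, String> replica = new LinkedHashMap<>();
                for (String pair : value.split(",")) {
                    int eq = pair.indexOf('=');
                    if (eq > 0) replica.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
                info.replicas.add(replica);
            }
        }
        return info;
    }
}
```

The same parser can be applied to a replica's output after a failover to confirm that its `role` has changed to `master`.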
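Stopping, starting and removing the containers
Stop the containers:
```shell
docker-compose stop
```
Start the existing containers:
```shell
docker-compose start
```
Stop and remove the containers:
```shell
docker-compose down
```
These are ordinary docker-compose commands; see the docker-compose manual or its help output for more.
Note that docker-compose commands must be run in the same directory as the yml file.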
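Verifying failover
The defining feature of a Sentinel-managed Redis cluster is, of course, automatic master/replica switchover. When the master goes down, Sentinel performs an automatic failover by promoting one of the replicas to be the new master.
Stop the master node:
```shell
docker stop redis-master
```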
Check the log of one of the sentinels:
```
$ docker logs redis-sentinel-1 -f
1:X 27 Feb 2023 09:26:58.033 # +sdown master mymaster 192.168.50.105 6379
1:X 27 Feb 2023 09:26:58.109 # +odown master mymaster 192.168.50.105 6379 #quorum 3/2
1:X 27 Feb 2023 09:26:58.109 # +new-epoch 1
1:X 27 Feb 2023 09:26:58.109 # +try-failover master mymaster 192.168.50.105 6379
1:X 27 Feb 2023 09:26:58.134 * Sentinel new configuration saved on disk
1:X 27 Feb 2023 09:26:58.134 # +vote-for-leader 93502b238af5424482cd7c890c0e682e9b2d8fdc 1
1:X 27 Feb 2023 09:26:58.135 # 292446f28b077e382fc11841cb54d0ee6991f31b voted for 292446f28b077e382fc11841cb54d0ee6991f31b 1
1:X 27 Feb 2023 09:26:58.152 # 02774af034f6587bdaf4a5baff373b07be5378e8 voted for 292446f28b077e382fc11841cb54d0ee6991f31b 1
1:X 27 Feb 2023 09:26:58.851 # +config-update-from sentinel 292446f28b077e382fc11841cb54d0ee6991f31b 192.168.96.6 26379 @ mymaster 192.168.50.105 6379
1:X 27 Feb 2023 09:26:58.851 # +switch-master mymaster 192.168.50.105 6379 192.168.50.105 6380
1:X 27 Feb 2023 09:26:58.853 * +slave slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:26:58.856 * +slave slave 192.168.50.105:6379 192.168.50.105 6379 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:26:58.874 * Sentinel new configuration saved on disk
1:X 27 Feb 2023 09:27:28.872 # +sdown slave 192.168.50.105:6379 192.168.50.105 6379 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.350 # +sdown master mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.406 # +odown master mymaster 192.168.50.105 6380 #quorum 2/2
1:X 27 Feb 2023 09:27:59.406 # +new-epoch 2
1:X 27 Feb 2023 09:27:59.406 # +try-failover master mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.419 * Sentinel new configuration saved on disk
1:X 27 Feb 2023 09:27:59.419 # +vote-for-leader 93502b238af5424482cd7c890c0e682e9b2d8fdc 2
1:X 27 Feb 2023 09:27:59.441 # 02774af034f6587bdaf4a5baff373b07be5378e8 voted for 93502b238af5424482cd7c890c0e682e9b2d8fdc 2
1:X 27 Feb 2023 09:27:59.482 # +elected-leader master mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.482 # +failover-state-select-slave master mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.565 # +selected-slave slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.565 * +failover-state-send-slaveof-noone slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.624 * +failover-state-wait-promotion slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:28:00.130 * Sentinel new configuration saved on disk
1:X 27 Feb 2023 09:28:00.130 # +promoted-slave slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:28:00.190 # +failover-end master mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:28:00.191 # +switch-master mymaster 192.168.50.105 6380 192.168.50.105 6381
1:X 27 Feb 2023 09:28:00.193 * +slave slave 192.168.50.105:6379 192.168.50.105 6379 @ mymaster 192.168.50.105 6381
1:X 27 Feb 2023 09:28:00.193 * +slave slave 192.168.50.105:6380 192.168.50.105 6380 @ mymaster 192.168.50.105 6381
1:X 27 Feb 2023 09:28:00.205 * Sentinel new configuration saved on disk
1:X 27 Feb 2023 09:28:30.210 # +sdown slave 192.168.50.105:6379 192.168.50.105 6379 @ mymaster 192.168.50.105 6381
```
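The log traces the failover step by step:
- The master is first marked subjectively down (`+sdown`).
- It is then marked objectively down once enough sentinels agree (`+odown`).
- A leader sentinel is voted in (`+vote-for-leader`).
- Finally, the master address switches (`+switch-master`).

This run actually contains two rounds. In the first, 6380 (redis-slave-1) is promoted. About a minute later, 6380 is marked down as well, and a second round, led by this sentinel (`+elected-leader`), promotes 6381 (redis-slave-2).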
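The line that matters most to clients is `+switch-master`. Sentinel also publishes it on the `+switch-master` Pub/Sub channel with the payload `<master-name> <old-ip> <old-port> <new-ip> <new-port>`, and sentinel-aware clients rely on it to follow the master. The following is a minimal Java sketch that extracts the old and new addresses. The class name `SwitchMaster` is hypothetical. It accepts either a full log line or the bare payload.

```java
// Hypothetical parser for "+switch-master" events (log line or Pub/Sub payload).
final class SwitchMaster {

    static final class Event {
        final String masterName;
        final String oldHost;
        final int oldPort;
        final String newHost;
        final int newPort;

        Event(String masterName, String oldHost, int oldPort, String newHost, int newPort) {
            this.masterName = masterName;
            this.oldHost = oldHost;
            this.oldPort = oldPort;
            this.newHost = newHost;
            this.newPort = newPort;
        }
    }

    private static final String TAG = "+switch-master ";

    static Event parse(String line) {
        int idx = line.indexOf(TAG);
        // Strip the log prefix if present; otherwise treat the input as the bare payload
        String payload = idx >= 0 ? line.substring(idx + TAG.length()) : line;
        String[] parts = payload.trim().split("\\s+");
        if (parts.length != 5) {
            throw new IllegalArgumentException("Unexpected +switch-master format: " + line);
        }
        return new Event(parts[0], parts[1], Integer.parseInt(parts[2]), parts[3], Integer.parseInt(parts[4]));
    }

    public static void main(String[] args) {
        Event e = parse("1:X 27 Feb 2023 09:28:00.191 # +switch-master mymaster 192.168.50.105 6380 192.168.50.105 6381");
        // prints mymaster: 192.168.50.105:6380 -> 192.168.50.105:6381
        System.out.println(e.masterName + ": " + e.oldHost + ":" + e.oldPort + " -> " + e.newHost + ":" + e.newPort);
    }
}
```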
Check one of the nodes:
```
$ docker exec -it redis-slave-2 redis-cli -a root info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=nas.me,port=6380,state=online,offset=8247997,lag=1
master_failover_state:no-failover
master_replid:754d3d8e8eff1a537f96ff1f4e57d11bf511825f
master_replid2:bff8fc75d1bf8aab0533feda641fb665904216e1
master_repl_offset:8247997
second_repl_offset:8063230
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:7195186
repl_backlog_histlen:1052812
```
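Replica 2 (redis-slave-2) has been promoted to the new master, so the failover succeeded.
However, the whole process is slow, on the order of minutes, so in production this timing should be tuned to fit your needs.
The main parameters are set in `sentinel.conf`. The official Redis documentation already covers them; here is a brief summary with some explanation: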
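- `sentinel down-after-milliseconds mymaster 30000`: if a sentinel gets no valid reply to its pings from the master for 30 seconds, it considers the master down (subjectively down). Once the quorum agrees, the failover begins. Too large a value delays recovery for downstream services. Too small a value risks declaring a merely slow node dead. Adjust it to your environment.
- `sentinel parallel-syncs mymaster 1`: during a failover, the number of replicas that may be reconfigured to resync from the new master at the same time, here 1. Syncing does not block the master, but many parallel syncs increase its IO load, and a replica that is resyncing may briefly be unable to serve reads. Set this parameter accordingly.
- `sentinel failover-timeout mymaster 180000`: the failover timeout, here 180 seconds. It governs several stages of a failover:
  - If the selected replica does not acknowledge its promotion within this time, the failover is aborted.
  - It is the longest a failover waits for the remaining replicas to be reconfigured to the new master.
  - It is the delay before a replica found replicating from the wrong master is forced back to the right one.

  The time replicas spend transferring data is not counted. If a failover of a master fails, the same sentinel waits twice the failover-timeout before attempting another failover of that master.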
For details, see the official documentation: https://redis.io/docs/management/sentinel/#configuring-sentinel
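These three directives are set per master name and belong after the matching `sentinel monitor` line. The following Java sketch, with the hypothetical class name `SentinelTuning`, renders them for a given master and derives the retry delay described above (twice the failover-timeout). The defaults are the values quoted in the list. The tighter values in `main` are purely illustrative, not a recommendation.

```java
// Hypothetical helper: render per-master Sentinel tuning lines and derive the retry delay.
final class SentinelTuning {

    final String masterName;
    final long downAfterMillis;
    final int parallelSyncs;
    final long failoverTimeoutMillis;

    SentinelTuning(String masterName, long downAfterMillis, int parallelSyncs, long failoverTimeoutMillis) {
        this.masterName = masterName;
        this.downAfterMillis = downAfterMillis;
        this.parallelSyncs = parallelSyncs;
        this.failoverTimeoutMillis = failoverTimeoutMillis;
    }

    /** The values quoted in the list above. */
    static SentinelTuning defaults(String masterName) {
        return new SentinelTuning(masterName, 30_000, 1, 180_000);
    }

    /** Lines to place in sentinel.conf after the matching `sentinel monitor` line. */
    String toConf() {
        return String.join("\n",
                "sentinel down-after-milliseconds " + masterName + " " + downAfterMillis,
                "sentinel parallel-syncs " + masterName + " " + parallelSyncs,
                "sentinel failover-timeout " + masterName + " " + failoverTimeoutMillis);
    }

    /** After a failed failover, a sentinel waits this long before retrying the same master. */
    long retryDelayMillis() {
        return 2 * failoverTimeoutMillis;
    }

    public static void main(String[] args) {
        // Illustrative tighter settings: 5 s detection, 60 s failover timeout
        SentinelTuning tuned = new SentinelTuning("mymaster", 5_000, 1, 60_000);
        System.out.println(tuned.toConf());
        System.out.println("retry after " + tuned.retryDelayMillis() + " ms");
    }
}
```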
Usage
Third-party GUI clients
Both clients run on macOS Ventura on Apple M1.
Connecting from Spring Boot
pom.xml
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
```
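application.properties
```properties
spring.data.redis.sentinel.master=mymaster
spring.data.redis.sentinel.nodes=nas.me:26379,nas.me:26380,nas.me:26381
spring.data.redis.password=root
```
The `spring.data.redis.*` keys apply to Spring Boot 3.x; Spring Boot 2.x uses the `spring.redis.*` prefix instead.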
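Note that `nodes` lists the sentinel ports (26379 to 26381), not the Redis data ports. The client asks a sentinel for the current address of `mymaster` and then connects to whatever address the sentinel reports. This is why the announced `nas.me` addresses must be reachable from the client. The following Java sketch splits such a `host:port,...` value to show the expected format. It is only an illustration; Spring Boot parses this property itself, and `SentinelNodes` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "host:port,host:port,..." format used by sentinel.nodes.
final class SentinelNodes {

    static final class Node {
        final String host;
        final int port;

        Node(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    static List<Node> parse(String nodes) {
        List<Node> result = new ArrayList<>();
        for (String item : nodes.split(",")) {
            String s = item.trim();
            int colon = s.lastIndexOf(':');
            if (colon <= 0 || colon == s.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got '" + s + "'");
            }
            result.add(new Node(s.substring(0, colon), Integer.parseInt(s.substring(colon + 1))));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sentinel endpoints from application.properties
        for (Node n : parse("nas.me:26379,nas.me:26380,nas.me:26381")) {
            System.out.println(n);
        }
    }
}
```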
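An excerpt of the test class:
```java
// Assumed: the StringRedisTemplate auto-configured by Spring Boot is injected here
@Autowired
private StringRedisTemplate redisTemplate;

@Test
void testRedis() {
    ValueOperations<String, String> ops = redisTemplate.opsForValue();
    // Write to the master discovered through the sentinels, with a 5-second TTL
    ops.set("testKey", "testValue", Duration.of(5, ChronoUnit.SECONDS));
    String value = ops.get("testKey");
    Assertions.assertEquals("testValue", value);
}
```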
Only once this test passes can the Redis Sentinel cluster be considered fully set up.