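Preface

I previously wrote "A Simple and Clear Guide to Setting Up Redis Sentinel Mode" (《简单明了搭建Redis哨兵模式》). Reviewing it recently, I found that the approach could be improved and that quite a few caveats were never written down. That left pitfalls and made the article unsuitable as a quick reference for production, so here is a remake.

This time the setup moves with the times and uses the latest Redis 7.

  • Redis image: redis:7.0.8-alpine3.17
  • docker version: Docker version 20.10.3, build b455053
  • docker-compose version: docker-compose version 1.28.5, build 324b023a

In keeping with the "simple and clear" principle, installing docker/docker-compose and pulling the image are not covered here.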

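Plan

The layout is again the classic 1 master, 2 replicas and 3 sentinels, all defined in a single docker-compose.yml.

The cluster is deployed on my home NAS, which is reachable at nas.me.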

Container         Container port  Host address
redis-master      6379            nas.me:6379
redis-slave-1     6379            nas.me:6380
redis-slave-2     6379            nas.me:6381
redis-sentinel-1  26379           nas.me:26379
redis-sentinel-2  26379           nas.me:26380
redis-sentinel-3  26379           nas.me:26381

Implementation

Directory layout

redis-sentinel
├── docker-compose.yml
├── master
│   └── data
├── sentinel1
│   └── conf
│       └── sentinel.conf
├── sentinel2
│   └── conf
│       └── sentinel.conf
├── sentinel3
│   └── conf
│       └── sentinel.conf
├── slave1
│   └── data
└── slave2
    └── data
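
The directories in this tree can be created in one step. A minimal sketch, assuming it is run from the directory that should contain redis-sentinel; the sentinel.conf files themselves are covered further down.

```shell
# Create the directory layout shown above: data directories for the
# Redis nodes and conf directories for the sentinels.
BASE_DIR="redis-sentinel"

for node in master slave1 slave2; do
  mkdir -p "$BASE_DIR/$node/data"
done

for n in 1 2 3; do
  mkdir -p "$BASE_DIR/sentinel$n/conf"
done
```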

docker-compose.yml

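This is the core file.

The Redis instances here are not started from a mounted redis.conf. Instead, the main parameters are passed directly in command. Passing a --parameter on the command line is equivalent to writing the same directive in the configuration file.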

version: '3'
services:
  redis-master:
    container_name: redis-master
    image: docker.io/redis:7.0.8-alpine3.17
    environment: 
      - TZ=Asia/Shanghai
    restart: always
    command: redis-server --requirepass root --masterauth root --replica-announce-ip "nas.me" --replica-announce-port 6379 --appendonly yes
    ports:
      - 6379:6379
    volumes:
      - ./master/data:/data
    networks:
      - redis_sentinel_network

  redis-slave-1:
    container_name: redis-slave-1
    image: docker.io/redis:7.0.8-alpine3.17
    environment: 
      - TZ=Asia/Shanghai
    restart: always
    command: redis-server --slaveof redis-master 6379 --requirepass root --masterauth root --replica-announce-ip "nas.me" --replica-announce-port 6380 --appendonly yes
    ports:
      - 6380:6379
    volumes:
      - ./slave1/data:/data
    networks:
      - redis_sentinel_network
    depends_on:
      - redis-master

  redis-slave-2:
    container_name: redis-slave-2
    image: docker.io/redis:7.0.8-alpine3.17
    environment: 
      - TZ=Asia/Shanghai
    restart: always
    command: redis-server --slaveof redis-master 6379 --requirepass root --masterauth root --replica-announce-ip "nas.me" --replica-announce-port 6381 --appendonly yes
    ports:
      - 6381:6379
    volumes:
      - ./slave2/data:/data
    networks:
      - redis_sentinel_network
    depends_on:
      - redis-slave-1

  redis-sentinel-1:
    container_name: redis-sentinel-1
    image: docker.io/redis:7.0.8-alpine3.17
    environment: 
      - TZ=Asia/Shanghai
    restart: always
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    ports:
      - 26379:26379
    volumes:
      - ./sentinel1/conf:/usr/local/etc/redis/conf
    networks:
      - redis_sentinel_network
    depends_on:
      - redis-slave-2

  redis-sentinel-2:
    container_name: redis-sentinel-2
    image: docker.io/redis:7.0.8-alpine3.17
    environment: 
      - TZ=Asia/Shanghai
    restart: always
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    ports:
      - 26380:26379
    volumes:
      - ./sentinel2/conf:/usr/local/etc/redis/conf
    networks:
      - redis_sentinel_network
    depends_on:
      - redis-sentinel-1

  redis-sentinel-3:
    container_name: redis-sentinel-3
    image: docker.io/redis:7.0.8-alpine3.17
    environment: 
      - TZ=Asia/Shanghai
    restart: always
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    ports:
      - 26381:26379
    volumes:
      - ./sentinel3/conf:/usr/local/etc/redis/conf
    networks:
      - redis_sentinel_network
    depends_on:
      - redis-sentinel-2

networks:
  redis_sentinel_network:
    name: redis_sentinel_network

Key points in the yml

Breaking down command

redis-server --slaveof redis-master 6379 --requirepass root --masterauth root --replica-announce-ip "nas.me" --replica-announce-port 6380 --appendonly yes
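  • --slaveof redis-master 6379: a node that starts out as a replica must be told which master to follow. The container name only resolves when both containers are on the same docker network. (--slaveof is the legacy alias of --replicaof.)
  • --requirepass root: the password for connecting to this node is root
  • --masterauth root: the password this node uses to authenticate with the master is root
  • --replica-announce-ip "nas.me": important. With a docker network, or any other setup that involves NAT, this must be an IP or hostname that is reachable from outside. Otherwise the containers come up, but clients outside the docker network cannot reach the nodes properly.
  • --replica-announce-port 6380: important, for the same reason. It is the host port mapped to this node.
  • --appendonly yes: enable AOF persistence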
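
For comparison, the same flags for redis-slave-1 written as a configuration file would look roughly like the sketch below. The file name redis-slave-1.conf is hypothetical and this article does not mount such a file; replicaof is the current name of the slaveof directive.

```shell
# Write a redis.conf equivalent to the command-line flags of redis-slave-1.
# The file name is an assumption for illustration; the setup in this
# article passes these options on the command line instead.
cat > redis-slave-1.conf <<'EOF'
replicaof redis-master 6379
requirepass root
masterauth root
replica-announce-ip nas.me
replica-announce-port 6380
appendonly yes
EOF
```

To use a file like this, it would have to be mounted into the container and passed to redis-server as its first argument.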

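sentinel.conf

The second key piece.

The files in the three directories can start out identical. Each sentinel still needs its own copy, because Sentinel rewrites its configuration file while running.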

port 26379
dir "/tmp"
sentinel monitor mymaster nas.me 6379 2
sentinel auth-pass mymaster root
sentinel deny-scripts-reconfig yes
sentinel resolve-hostnames yes

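Key points in the conf

  • sentinel monitor mymaster nas.me 6379 2: mymaster is the name of the master group. nas.me is the externally reachable address. Do not use the container name here: it resolves to the container's internal address, and clients outside cannot connect to the cluster. The trailing 2 is the quorum: failover starts once 2 sentinels agree that the master is down.
  • sentinel resolve-hostnames yes: required for Sentinel to accept a hostname such as nas.me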
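
Since the three files start out identical, they can be generated from one template. A sketch, assuming the redis-sentinel directory layout described earlier and that it is run from the parent directory of redis-sentinel:

```shell
# Write the same initial sentinel.conf into each sentinel's conf directory.
# Every sentinel gets its own file because Sentinel rewrites it at runtime.
BASE_DIR="redis-sentinel"

for n in 1 2 3; do
  mkdir -p "$BASE_DIR/sentinel$n/conf"
  cat > "$BASE_DIR/sentinel$n/conf/sentinel.conf" <<'EOF'
port 26379
dir "/tmp"
sentinel monitor mymaster nas.me 6379 2
sentinel auth-pass mymaster root
sentinel deny-scripts-reconfig yes
sentinel resolve-hostnames yes
EOF
done
```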

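Deployment

Starting the containers

In the same directory as docker-compose.yml, run:

docker-compose up -d

If nothing goes wrong, the containers start cleanly, as shown below: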

$ docker-compose up -d
Creating network "redis_sentinel_network" with the default driver
Creating redis-master ... done
Creating redis-slave-1 ... done
Creating redis-slave-2 ... done
Creating redis-sentinel-1 ... done
Creating redis-sentinel-2 ... done
Creating redis-sentinel-3 ... done

Listing the containers

There are two ways. In the same directory as docker-compose.yml, run:

docker-compose ps

Output:

$ docker-compose ps
      Name                    Command               State                 Ports
----------------------------------------------------------------------------------------------
redis-master       docker-entrypoint.sh redis ...   Up      0.0.0.0:6379->6379/tcp
redis-sentinel-1   docker-entrypoint.sh redis ...   Up      0.0.0.0:26379->26379/tcp, 6379/tcp
redis-sentinel-2   docker-entrypoint.sh redis ...   Up      0.0.0.0:26380->26379/tcp, 6379/tcp
redis-sentinel-3   docker-entrypoint.sh redis ...   Up      0.0.0.0:26381->26379/tcp, 6379/tcp
redis-slave-1      docker-entrypoint.sh redis ...   Up      0.0.0.0:6380->6379/tcp
redis-slave-2      docker-entrypoint.sh redis ...   Up      0.0.0.0:6381->6379/tcp

Alternatively, use docker's native ps:

docker ps | grep redis

Output:

$ docker ps | grep redis
6ada99fc5fe0   redis:7.0.8-alpine3.17                 "docker-entrypoint.s…"   10 hours ago   Up 10 hours   6379/tcp, 0.0.0.0:26381->26379/tcp                           redis-sentinel-3
d4de1c47e3cf   redis:7.0.8-alpine3.17                 "docker-entrypoint.s…"   10 hours ago   Up 10 hours   6379/tcp, 0.0.0.0:26380->26379/tcp                           redis-sentinel-2
48a45ad344f4   redis:7.0.8-alpine3.17                 "docker-entrypoint.s…"   10 hours ago   Up 10 hours   6379/tcp, 0.0.0.0:26379->26379/tcp                           redis-sentinel-1
04e367a07372   redis:7.0.8-alpine3.17                 "docker-entrypoint.s…"   10 hours ago   Up 10 hours   0.0.0.0:6381->6379/tcp                                       redis-slave-2
b5103b805756   redis:7.0.8-alpine3.17                 "docker-entrypoint.s…"   10 hours ago   Up 10 hours   0.0.0.0:6380->6379/tcp                                       redis-slave-1
fc0a92fa784e   redis:7.0.8-alpine3.17                 "docker-entrypoint.s…"   10 hours ago   Up 10 hours   0.0.0.0:6379->6379/tcp                                       redis-master

Viewing container logs

The -f flag follows the log in real time, like tail -f.

docker logs redis-sentinel-1 -f

Output:

$ docker logs redis-sentinel-1 -f
1:X 26 Feb 2023 22:32:28.256 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 26 Feb 2023 22:32:28.256 # Redis version=7.0.8, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 26 Feb 2023 22:32:28.256 # Configuration loaded
1:X 26 Feb 2023 22:32:28.257 * monotonic clock: POSIX clock_gettime
1:X 26 Feb 2023 22:32:28.258 * Running mode=sentinel, port=26379.
1:X 26 Feb 2023 22:32:28.258 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 26 Feb 2023 22:32:28.272 * Sentinel new configuration saved on disk
1:X 26 Feb 2023 22:32:28.273 # Sentinel ID is 67601f6dd92f539fa4549ada69d141f9483a5254
1:X 26 Feb 2023 22:32:28.273 # +monitor master mymaster 192.168.80.2 6379 quorum 2
1:X 26 Feb 2023 22:32:28.277 * +slave slave 192.168.50.105:6380 192.168.50.105 6380 @ mymaster 192.168.80.2 6379
1:X 26 Feb 2023 22:32:28.288 * Sentinel new configuration saved on disk
1:X 26 Feb 2023 22:32:28.291 * +slave slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.80.2 6379
1:X 26 Feb 2023 22:32:28.326 * Sentinel new configuration saved on disk
1:X 26 Feb 2023 22:32:31.007 * +sentinel sentinel 877c03d29cc514c5b06ff3137c52a8848e6e6692 192.168.80.6 26379 @ mymaster 192.168.80.2 6379
1:X 26 Feb 2023 22:32:31.020 * Sentinel new configuration saved on disk
1:X 26 Feb 2023 22:32:31.736 * +sentinel sentinel 57acf9abe43e019c403d75ea990d9ec7a462956d 192.168.80.7 26379 @ mymaster 192.168.80.2 6379
1:X 26 Feb 2023 22:32:31.750 * Sentinel new configuration saved on disk

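Checking the master-replica relationship

There is no need to enter the container. Run:

docker exec -it redis-master redis-cli -a root info replication

-a root supplies the password. The output looks like this: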

$ docker exec -it redis-master redis-cli -a root info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:2
slave0:ip=nas.me,port=6380,state=online,offset=7616333,lag=1
slave1:ip=nas.me,port=6381,state=online,offset=7616333,lag=0
master_failover_state:no-failover
master_replid:5ad8990b0e92008ef5c38829f48acc148fb10039
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:7616474
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:6561241
repl_backlog_histlen:1055234

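Starting, stopping and removing the containers

Stop the containers:

docker-compose stop

Start the existing containers:

docker-compose start

Stop and remove the containers:

docker-compose down

These are simply docker-compose commands; consult its manual or built-in help for more.

Note that docker-compose commands must be run in the same directory as the yml.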

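Verifying failover

The defining feature of a Redis cluster in Sentinel mode is automatic master-replica switching: when the master goes down, an automatic failover takes place, in which one of the replicas is elected as the new master.

Stop the master node:

docker stop redis-master

Look at the log of one of the sentinels: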

$ docker logs redis-sentinel-1 -f
1:X 27 Feb 2023 09:26:58.033 # +sdown master mymaster 192.168.50.105 6379
1:X 27 Feb 2023 09:26:58.109 # +odown master mymaster 192.168.50.105 6379 #quorum 3/2
1:X 27 Feb 2023 09:26:58.109 # +new-epoch 1
1:X 27 Feb 2023 09:26:58.109 # +try-failover master mymaster 192.168.50.105 6379
1:X 27 Feb 2023 09:26:58.134 * Sentinel new configuration saved on disk
1:X 27 Feb 2023 09:26:58.134 # +vote-for-leader 93502b238af5424482cd7c890c0e682e9b2d8fdc 1
1:X 27 Feb 2023 09:26:58.135 # 292446f28b077e382fc11841cb54d0ee6991f31b voted for 292446f28b077e382fc11841cb54d0ee6991f31b 1
1:X 27 Feb 2023 09:26:58.152 # 02774af034f6587bdaf4a5baff373b07be5378e8 voted for 292446f28b077e382fc11841cb54d0ee6991f31b 1
1:X 27 Feb 2023 09:26:58.851 # +config-update-from sentinel 292446f28b077e382fc11841cb54d0ee6991f31b 192.168.96.6 26379 @ mymaster 192.168.50.105 6379
1:X 27 Feb 2023 09:26:58.851 # +switch-master mymaster 192.168.50.105 6379 192.168.50.105 6380
1:X 27 Feb 2023 09:26:58.853 * +slave slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:26:58.856 * +slave slave 192.168.50.105:6379 192.168.50.105 6379 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:26:58.874 * Sentinel new configuration saved on disk
1:X 27 Feb 2023 09:27:28.872 # +sdown slave 192.168.50.105:6379 192.168.50.105 6379 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.350 # +sdown master mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.406 # +odown master mymaster 192.168.50.105 6380 #quorum 2/2
1:X 27 Feb 2023 09:27:59.406 # +new-epoch 2
1:X 27 Feb 2023 09:27:59.406 # +try-failover master mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.419 * Sentinel new configuration saved on disk
1:X 27 Feb 2023 09:27:59.419 # +vote-for-leader 93502b238af5424482cd7c890c0e682e9b2d8fdc 2
1:X 27 Feb 2023 09:27:59.441 # 02774af034f6587bdaf4a5baff373b07be5378e8 voted for 93502b238af5424482cd7c890c0e682e9b2d8fdc 2
1:X 27 Feb 2023 09:27:59.482 # +elected-leader master mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.482 # +failover-state-select-slave master mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.565 # +selected-slave slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.565 * +failover-state-send-slaveof-noone slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:27:59.624 * +failover-state-wait-promotion slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:28:00.130 * Sentinel new configuration saved on disk
1:X 27 Feb 2023 09:28:00.130 # +promoted-slave slave 192.168.50.105:6381 192.168.50.105 6381 @ mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:28:00.190 # +failover-end master mymaster 192.168.50.105 6380
1:X 27 Feb 2023 09:28:00.191 # +switch-master mymaster 192.168.50.105 6380 192.168.50.105 6381
1:X 27 Feb 2023 09:28:00.193 * +slave slave 192.168.50.105:6379 192.168.50.105 6379 @ mymaster 192.168.50.105 6381
1:X 27 Feb 2023 09:28:00.193 * +slave slave 192.168.50.105:6380 192.168.50.105 6380 @ mymaster 192.168.50.105 6381
1:X 27 Feb 2023 09:28:00.205 * Sentinel new configuration saved on disk
1:X 27 Feb 2023 09:28:30.210 # +sdown slave 192.168.50.105:6379 192.168.50.105 6379 @ mymaster 192.168.50.105 6381

The log shows the full sequence of failover steps.

Check one of the nodes:

$ docker exec -it redis-slave-2 redis-cli -a root info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=nas.me,port=6380,state=online,offset=8247997,lag=1
master_failover_state:no-failover
master_replid:754d3d8e8eff1a537f96ff1f4e57d11bf511825f
master_replid2:bff8fc75d1bf8aab0533feda641fb665904216e1
master_repl_offset:8247997
second_repl_offset:8063230
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:7195186
repl_backlog_histlen:1052812

Replica 2 has been promoted to the new master, so the failover succeeded.
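
To see the roles of all three nodes at a glance rather than checking them one by one, the role field can be pulled out of INFO replication. A sketch; node_role and print_roles are hypothetical helper names, and --no-auth-warning only suppresses the password warning seen above.

```shell
# Print the replication role (master or slave) of one Redis container.
# node_role is a hypothetical helper name used only for this example.
node_role() {
  docker exec "$1" redis-cli -a root --no-auth-warning info replication \
    | tr -d '\r' \
    | awk -F: '$1 == "role" { print $2 }'
}

# Print the role of every node; a stopped container yields no output
# from docker exec, so it is reported as unreachable.
print_roles() {
  for c in redis-master redis-slave-1 redis-slave-2; do
    role="$(node_role "$c" 2>/dev/null)"
    echo "$c: ${role:-unreachable}"
  done
}
```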

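However, the whole process took a long time, on the order of minutes. In production this duration needs to be tuned to the actual situation.

The main parameters are set in sentinel.conf. The official Redis documentation already describes them; here is a brief summary with some explanation:

  • sentinel down-after-milliseconds mymaster 30000: a sentinel considers the master down after it has failed to answer pings for 30 seconds

    Once this time has elapsed the sentinel marks the master as subjectively down (sdown), and failover begins when the quorum of sentinels agrees (odown). Too large a value delays recovery for downstream services; too small a value risks declaring a live node dead. Tune it to the actual situation.

  • sentinel parallel-syncs mymaster 1: after a failover, at most 1 replica at a time is reconfigured to sync from the new master

    Synchronization does not block the master, but too many replicas syncing in parallel puts pressure on the master's I/O, so set this as appropriate.

  • sentinel failover-timeout mymaster 180000: the failover timeout is 180 seconds

    These 180 seconds bound the whole failover process: electing a node, promoting it, having the replicas sync from the newly elected master, and having the old master sync from the new master once it wakes up. If the response to a command sent in any of these steps takes longer than failover-timeout, the failover is considered failed (the time replicas spend syncing data is not counted). If a failover against a master fails, the next attempt against the same master can start only after 2 times failover-timeout.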
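
Besides editing sentinel.conf before startup, these values can be changed at runtime with SENTINEL SET, which has to be sent to each sentinel separately. A sketch of that approach; the 5000 ms and 60000 ms values are placeholders for illustration rather than recommendations, and tune_sentinels is a hypothetical helper name.

```shell
# Apply shorter detection and failover timeouts on all three sentinels.
# SENTINEL SET only changes the sentinel it is sent to, so loop over them.
# The default values below are illustrative assumptions, not recommendations.
tune_sentinels() {
  down_after_ms="${1:-5000}"
  failover_timeout_ms="${2:-60000}"
  for n in 1 2 3; do
    docker exec "redis-sentinel-$n" redis-cli -p 26379 \
      sentinel set mymaster down-after-milliseconds "$down_after_ms"
    docker exec "redis-sentinel-$n" redis-cli -p 26379 \
      sentinel set mymaster failover-timeout "$failover_timeout_ms"
  done
}
```

Running tune_sentinels 10000 120000 would use 10 seconds and 120 seconds instead of the defaults in the sketch.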

For the details, see the official documentation: https://redis.io/docs/management/sentinel/#configuring-sentinel

Usage

Third-party GUI clients

Both work on macOS Ventura on an M1 Mac.

Connecting from SpringBoot

pom.xml

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-redis</artifactId>
        </dependency>

application.properties

spring.data.redis.sentinel.master=mymaster
spring.data.redis.sentinel.nodes=nas.me:26379,nas.me:26380,nas.me:26381
spring.data.redis.password=root

Excerpt from the test method:

    @Test
    void testRedis() {
        ValueOperations<String, String> ops = redisTemplate.opsForValue();
        ops.set("testKey", "testValue", Duration.of(5, ChronoUnit.SECONDS));
        String value = ops.get("testKey");
        Assertions.assertEquals("testValue", value);
    }

Only once this test passes is the Redis Sentinel cluster setup truly complete.