一、变更计划与方案
1.1 概述 现有postgresql数据库为单一数据库,没有备份数据,为保证数据安全正常运行,现将给定的3台服务器,组成一个1主2从的高可用架构。 1.2 基本信息
主机名 | IP地址 | 服务器角色 | 系统信息 | 数据库目录 |
---|
72_123 | 192.168.247.123 | Postgresql主节点 | Red Hat7.7 | /data | 72_124 | 192.168.247.124 | Postgresql从节点 | Red Hat7.7 | /data | 72_125 | 192.168.247.125 | Postgresql从节点 | Red Hat7.7 | /data |
1.3实施计划 先将247.124和247.125设置为247.123的从库,然后安装Keepalived 高可用组件,搭建高可用架构
二、实施方案
2.1 各节点安装postgres (root用户) 准备postgresql-11-10.tar.gz软件包 https://www.postgresql.org/ftp/source/v11.10/ 安装相关依赖包
yum -y install readline readline-devel zlib zlib-devel
配置hosts文件
cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.247.123 master
192.168.247.124 slave1
192.168.247.125 slave2
关闭防火墙,SELINUX
systemctl stop firewalld
systemctl enable firewalld
setenforce 0
sed -i 's/SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
创建用户、安装目录、数据库目录
useradd postgres
password postgres
mkdir /opt/pg11/
mkdir /data/
chown -R postgres:postgres /opt/pg11/
chown -R postgres:postgres /data/
解压并编译安装
tar -xvf postgresql-11.10.tar.gz
cd postgresql-11.10/
./configure --prefix=/opt/pg11/
make world
make install-word
设置环境变量
su - postgres
vim .bashrc
export LD_LIBRARY_PATH=/opt/pg11/lib
export PGDATA=/data/
export PGHOME=/opt/pg11/
export PATH=$PATH:/opt/pg11/bin/
初始化数据库
/opt/pg11/bin/initdb -D /data/
修改配置文件(参数值根据服务器配置填写)
listen_addresses = '*'
port = 5432
max_connections = 100
shared_buffers = 128MB
dynamic_shared_memory_type = posix
wal_level = replica
max_wal_size = 1GB
min_wal_size = 80MB
track_commit_timestamp = on
log_timezone = 'PRC'
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
启动数据库
pg_ctl -D /data/ -l logfile start
如果是新建数据库可以将以下主从同步一同配置后再启动数据库
2.2 主库配置(postgres用户配置) 创建replicator 用户进行主从同步
create role replicator with login replication password 'replicator';
修改pg_hba.conf,允许replicator 用户来同步
host all all 192.168.247.0/24 md5
host replication replicator 192.168.247.0/24 md5
修改postgresql.conf,修改或添加以下内容
wal_level = replica #热备模式
max_wal_senders= 10 #设置最达流复制链接,控制主库最多可以有多少个并发standby数据库)
wal_keep_segments = 512 #重要配置(配置保存wal日志大小)
wal_sender_timeout = 60s
重启数据库生效
pg_ctl restart
2.3 从库配置(postgres用户配置) 将主库数据拷贝至从库(247.124,247.125)
pg_basebackup -h 192.168.247.123 -D /home/postgres/data/ -F p -X stream -v -P -U replicator
输入密码:replicator
修改或添加postgresql.conf以下内容
max_connections = 1100 #一般从库的最大连接数要大于主库
hot_standby = on #说明这台机器不仅仅是用于数据归档,也用于数据查询
max_standby_streaming_delay = 30s #数据流备份的最大延迟时间
wal_receiver_status_interval = 10s #多久向主报告一次从的状态,当然从每次数据复制都会向主报告状态,这里只是设置最长的间隔时间
配置recovery.conf
standby_mode = on
primary_conninfo = ‘host=192.168.247.123 port=5432 user=replicator password=replicator’
recovery_target_timeline = ‘latest’
启动数据库(247.124,247.125)
pg_ctl start
2.4 安装keepalived(root用户配置) 主节点247.123,从节点247.124,247.125
yum -y install keepalived
touch /etc/keepalived/pg_check.sh
chmod +x /etc/keepalived/pg_check.sh
pg_check.sh脚本
#!/bin/bash
#PG is alive or not?
A=`ps -ef | grep postgres | grep "/opt/pg11/bin/postgres*" |grep -v grep| wc -l`
# where the VIP floats to
B=`ip a | grep 192.168.247.121 | wc -l`
# the slave is in a waiting state?
C=`ps -ef | grep postgres | grep 'startup process' |grep -v grep| wc -l`
#the secondary is positive for linking to the master ?
D=`ps -ef | grep postgres | grep 'receiver' |grep -v grep | wc -l`
#he connection between the master and the slave library is normal?
E=`ps -ef | grep postgres | grep 'sender' |grep -v grep| wc -l`
#If pg dies, log the message and turn off keepalived
if [ $A -eq 0 ];then
echo "`date "+%Y-%m-%d--%H:%M:%S"` postgresql stop so vip stop " >> /etc/keepalived/log/check_pg.log
systemctl stop keepalived
else
#When the master is dead, the VIP floats to slave, elevating slave's status to make him read and write
if [ $B -eq 1 -a $C -eq 1 -a $D -eq 0 ];then
su - postgres -c "/opt/pg11/bin/pg_ctl promote -D /data/"
echo "`date "+%Y-%m-%d--%H:%M:%S"` standby promote " >> /etc/keepalived/log/check_pg.log
fi
#Judge yourself as Lord and lose contact with him
if [ $B -eq 1 -a $C -eq 0 -a $D -eq 0 -a $E -eq 0 ];then
sleep 10
echo "`date "+%Y-%m-%d--%H:%M:%S"` can't find standby " >> /etc/keepalived/log/check_pg.log
fi
fi
配置Keepalived日志
vim /etc/sysconfig/keepalived
# Options for keepalived. See `keepalived --help' output and keepalived(8) and
# keepalived.conf(5) man pages for a list of all options. Here are the most
# common ones :
#
# --vrrp -P Only run with VRRP subsystem.
# --check -C Only run with Health-checker subsystem.
# --dont-release-vrrp -V Dont remove VRRP VIPs & VROUTEs on daemon stop.
# --dont-release-ipvs -I Dont remove IPVS topology on daemon stop.
# --dump-conf -d Dump the configuration data.
# --log-detail -D Detailed log messages.
# --log-facility -S 0-7 Set local syslog facility (default=LOG_DAEMON)
KEEPALIVED_OPTIONS="-D -d -S 0"
修改/etc/rsyslog.conf文件,在末尾添加如下配置
local0.* /var/log/keepalived.log
重启日志记录服务
systemctl restart rsyslog
查看Keepalived日志
tail -f /var/log/keepalived.log
修改Keepalived配置文件 (主节点247.123)
mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
notification_email {
**@qq.com
}
notification_email_from **@163.com
smtp_server smtp.163.com
smtp_connect_timeout 30
router_id postgres_master
}
vrrp_script check_pg_alived {
script "/etc/keepalived/pg_check.sh"
interval 10
weight 2
}
vrrp_instance VI_1 {
state MASTER
interface ens33
virtual_router_id 66
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1234
}
track_script {
check_pg_alived
}
virtual_ipaddress {
192.168.247.121
}
}
(从节点247.124,247.125)
! Configuration File for keepalived
global_defs {
notification_email {
**@qq.com
}
notification_email_from **@163.com
smtp_server smtp.163.com
smtp_connect_timeout 30
router_id postgres_master
}
vrrp_script check_pg_alived {
script "/etc/keepalived/pg_check.sh"
interval 10
fall 5
}
vrrp_instance VI_1 {
state BACKUP
nopreempt
interface ens33
virtual_router_id 66
priority 90
advert_int 1
authentication {
auth_type PASS
auth_pass 1234
}
track_script {
check_pg_alived
}
virtual_ipaddress {
192.168.247.121
}
}
所有节点启动keepalived服务,并查看状态(root用户)
systemctl restart keepalived
systemctl enable keepalived
systemctl status keepalived
3 高可用验证(测试环境)
3.1各节点postgres进程正常,keepalived状态正常,查看VIP节点
[root@72-123 keepalived]# ip a |grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.247.123/24 brd 192.168.247.255 scope global noprefixroute ens33
inet 192.168.247.121/32 scope global ens33
3.2 主节点宕机模拟
[postgres@72-123 ~]$ pg_ctl stop
waiting for server to shut down....2021-10-23 01:29:37.950 PDT [59281] LOG: received fast shutdown request
2021-10-23 01:29:37.952 PDT [59281] LOG: aborting any active transactions
2021-10-23 01:29:37.953 PDT [59281] LOG: background worker "logical replication launcher" (PID 59288) exited with exit code 1
2021-10-23 01:29:37.953 PDT [59283] LOG: shutting down
2021-10-23 01:29:37.972 PDT [59281] LOG: database system is shut down
done
server stopped
3.3 VIP迁移到从节点
[postgres@72-125 ~]$2021-10-23 01:29:42.846 PDT [62019] FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "192.168.247.121" and accepting
TCP/IP connections on port 5432?
2021-10-23 01:29:47.855 PDT [62052] LOG: connection received: host=192.168.247.121 port=53342
2021-10-23 01:29:47.856 PDT [62053] LOG: connection received: host=192.168.247.124 port=59440
2021-10-23 01:29:47.856 PDT [62052] LOG: replication connection authorized: user=replicator
2021-10-23 01:29:47.857 PDT [62053] LOG: replication connection authorized: user=replicator
2021-10-23 01:29:47.857 PDT [62051] LOG: started streaming WAL from primary at 0/A000000 on timeline 1
查看VIP
[postgres@72-125 ~]$ ip a |grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.247.125/24 brd 192.168.247.255 scope global noprefixroute ens33
inet 192.168.247.121/32 scope global ens33
3.4连接测试
[postgres@72-124 ~]$ psql -h 192.168.247.121 -U postres -p
psql (11.10)
Type "help" for help.
postgres=#
3.5恢复节点,启动数据库,并启动keepalived服务
[postgres@72-123 ~]$ pg_ctl start
waiting for server to start....2021-10-23 01:37:13.120 PDT [60997] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-10-23 01:37:13.120 PDT [60997] LOG: listening on IPv6 address "::", port 5432
2021-10-23 01:37:13.122 PDT [60997] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-10-23 01:37:13.132 PDT [60998] LOG: database system was shut down at 2021-10-23 01:29:37 PDT
2021-10-23 01:37:13.134 PDT [60997] LOG: database system is ready to accept connections
done
server started
[root@72-123 keepalived]# systemctl start keepalived
[root@72-123 keepalived]# ip a |grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.247.123/24 brd 192.168.247.255 scope global noprefixroute ens33
inet 192.168.247.121/32 scope global ens33
|