OS: CentOS 7
PostgreSQL version: 10.10
Pgpool-II version: 4.0.6
Ports
9999: client-facing service
9000: watchdog
9898: PCP
9694: watchdog heartbeat
Note: in the installation steps below, anything enclosed in 【】 must be replaced with values from your own environment.
Background
The project needs database high availability. PostgreSQL itself supports streaming replication and manual failover; that is, in a two-node hot-standby PostgreSQL cluster, when the primary goes down the cluster can still serve reads but not writes, and normal service only resumes after the promote command is run manually to promote the standby to primary (see the sketch after the quotation below). Quoting the PostgreSQL documentation:
PostgreSQL does not provide the system software required to identify a
failure on the primary and notify the standby database server.
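For reference, the manual promotion mentioned above is typically done with pg_ctl on the standby. A minimal sketch, assuming the PGHOME (/usr/local/pgsql) and data directory (/home/pg/data) used later in this article:
su postgres
/usr/local/pgsql/bin/pg_ctl promote -D /home/pg/data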
Pgpool-II provides automatic failover through its watchdog's virtual IP and health-check features. This article builds the cluster in master/slave mode rather than replication_mode, because online recovery is restricted in replication_mode. Quoting the Pgpool-II documentation:
There is a restriction in the online recovery in replication_mode. If
Pgpool-II itself is installed on multiple hosts, online recovery does
not work correctly, because Pgpool-II has to stop all the clients
during the 2nd stage of online recovery. If there are several
Pgpool-II hosts, only one of them will have received the online
recovery command and will block the connections from clients.
For partition tolerance, this article uses a 3-node cluster. Each node runs Pgpool-II and a PostgreSQL instance, and the databases are configured as 1 primary with 2 standbys.
Note: Pgpool-II and PostgreSQL use different terms for the lead node: Pgpool-II calls its lead node the Master, while PostgreSQL calls its lead node the Primary. Both use the same term, standby, for the remaining nodes.
Steps
1 Install and configure PostgreSQL
Install PostgreSQL on all nodes, but configure and start it only on node 1; the PostgreSQL instances on nodes 2 and 3 are initialized and started later via pcp_recovery_node. Do not enable PostgreSQL to start on boot. See the PostgreSQL installation guide; a sketch of the replication-related settings assumed by the rest of this article follows.
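The failover and recovery scripts below assume that node 1's PostgreSQL already allows streaming replication by the postgres user and archives WAL to /home/pg/data/archivedir. A minimal sketch of the relevant postgresql.conf and pg_hba.conf entries on node 1, to be adapted to your environment (the archive directory must exist and be writable by postgres):
# postgresql.conf (node 1)
listen_addresses = '*'
port = 5432
wal_level = replica
max_wal_senders = 10
archive_mode = on
archive_command = 'cp "%p" "/home/pg/data/archivedir/%f"'
# pg_hba.conf (node 1): allow replication and client connections from the other nodes
host replication postgres 【node 2 IP】/32 md5
host replication postgres 【node 3 IP】/32 md5
host all all 0.0.0.0/0 md5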
2 Configure /etc/hosts
Add name-resolution entries for recovery_1st_stage to use:
【node 1 IP】 【node 1 hostname】
【node 2 IP】 【node 2 hostname】
【node 3 IP】 【node 3 hostname】
3 Configure passwordless SSH
The failover_stream.sh and follow_master.sh scripts used by Pgpool-II's failover configuration, and the recovery_1st_stage and pgpool_remote_start scripts used by online recovery, all perform remote operations over SSH.
Passwordless SSH must be set up one way from root to the postgres users (including the postgres user on the same machine), and in both directions between the postgres users of the different nodes.
3.1 Set the postgres user's password
su
passwd postgres
3.2 Generate a key pair on the current node
Do not set a passphrase.
su 【current user】
ssh-keygen -t rsa
3.3 Copy the public key to the other nodes
ssh-copy-id -i ~/.ssh/id_rsa.pub 【username】@【IP】
3.4 Test SSH
ssh 【username】@【IP】
4 Adjust the SSH configuration (optional; the default of 10 sessions is usually enough)
vi /etc/ssh/sshd_config
Change
MaxSessions 100
Restart sshd
systemctl restart sshd
5 Configure the system log
Create the log file under /var/log; if it is placed elsewhere, syslog cannot write to it (unless SELinux is disabled or configured accordingly).
mkdir /var/log/pgpool
touch /var/log/pgpool/pgpool_run.log
Edit the rsyslog configuration
vi /etc/rsyslog.conf
Change
*.info;mail.none;authpriv.none;cron.none;LOCAL1.none /var/log/messages
LOCAL1.* /var/log/pgpool/pgpool_run.log
Restart rsyslog
systemctl restart rsyslog
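To verify that the LOCAL1 facility is routed correctly, you can send a test message with logger and check that it reaches the new file (a quick sanity check, not required by the setup):
logger -p local1.info "pgpool syslog test"
tail -n 1 /var/log/pgpool/pgpool_run.log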
6 Configure .pgpass
To let the database's postgres user connect for streaming replication and online recovery without being prompted for a password, create a .pgpass file in the OS postgres user's home directory on every node and set its permissions to 600.
su postgres
vi ~/.pgpass
Edit
# hostname:port:database:username:password
# In a standby server, a database field of replication matches streaming replication connections made to the master server.
【node 1 IP】:5432:postgres:postgres:123456
【node 2 IP】:5432:postgres:postgres:123456
【node 3 IP】:5432:postgres:postgres:123456
Set the permissions
chmod 600 ~/.pgpass
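As the comment above notes, streaming replication and pg_basebackup connections are matched against a database field of replication (or *), not postgres. If your pg_hba.conf requires password authentication for replication connections, additional entries along the following lines may be needed (an assumption to adapt, not part of the original file):
【node 1 IP】:5432:replication:postgres:123456
【node 2 IP】:5432:replication:postgres:123456
【node 3 IP】:5432:replication:postgres:123456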
7 Install Pgpool-II
Run as root, and install and configure Pgpool-II on all nodes. Online recovery needs the pgpool_recovery, pgpool_remote_start, and pgpool_switch_xlog functions, so the pgpool_recovery extension must be built and installed as well.
Note: building pgpool_recovery requires postgresql-devel to be installed first.
mkdir /usr/local/pgpool
cd /usr/local/pgpool
tar -zxvf pgpool-II-4.0.6.tar.gz
cd /usr/local/pgpool/pgpool-II-4.0.6
./configure --with-pgsql=/usr/local/pgsql
make && make install
cd /usr/local/pgpool/pgpool-II-4.0.6/src/sql/pgpool-recovery
chown -R postgres /usr/local/pgsql
make && make install
cp /usr/local/etc/pgpool.conf.sample-stream /usr/local/etc/pgpool.conf
8 Install the pgpool_recovery extension on node 1's PostgreSQL
When PostgreSQL creates a new database it actually copies an existing one; by default it copies the standard system database template1, so the extension is installed there as well.
psql template1 -c "CREATE EXTENSION pgpool_recovery"
psql postgres -c "CREATE EXTENSION pgpool_recovery"
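To confirm the extension is in place, it can be listed from psql (an optional check):
psql postgres -c "\dx pgpool_recovery"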
9 Write the failover script
Create the script on all nodes and grant execute permission.
vi /usr/local/pgpool/failover_stream.sh
Edit the script (note that its arguments must match the failover_command parameter in pgpool.conf). If the primary goes down, a standby is promoted; if a standby goes down, the event is only logged.
#!/bin/bash
# This script is run by failover_command.
set -o xtrace
logger -i -p local1.info failover_command begin
# Special values:
# %d = node id
# %h = host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %r = new master port number
# %R = new master database cluster path
# %% = '%' character
FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
NEW_MASTER_NODE_HOST="$6"
OLD_MASTER_NODE_ID="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"
PGHOME=/usr/local/pgsql
logger -i -p local1.info failover.sh: start: failed_node_id=${FAILED_NODE_ID} old_primary_node_id=${OLD_PRIMARY_NODE_ID} \
failed_host=${FAILED_NODE_HOST} new_master_host=${NEW_MASTER_NODE_HOST}
## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa ls /tmp > /dev/null
if [ $? -ne 0 ]; then
logger -i -p local1.error failover.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwordless SSH.
exit 1
fi
# If standby node is down, skip failover.
if [ ${FAILED_NODE_ID} -ne ${OLD_PRIMARY_NODE_ID} ]; then
logger -i -p local1.info failover.sh: Standby node is down. Skipping failover.
exit 0
fi
# Promote standby node.
logger -i -p local1.info failover.sh: Primary node is down, promote standby node postgres@${NEW_MASTER_NODE_HOST}.
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa ${PGHOME}/bin/pg_ctl -D ${NEW_MASTER_NODE_PGDATA} -w promote
if [ $? -ne 0 ]; then
logger -i -p local1.error failover.sh: new_master_host=${NEW_MASTER_NODE_HOST} promote failed
exit 1
fi
logger -i -p local1.info failover.sh: end: new_master_node_id=$NEW_MASTER_NODE_ID started as the primary node
exit 0
Grant execute permission
chmod +x /usr/local/pgpool/failover_stream.sh
10 Write the follow_master script
This script makes the remaining standbys resynchronize from the new primary; it is needed when the cluster has three or more nodes.
vi /usr/local/pgpool/follow_master.sh
Edit the script
#!/bin/bash
# This script is run after failover_command to synchronize the Standby with the new Primary.
set -o xtrace
logger -i -p local1.info follow_master begin
# special values: %d = node id
# %h = host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %M = old master node id
# %H = new master node host name
# %P = old primary node id
# %R = new master database cluster path
# %r = new master port number
# %% = '%' character
FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
NEW_MASTER_NODE_HOST="$6"
OLD_MASTER_NODE_ID="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"
PGHOME=/usr/local/pgsql
ARCHIVEDIR=/home/pg/data/archivedir
REPL_USER=postgres
PCP_USER=postgres
PGPOOL_PATH=/usr/local/bin
PCP_PORT=9898
# Recover the standby from the new primary
logger -i -p local1.info follow_master.sh: start: synchronize the Standby node postgres@${FAILED_NODE_HOST} with the new Primary node postgres@${NEW_MASTER_NODE_HOST}
## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa ls /tmp > /dev/null
if [ $? -ne 0 ]; then
logger -i -p local1.error follow_master.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwordless SSH.
exit 1
fi
## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`
if [ ${PGVERSION} -ge 12 ]; then
RECOVERYCONF=${FAILED_NODE_PGDATA}/myrecovery.conf
else
RECOVERYCONF=${FAILED_NODE_PGDATA}/recovery.conf
fi
# Check the status of standby
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa ${PGHOME}/bin/pg_ctl -w -D ${FAILED_NODE_PGDATA} status
## If Standby is running, run pg_basebackup.
if [ $? -eq 0 ]; then
# Execute pg_basebackup
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa "
set -o errexit
${PGHOME}/bin/pg_ctl -w -m f -D ${FAILED_NODE_PGDATA} stop
rm -rf ${FAILED_NODE_PGDATA}
rm -rf ${ARCHIVEDIR}/*
${PGHOME}/bin/pg_basebackup -h ${NEW_MASTER_NODE_HOST} -U ${REPL_USER} -p ${NEW_MASTER_NODE_PORT} -D ${FAILED_NODE_PGDATA} -X stream
if [ ${PGVERSION} -ge 12 ]; then
sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
-e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${FAILED_NODE_PGDATA}/postgresql.conf
fi
cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPL_USER} passfile=''/home/postgres/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
EOT
if [ ${PGVERSION} -ge 12 ]; then
touch ${FAILED_NODE_PGDATA}/standby.signal
else
echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
fi
"
if [ $? -ne 0 ]; then
logger -i -p local1.error follow_master.sh: end: pg_basebackup failed
exit 1
fi
# start Standby node on ${FAILED_NODE_HOST}
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa $PGHOME/bin/pg_ctl -l /dev/null -w -D ${FAILED_NODE_PGDATA} start
# If start Standby successfully, attach this node
if [ $? -eq 0 ]; then
# Run pcp_attach_node to attach Standby node to Pgpool-II.
${PGPOOL_PATH}/pcp_attach_node -w -h localhost -U $PCP_USER -p ${PCP_PORT} -n ${FAILED_NODE_ID}
if [ $? -ne 0 ]; then
logger -i -p local1.error follow_master.sh: end: pcp_attach_node failed
exit 1
fi
# If starting the Standby failed, log the error and exit.
else
logger -i -p local1.error follow_master.sh: end: follow master command failed
exit 1
fi
else
logger -i -p local1.info follow_master.sh: failed_node_id=${FAILED_NODE_ID} is not running. skipping follow master command
exit 0
fi
logger -i -p local1.info follow_master.sh: end: follow master command complete
exit 0
Grant execute permission
chmod +x /usr/local/pgpool/follow_master.sh
11 Write the recovery_1st_stage script
pcp_recovery_node uses recovery_1st_stage and pgpool_remote_start.
recovery_1st_stage uses pg_basebackup to copy data from the current primary, initialize the standby's data directory, and write the recovery configuration file.
vi /home/pg/data/recovery_1st_stage.sh
Edit
#!/bin/bash
# This script is executed by "recovery_1st_stage" to recover a Standby node.
set -o xtrace
logger -i -p local1.info recovery_1st_stage begin
PRIMARY_NODE_PGDATA="$1"
DEST_NODE_HOST="$2"
DEST_NODE_PGDATA="$3"
PRIMARY_NODE_PORT="$4"
DEST_NODE_PORT=5432
# PRIMARY_NODE_HOST is resolved from the hostname, which is why the /etc/hosts entries from step 2 are required
PRIMARY_NODE_HOST=$(hostname)
PGHOME=/usr/local/pgsql
ARCHIVEDIR=/home/pg/data/archivedir
REPL_USER=postgres
logger -i -p local1.info recovery_1st_stage: start: pg_basebackup for Standby node postgres@${DEST_NODE_HOST}
## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa ls /tmp > /dev/null
if [ $? -ne 0 ]; then
logger -i -p local1.error recovery_1st_stage: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please setup passwordless SSH.
exit 1
fi
## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`
if [ $PGVERSION -ge 12 ]; then
RECOVERYCONF=${DEST_NODE_PGDATA}/myrecovery.conf
else
RECOVERYCONF=${DEST_NODE_PGDATA}/recovery.conf
fi
## Execute pg_basebackup to recovery Standby node
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa "
set -o errexit
rm -rf $DEST_NODE_PGDATA
rm -rf $ARCHIVEDIR/*
${PGHOME}/bin/pg_basebackup -h ${PRIMARY_NODE_HOST} -U ${REPL_USER} -p ${PRIMARY_NODE_PORT} -D ${DEST_NODE_PGDATA} -X stream
if [ ${PGVERSION} -ge 12 ]; then
sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
-e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${DEST_NODE_PGDATA}/postgresql.conf
fi
cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${PRIMARY_NODE_HOST} port=${PRIMARY_NODE_PORT} user=${REPL_USER} passfile=''/home/postgres/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${PRIMARY_NODE_HOST}:${ARCHIVEDIR}/%f %p'
EOT
if [ ${PGVERSION} -ge 12 ]; then
touch ${DEST_NODE_PGDATA}/standby.signal
else
echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
fi
sed -i \"s/#*port = .*/port = ${DEST_NODE_PORT}/\" ${DEST_NODE_PGDATA}/postgresql.conf
"
if [ $? -ne 0 ]; then
logger -i -p local1.error recovery_1st_stage: end: pg_basebackup failed. online recovery failed
exit 1
fi
logger -i -p local1.info recovery_1st_stage: end: recovery_1st_stage complete
exit 0
Grant execute permission
chmod +x /home/pg/data/recovery_1st_stage.sh
12 Write the pgpool_remote_start script (note: no .sh extension)
pgpool_remote_start starts the database on the target node.
vi /home/pg/data/pgpool_remote_start
Edit
#!/bin/bash
# This script is run after recovery_1st_stage to start Standby node.
set -o xtrace
logger -i -p local1.info pgpool_remote_start begin
PGHOME=/usr/local/pgsql
DEST_NODE_HOST="$1"
DEST_NODE_PGDATA="$2"
logger -i -p local1.info pgpool_remote_start: start: remote start Standby node postgres@$DEST_NODE_HOST
## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa ls /tmp > /dev/null
if [ $? -ne 0 ]; then
logger -i -p local1.error pgpool_remote_start: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please setup passwordless SSH.
exit 1
fi
## Start Standby node
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa "
$PGHOME/bin/pg_ctl -l /dev/null -w -D $DEST_NODE_PGDATA start
"
if [ $? -ne 0 ]; then
logger -i -p local1.error pgpool_remote_start: postgres@$DEST_NODE_HOST start failed.
exit 1
fi
logger -i -p local1.info pgpool_remote_start: end: postgres@$DEST_NODE_HOST started successfully.
exit 0
Grant execute permission (the file has no .sh extension)
chmod +x /home/pg/data/pgpool_remote_start
13 Configure pgpool.conf
Copy the Pgpool-II sample configuration for streaming replication mode
cp /usr/local/etc/pgpool.conf.sample-stream /usr/local/etc/pgpool.conf
vi /usr/local/etc/pgpool.conf
Modify
#CONNECTIONS
listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898
#- Backend Connection Settings -
backend_hostname0 = '【IP of cluster node 1 (same value on every node)】'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/pg/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '【IP of cluster node 2 (same value on every node)】'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/pg/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = '【IP of cluster node 3 (same value on every node)】'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/home/pg/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
#- Authentication -
enable_pool_hba = on
pool_passwd ='pool_passwd'
#------------------------------------------------------------------------------
# POOLS
#------------------------------------------------------------------------------
# - Concurrent session and pool size -
num_init_children = 400
max_pool = 4
#------------------------------------------------------------------------------
# LOGS
#------------------------------------------------------------------------------
log_destination = 'syslog'
syslog_facility = 'LOCAL1'
#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------
pid_file_name = '/usr/local/pgpool/pgpool.pid'
#------------------------------------------------------------------------------
# REPLICATION MODE
#------------------------------------------------------------------------------
replication_mode = off
#------------------------------------------------------------------------------
# LOAD BALANCING MODE
#------------------------------------------------------------------------------
load_balance_mode = on
#------------------------------------------------------------------------------
# MASTER/SLAVE MODE
#------------------------------------------------------------------------------
master_slave_mode = on
master_slave_sub_mode = 'stream'
# - Streaming -
sr_check_period = 5
sr_check_user = 'postgres'
sr_check_password ='123456'
sr_check_database = 'postgres'
follow_master_command = '/usr/local/pgpool/follow_master.sh %d %h %p %D %m %H %M %P %r %R'
#------------------------------------------------------------------------------
#HEALTH CHECK
#------------------------------------------------------------------------------
health_check_period = 10
health_check_timeout = 20
health_check_user = 'postgres'
health_check_password ='123456'
health_check_database = 'postgres'
#------------------------------------------------------------------------------
#FAILOVER AND FAILBACK
#------------------------------------------------------------------------------
# Note: the arguments passed after failover_stream.sh must match the positional parameters read by the script
failover_command = '/usr/local/pgpool/failover_stream.sh %d %h %p %D %m %H %M %P %r %R'
#------------------------------------------------------------------------------
#WATCHDOG
#------------------------------------------------------------------------------
use_watchdog = on
wd_hostname = '【IP of this node】'
wd_port = 9000
delegate_IP = '【virtual IP】'
if_cmd_path = '/sbin'
if_up_cmd = 'ip addr add $_IP_$/24 dev 【NIC name (see ip addr)】 label 【NIC name (see ip addr)】:0'
if_down_cmd = 'ip addr del $_IP_$/24 dev 【NIC name (see ip addr)】'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1'
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = '【IP of the first other node】'
heartbeat_destination_port0 = 9694
heartbeat_device0 = '【NIC name (see ip addr)】'
heartbeat_destination1 = '【IP of the second other node】'
heartbeat_destination_port1 = 9694
heartbeat_device1 = '【NIC name (see ip addr)】'
#------------------------------------------------------------------------------
# ONLINE RECOVERY
#------------------------------------------------------------------------------
recovery_user = 'postgres'
recovery_password = '123456'
recovery_1st_stage_command = 'recovery_1st_stage.sh'
#- Other pgpool Connection Settings -
other_pgpool_hostname0 = '【IP of the first other node】'
other_pgpool_port0 = 9999
other_wd_port0 = 9000
other_pgpool_hostname1 = '【IP of the second other node】'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
14 Generate pool_passwd
/usr/local/bin/pg_md5 -p --md5auth --username=postgres pool_passwd
chown postgres:postgres /usr/local/etc/pool_passwd
After this, a pool_passwd file is created under /usr/local/etc; it is the file referenced by the pool_passwd option in pgpool.conf.
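You can confirm the file was written; it should contain one entry per user in username:digest form (the exact digest depends on your password):
cat /usr/local/etc/pool_passwd
# expected: a single line starting with postgres: followed by the password digest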
15 Configure pool_hba.conf
This is the client authentication configuration file; when enable_pool_hba = on is set in pgpool.conf, this file is used to authenticate client connections.
Run as root
cp /usr/local/etc/pool_hba.conf.sample /usr/local/etc/pool_hba.conf
chown postgres:postgres /usr/local/etc/pool_hba.conf
vi /usr/local/etc/pool_hba.conf
Modify
local all all md5
host all all 0.0.0.0/0 md5
host all all 0/0 md5
16 Configure pcp.conf
Pgpool-II provides an interface for administrators to perform management operations, such as fetching Pgpool-II status or terminating Pgpool-II processes remotely. pcp.conf is the username/password file used to authenticate to this interface, and it must be set up in every operation mode. The username and password used to run PCP commands (for example by follow_master.sh and pcp_recovery_node below) must be declared in pcp.conf.
16.1 Generate the password digest
/usr/local/bin/pg_md5 【password】
16.2 Create and edit the file
cp /usr/local/etc/pcp.conf.sample /usr/local/etc/pcp.conf
vi /usr/local/etc/pcp.conf
Edit
# Example:
postgres:【digest generated in the previous step】
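For instance, with the password 123456 used elsewhere in this article, step 16.1 and the resulting pcp.conf entry would look like this (illustrative; substitute your own password and digest):
/usr/local/bin/pg_md5 123456
# prints e10adc3949ba59abbe56e057f20f883e
postgres:e10adc3949ba59abbe56e057f20f883e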
17 Set up the PCP password file
Because the follow_master_command script must run PCP commands without a password prompt, create .pcppass in the home directory of the user that starts Pgpool-II.
echo 'localhost:9898:【PCP username】:【PCP password】' > ~/.pcppass
chmod 600 ~/.pcppass
18 Allow the postgres user to run the required network commands with root privileges
Run as root
chmod u+s /sbin/ip
chmod u+s /sbin/arping
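After the chmod, the setuid bit should be visible in the owner-execute position (shown as s); a quick check:
ls -l /sbin/ip /sbin/arping
# expected permissions similar to: -rwsr-xr-x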
19 Configure pgpool to start on boot
vi /etc/systemd/system/pgpool.service
Edit
[Unit]
Description=Pgpool-II
After=syslog.target network.target
[Service]
Type=forking
User=root
ExecStart=/usr/local/bin/pgpool -C -D
ExecStop=/usr/local/bin/pgpool -m fast stop
ExecReload=/usr/local/bin/pgpool reload
LimitNOFILE=65536
KillMode=process
KillSignal=SIGINT
Restart=on-abnormal
RestartSec=30s
TimeoutSec=0
[Install]
WantedBy=multi-user.target
Enable start on boot
systemctl enable pgpool
20 Start pgpool on all nodes
systemctl start pgpool
21 Initialize the standby databases with pcp_recovery_node
pcp_recovery_node first runs SELECT pgpool_recovery('recovery_1st_stage', '【IP of the standby to start】', '【data directory】', '5432', 【node id】) on the primary node (check the PostgreSQL log there for errors), then executes recovery_1st_stage.sh and pgpool_remote_start.
pcp_recovery_node -h 【primary node IP】 -p 9898 -U 【PCP username】 -n 【node id】
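A hypothetical invocation, assuming the primary is reachable at 192.168.10.11 and node 1 is the standby being initialized (the address and node id are placeholders for illustration only):
pcp_recovery_node -h 192.168.10.11 -p 9898 -U postgres -n 1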
22 Check node status
22.1 Pgpool-II node status
pcp_watchdog_info -h 【virtual IP】 -p 9898 -U 【PCP username】
22.2 Database node status
Open a PostgreSQL command line through port 9999 on the virtual IP and run show pool_nodes (see the example below).
Note: if you use navicat's command-line tool, you may need to close and reopen the command-line window (start a new session) to see the correct node status.
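For example, with psql connecting through the virtual IP (any PostgreSQL client works the same way):
psql -h 【virtual IP】 -p 9999 -U postgres postgres -c "show pool_nodes"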
23 High-availability tests
23.1 Pgpool-II high availability
- Stop the pgpool service on the master: one of the standbys takes over as master
- Restart the pgpool service on the former master: its status becomes standby
- Stop the pgpool service on a standby: its status becomes shutdown
- Restart that standby's pgpool: its status returns to standby
23.2 Database high availability
- Stop the database service on the primary: one standby is promoted to primary and the remaining standby runs follow_master to resynchronize from the new primary
- Run pcp_recovery_node to bring the stopped former primary back online
- Stop the database service on a standby: the other nodes are unaffected
- Run pcp_recovery_node to bring the stopped standby back online
24 Load-balancing test
Use pgbench for the load-balancing test
24.1 Initialize the data
pgbench -i -h 【primary node IP】 -p 5432 postgres
24.2 Run the test
pgbench -c 【number of clients】 -T 【duration in seconds】 -h 【cluster virtual IP】 -p 9999 -U postgres -S postgres
(-S makes it a SELECT-only, read-only test)
24.3 Use show pool_nodes to see how the queries are distributed across the nodes
25 Recovery after an unclean pgpool shutdown
- After pgpool is shut down uncleanly, remove the stale socket files:
rm /tmp/.s.PGSQL.9999
rm /tmp/.s.PGPOOLWD_CMD.9000
rm /tmp/.s.PGSQL.9898
- If the first node brought back up shows a status of standby, restart pgpool and use pcp_attach_node to attach the node (see the example below).
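A hypothetical attach command for node 0, run against the local PCP port (adjust the node id and PCP user to your environment):
pcp_attach_node -h localhost -p 9898 -U postgres -n 0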