
Building a PostgreSQL shared-storage HA cluster with Pacemaker + corosync

Contents: 1. Overview  2. Overall Design  3. Network Configuration  4. Cluster Software Selection and Installation  5. Corosync Configuration  6. Shared Storage Installation and Configuration  7. PostgreSQL Installation and Configuration  8. Pacemaker Configuration  9. Verification Tests  10. Summary  11. Reference 1: Common Commands  12. Reference 2: Related Links

There are many HA solutions for PostgreSQL, each with its own pros and cons; see the PostgreSQL manual for details. This article describes how to build a shared-storage HA cluster.

<a href="http://www.postgresql.org/docs/9.3/static/different-replication-solutions.html">http://www.postgresql.org/docs/9.3/static/different-replication-solutions.html</a>

The test environment consists of three VMware virtual machines. One of them serves as the shared-storage server and also acts as the third node so that the cluster has enough votes for quorum.

external/ssh is used as the fence device for testing; in production you must use a physical fence device.

Shared-storage server

OS: CentOS release 6.3 (Final)

Hostname: disknode

NIC 1:

  Used for cluster-internal messaging

  NIC type: NAT

  IP: 192.168.152.120

NIC 2:

  Used for iSCSI traffic to the shared disk

  NIC type: Host-Only

  IP: 192.168.146.120

NIC 3:

  Used for external/ssh fence device communication

  NIC type: Bridged

  IP: 10.167.217.107

HA node 1

OS: CentOS release 6.5 (Final)

Hostname: hanode1

  Used for the cluster virtual IP (192.168.152.200) and cluster-internal messaging

  IP: 192.168.152.130

  IP: 192.168.146.130

  IP: 10.167.217.169

HA node 2

OS: CentOS release 6.5 (Final)

Hostname: hanode2

  IP: 192.168.152.140

  IP: 192.168.146.140

  IP: 10.167.217.171

Cluster virtual IP

  192.168.152.200

Each virtual machine has three NICs that can be placed on different subnets; here three different types of virtual NIC are used: NAT, Host-Only, and Bridged.

NIC 1 (NAT) carries the cluster virtual IP and the heartbeat traffic for cluster-internal messaging.

NIC 2 (Host-Only) is used for iSCSI traffic to the shared disk.

NIC 3 (Bridged) is used for external/ssh fence device communication.

Following [2. Overall Design], set the hostname and IP addresses on each of the three hosts. For example, on HA node 1:

[root@hanode1 ~]# cat /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=hanode1

GATEWAY=192.168.152.2

[root@hanode1 ~]# hostname hanode1

[root@hanode1 ~]# ifconfig eth0 192.168.152.130  netmask 255.255.255.0

[root@hanode1 ~]# ifconfig eth1 192.168.146.130 netmask 255.255.255.0

[root@hanode1 ~]# ifconfig eth2 10.167.217.169  netmask 255.255.255.0

Write the IP addresses of the three NICs into their respective configuration files, for example for eth0:

[root@hanode1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0

TYPE=Ethernet

IPADDR=192.168.152.130

NETMASK=255.255.255.0

NETWORK=192.168.152.0

BROADCAST=192.168.152.255

ONBOOT=yes

USERCTL=no

Record the NIC 3 IP addresses in /etc/hosts on all three hosts; external/ssh will use these IPs.

$ cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.167.217.169   hanode1 hanode1.localdomain

10.167.217.171   hanode2 hanode2.localdomain

10.167.217.107    disknode disknode.localdomain

Set up passwordless SSH trust between the hosts.

Run the following commands on hanode1:

[root@hanode1 ~]# cd ~

[root@hanode1 ~]# ssh-keygen -t dsa -f ~/.ssh/id_dsa -N ""

[root@hanode1 ~]# cp .ssh/id_dsa.pub .ssh/authorized_keys

[root@hanode1 ~]# scp -r .ssh hanode2:

[root@hanode1 ~]# scp -r .ssh disknode:
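
A quick way to confirm the passwordless trust works is to run a command on the other hosts over SSH; no password prompt should appear (a simple check, assuming the keys were copied as above):

[root@hanode1 ~]# ssh hanode2 hostname
[root@hanode1 ~]# ssh disknode hostname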

After comparing several candidates suitable for a database cluster, Pacemaker + corosync 1.x was finally chosen. The candidates were:

RHCS

heartbeat 1.x

Pacemaker + heartbeat 2.x/3.x

Pacemaker + cman

Pacemaker + corosync 1.x

Pacemaker + corosync 2.x

Reasons:

RHCS is reportedly overly complex, so it was set aside for now.

heartbeat is in maintenance mode with no further development and is no longer supported by the mainstream Linux distributions, so it was ruled out.

According to the Pacemaker website, Pacemaker + corosync 2.x is the direction going forward, so corosync was the choice. However, no yum package for corosync 2.x could be found, and compiling it from source seemed like too much trouble, so corosync 1.x is used for now.

<a href="http://clusterlabs.org/quickstart.html">http://clusterlabs.org/quickstart.html</a>

----------

Eventually everyone will move to Corosync 2 which removes support for CMAN and custom plugins BUT natively includes the APIs Pacemaker needs for quorum and membership. In this case, users would configure corosync.conf and use the Pacemaker init-script to start up after Corosync.

Installing pacemaker and corosync

The resource-agents and related packages in the stock CentOS repositories are too old, so pacemaker and corosync are installed from the openSUSE network:ha-clustering:Stable repository.

cp network:ha-clustering:Stable.repo /etc/yum.repos.d/

yum install pacemaker-1.1.9 corosync crmsh

However, this pacemaker build was compiled without support for the fence agents from the heartbeat project (including external/ssh), so pacemaker needs to be rebuilt.

[root@hanode1 ~]# pacemakerd -F

Pacemaker 1.1.9-2.7 (Build: 2db99f1)

 Supporting v3.0.7:  generated-manpages agent-manpages ascii-docs ncurses libqb-logging libqb-ipc nagios  corosync-plugin

Download the pacemaker 1.1.9 source code and unpack it:

[root@hanode1 ~]# tar xfz Pacemaker-1.1.9.tar.gz

Install the following dependency packages with yum, especially the last two (a combined yum command is shown after the list).

automake

autoconf

libtool-ltdl-devel

pkgconfig

python

glib2-devel

libxml2-devel

libxslt-devel

python-devel

gcc-c++

bzip2-devel

gnutls-devel

pam-devel

libqb-devel

corosynclib-devel

cluster-glue-libs-devel
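
For convenience, the whole dependency list above can be installed in a single command (a sketch using exactly the package names listed; yum skips anything already installed):

[root@hanode1 ~]# yum install -y automake autoconf libtool-ltdl-devel pkgconfig python glib2-devel libxml2-devel libxslt-devel python-devel gcc-c++ bzip2-devel gnutls-devel pam-devel libqb-devel corosynclib-devel cluster-glue-libs-devel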

Build and install Pacemaker:

[root@hanode1 ~]# cd pacemaker-Pacemaker-1.1.9/

[root@hanode1 ~]# ./autogen.sh

[root@hanode1 ~]# ./configure

[root@hanode1 ~]# make

[root@hanode1 ~]# sudo make install

If a missing package is reported during the build, install it and run the build again.

If you hit the errors below during compilation, comment out the two offending lines (these two services, upstart and systemd, are not used later).

upstart.c:264: error: implicit declaration of function 'g_variant_lookup_value'

upstart.c:264: error: nested extern declaration of 'g_variant_lookup_value'

upstart.c:264: error: assignment makes pointer from integer without a cast

...

systemd.c:227: error: implicit declaration of function 'g_variant_lookup_value'

systemd.c:227: error: nested extern declaration of 'g_variant_lookup_value'

systemd.c:227: error: assignment makes pointer from integer without a cast

Running pacemakerd -F again shows that lha-fencing is now included:

Pacemaker 1.1.9 (Build: 2a917dd)

 Supporting:  libqb-logging libqb-ipc lha-fencing upstart systemd nagios  heartbeat corosync-plugin cman snmp

All three hosts use the same configuration. bindnetaddr is set to the network address of NIC 1.

[root@hanode1 ~]# cat /etc/corosync/corosync.conf

# Please read the corosync.conf.5 manual page

compatibility: whitetank

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.152.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: yes
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
        }
}

amf {
        mode: disabled
}

service {
        ver: 0
        name: pacemaker
        clustername: mycluster
}

aisexec {
        user: root
        group: root
}

[root@hanode1 ~]# /etc/init.d/corosync start

[root@hanode1 ~]# chkconfig corosync on
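
Before moving on, it is worth checking that corosync has formed the ring and that pacemaker has been loaded on top of it (a quick check; corosync-cfgtool ships with corosync):

[root@hanode1 ~]# corosync-cfgtool -s
[root@hanode1 ~]# crm status

corosync-cfgtool -s should report ring 0 as active with no faults, and crm status should eventually show all three nodes online.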

6.1 Configure the iSCSI target server on disknode

Install the iSCSI target:

[root@disknode ~]# yum -y install scsi-target-utils

Append a target to targets.conf, exporting /dev/sdb as the shared disk.

[root@disknode ~]# vi /etc/tgt/targets.conf

<target iqn.2008-09.com.example:server.target1>
    backing-store /dev/sdb
    write-cache off
</target>

Start tgtd:

[root@disknode ~]# service tgtd  start

[root@disknode ~]# chkconfig tgtd on
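
The exported target can be verified on disknode with tgtadm (part of scsi-target-utils); the output should list the target and /dev/sdb as its backing store:

[root@disknode ~]# tgtadm --lld iscsi --mode target --op show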

6.2 Configure the iSCSI clients on hanode1 and hanode2

Install the iSCSI initiator (hanode1 is shown; hanode2 is the same):

[root@hanode1 ~]# yum install iscsi-initiator-utils-*.rpm

Start iscsid:

[root@hanode1 ~]# service iscsid start

[root@hanode1 ~]# chkconfig iscsid on

Discover the target and log in to the session:

[root@hanode1 ~]# iscsiadm -m discovery -t sendtargets -p 192.168.146.120

[root@hanode1 ~]# iscsiadm -m node -T iqn.2008-09.com.example:server.target1 -p 192.168.146.120 -l

The iSCSI shared disk shows up on the clients as /dev/sdc.
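
The session and the new device can be confirmed on each client, for example:

[root@hanode1 ~]# iscsiadm -m session
[root@hanode1 ~]# fdisk -l /dev/sdc

If the disk is blank, a filesystem has to be created once, from one node only (the original text does not show this step, but the Pacemaker configuration below mounts the disk as ext4):

[root@hanode1 ~]# mkfs.ext4 /dev/sdc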

Install PostgreSQL on hanode1 and hanode2, and make sure the postgres account has the same uid on both machines.

[root@hanode1 ~]# yum install postgresql93-server postgresql93-contrib

Initialize the database on hanode1:

[root@hanode1 ~]# mkdir -p /mnt/pg

[root@hanode1 ~]# chown postgres:postgres /mnt/pg

[root@hanode1 ~]# mount  /dev/sdc /mnt/pg

[root@hanode1 ~]# mv /var/lib/pgsql /var/lib/pgsql_org

[root@hanode1 ~]# ln -sf /mnt/pg/pgsql /var/lib/pgsql

[root@hanode1 ~]# mkdir -p /var/lib/pgsql/9.3/backups

[root@hanode1 ~]# mkdir -p /var/lib/pgsql/9.3/data

[root@hanode1 ~]# chown postgres:postgres -R /var/lib/pgsql/9.3

[root@hanode1 ~]# service postgresql-9.3 initdb

Edit the database configuration:

[root@hanode1 ~]# vi /var/lib/pgsql/9.3/data/postgresql.conf

listen_addresses = '*'

port = 5432

[root@hanode1 ~]# vi /var/lib/pgsql/9.3/data/pg_hba.conf

local all all trust

host all all all trust
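
Before handing control to Pacemaker, it can be useful to confirm that PostgreSQL starts locally on hanode1 with this data directory (a hedged sketch; stop it again afterwards so the cluster can manage it):

[root@hanode1 ~]# su - postgres -c "/usr/pgsql-9.3/bin/pg_ctl -D /var/lib/pgsql/9.3/data -w start"
[root@hanode1 ~]# psql -U postgres -c "select version();"
[root@hanode1 ~]# su - postgres -c "/usr/pgsql-9.3/bin/pg_ctl -D /var/lib/pgsql/9.3/data -w stop"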

Create the data-directory symlink on hanode2:

[root@hanode2 ~]# mkdir -p /mnt/pg

[root@hanode2 ~]# chown postgres:postgres /mnt/pg

[root@hanode2 ~]# mv /var/lib/pgsql /var/lib/pgsql_org

[root@hanode2 ~]# ln -sf /mnt/pg/pgsql /var/lib/pgsql
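
Optionally, verify on hanode2 that the shared disk and the data directory are visible through the symlink (only while nothing is mounted or running on hanode1; ext4 is not a cluster filesystem, so the disk must never be mounted on both nodes at the same time):

[root@hanode2 ~]# mount /dev/sdc /mnt/pg
[root@hanode2 ~]# ls /var/lib/pgsql/9.3/data
[root@hanode2 ~]# umount /mnt/pg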

[root@hanode1 ~]# crm configure edit 

node disknode

node hanode1

node hanode2

primitive ClusterIP IPaddr2 \

       params ip=192.168.152.200 cidr_netmask=32 \

       op monitor interval=30s

primitive DataFS Filesystem \

       params device="/dev/sdc" directory="/mnt/pg" fstype=ext4 \

       op monitor interval=15s

primitive pg93 pgsql \

       meta target-role=Started is-managed=true

primitive rs_iscsid lsb:iscsid \

       op monitor interval=30s \

       meta target-role=Started

primitive st-ssh stonith:external/ssh \

       params hostlist="hanode1,hanode2"

group PgGroup ClusterIP rs_iscsid DataFS pg93

clone st-sshclone st-ssh

location no_iscsid rs_iscsid -inf: disknode

location votenode ClusterIP -inf: disknode

property cib-bootstrap-options: \

       dc-version=1.1.9-2a917dd \

       cluster-infrastructure="classic openais (with plugin)" \

       expected-quorum-votes=3 \

       stonith-enabled=true \

       no-quorum-policy=stop \

       last-lrm-refresh=1402882416

#vim:set syntax=pcmk
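
The group PgGroup also defines the start order: ClusterIP, then iscsid, then the filesystem, then PostgreSQL; stopping happens in the reverse order. After saving the configuration it can be checked for errors before relying on it (crm_verify ships with pacemaker):

[root@hanode1 ~]# crm_verify -L -V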

Check the cluster status (hanode2 is currently the active node):

[root@hanode1 ~]# crm status

Last updated: Wed Sep  3 02:28:02 2014

Last change: Wed Sep  3 01:18:27 2014 via cibadmin on hanode1

Stack: classic openais (with plugin)

Current DC: disknode - partition with quorum

Version: 1.1.9-2a917dd

3 Nodes configured, 3 expected votes

7 Resources configured.

Online: [ disknode hanode1 hanode2 ]

 Resource Group: PgGroup

     ClusterIP (ocf::heartbeat:IPaddr2): Started hanode2

     rs_iscsid (lsb:iscsid): Started hanode2

     DataFS (ocf::heartbeat:Filesystem): Started hanode2

     pg93 (ocf::heartbeat:pgsql): Started hanode2

 Clone Set: st-sshclone [st-ssh]

     Started: [ disknode hanode1 hanode2 ]

To make sure the cluster state is clean, corosync can be restarted on all three nodes:

/etc/init.d/corosync restart

The PG service on the active server hanode2 can be reached through the cluster IP:

[root@disknode ~]# psql -h 192.168.152.200 -U postgres

psql (9.3.4)

Type "help" for help.

1) Reset the failcount

Before each failover test, reset the failcount left over from the previous test; otherwise Pacemaker will avoid that node when placing resources. A lazier alternative is to set the pg93 attribute failure-timeout=60s so that the failcount expires automatically after 60 seconds.

[root@hanode1 ~]# crm resource failcount pg93 set hanode1 0

[root@hanode1 ~]# crm_failcount -D -N hanode1 -r pg93

[root@hanode1 ~]# crm resource cleanup pg93
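
The current failcount can also be inspected, and the failure-timeout mentioned above can be set through crmsh (hedged examples; crm_failcount supports -G for query just as it supports -D for delete, and if your crmsh version lacks the resource meta command the same attribute can be added in crm configure edit):

[root@hanode1 ~]# crm_failcount -G -N hanode1 -r pg93
[root@hanode1 ~]# crm resource meta pg93 set failure-timeout 60s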

2) Forcibly kill the PG process

After the PG process is killed on the active server hanode2, pacemaker automatically restarts it on hanode2 shortly afterwards:

[root@hanode2 ~]# ps -ef|grep postgres

root      8910  8795  0 02:38 pts/1    00:00:00 su - postgres

postgres  8911  8910  0 02:38 pts/1    00:00:00 -bash

postgres 14555     1  0 02:56 ?        00:00:00 /usr/pgsql-9.3/bin/postgres -D /var/lib/pgsql/data -c config_file=/var/lib/pgsql/data/postgresql.conf

postgres 14597 14555  0 02:56 ?        00:00:00 postgres: logger process                                                                            

postgres 14599 14555  0 02:56 ?        00:00:00 postgres: checkpointer process                                                                      

postgres 14600 14555  0 02:56 ?        00:00:00 postgres: writer process                                                                            

postgres 14601 14555  0 02:56 ?        00:00:00 postgres: wal writer process                                                                        

postgres 14602 14555  0 02:56 ?        00:00:00 postgres: autovacuum launcher process                                                               

postgres 14603 14555  0 02:56 ?        00:00:00 postgres: stats collector process                                                                   

root     15013  9121  0 02:57 pts/0    00:00:00 grep postgres

[root@hanode2 ~]# kill -9 14555

[root@hanode2 ~]# ps -ef|grep postgres

root     15113  9121  0 02:57 pts/0    00:00:00 grep postgres

[root@hanode2 ~]# ps -ef|grep postgres

postgres 15294     1  0 02:57 ?        00:00:00 /usr/pgsql-9.3/bin/postgres -D /var/lib/pgsql/data -c config_file=/var/lib/pgsql/data/postgresql.conf

postgres 15336 15294  0 02:57 ?        00:00:00 postgres: logger process                                                                            

postgres 15338 15294  0 02:57 ?        00:00:00 postgres: checkpointer process                                                                      

postgres 15339 15294  0 02:57 ?        00:00:00 postgres: writer process                                                                            

postgres 15340 15294  0 02:57 ?        00:00:00 postgres: wal writer process                                                                        

postgres 15341 15294  0 02:57 ?        00:00:00 postgres: autovacuum launcher process                                                               

postgres 15342 15294  0 02:57 ?        00:00:00 postgres: stats collector process                                                                   

root     15344  9121  0 02:57 pts/0    00:00:00 grep postgres

3) Cut off the active server's heartbeat NIC (which also carries the service IP)

Before the cut, hanode1 is the active server:

Last updated: Wed Sep  3 03:49:23 2014

Last change: Wed Sep  3 02:39:49 2014 via cibadmin on hanode1

     ClusterIP (ocf::heartbeat:IPaddr2): Started hanode1

     rs_iscsid (lsb:iscsid): Started hanode1

     DataFS (ocf::heartbeat:Filesystem): Started hanode1

     pg93 (ocf::heartbeat:pgsql): Started hanode1

After hanode1's NIC is cut off, hanode1 is rebooted (fenced) and hanode2 takes over the services:

[root@hanode2 ~]# crm status

Last updated: Wed Sep  3 03:57:13 2014

4) Cut off the active server's shared-disk NIC

After the shared-disk NIC on the active server hanode2 is cut off, pacemaker detects the failure after about 15 seconds (the DataFS monitor interval configured above) and fails over to the standby hanode1. (Command output omitted.)

5) Stop the cluster service on the active server

[root@hanode2 ~]# /etc/init.d/corosync stop

Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]

Waiting for corosync services to unload:..                 [  OK  ]

The standby server takes over the services almost immediately. (Command output omitted.)

6) Cut off the active server's heartbeat NIC and fencing NIC

After these NICs on the active server hanode2 were cut off, the cluster brought the resources up on hanode1, but unfortunately got stuck starting PG; restarting corosync resolved it.

This case is also problematic, but if we assume the fencing device never fails, it does not need to be considered.

After 30 seconds pacemaker mistakenly concluded that fencing had succeeded and failed over to hanode1.

/var/log/messages

Sep  3 10:59:11 hanode1 stonith-ng[14464]:   notice: handle_request: Client crmd.14468.e4823c4b wants to fence (reboot) 'hanode2' with device '(any)'

Sep  3 10:59:11 hanode1 stonith-ng[14464]:   notice: initiate_remote_stonith_op: Initiating remote operation reboot for hanode2: 7ce22e99-046c-4bfe-a4cd-02033b646a93 (0)

Sep  3 10:59:31 hanode1 crmd[14468]:   notice: tengine_stonith_callback: Stonith operation 7/31:82:0:385d8cd2-e2ac-4f62-8e04-f9c7dcfff7ba: OK (0)

Sep  3 10:59:31 hanode1 crmd[14468]:   notice: tengine_stonith_notify: Peer hanode2 was terminated (st_notify_fence) by hanode1 for disknode: OK (ref=7ce22e99-046c-4bfe-a4cd-02033b646a93) by client crmd.14468

Sep  3 10:59:31 hanode1 crmd[14468]:   notice: te_rsc_command: Initiating action 6: start ClusterIP_start_0 on hanode1

The subsequent log contains a pile of errors, but crm status shows the failover succeeded and psql can still connect. Logging in to hanode2 and running df shows that the shared disk is no longer mounted there and the postgres processes are gone; in other words, the switchover was clean. (This is because the pacemaker instance running on hanode2 found itself isolated without quorum and therefore stopped all of its own resources.)

Last updated: Wed Sep  3 12:13:55 2014

Last change: Wed Sep  3 10:39:37 2014 via cibadmin on hanode1

Current DC: hanode1 - partition with quorum

Online: [ disknode hanode1 ]

OFFLINE: [ hanode2 ]

     Started: [ disknode hanode1 ]

     Stopped: [ st-ssh:2 ]

After hanode2's NICs are brought back up, hanode2 comes online again.

Last updated: Wed Sep  3 12:22:24 2014

Current DC: hanode2 - partition with quorum

7) Cut off all NICs on the active server

The result is the same as in case 6.

8) Active server goes down

When the active server goes down, failover works normally. (Command output omitted.)

The tests show that the Pacemaker + corosync combination is indeed very practical for controlling a database cluster. An HA cluster fears two things: first, split-brain corrupting the data, and second, the standby failing to take over properly (for example because some resources are still held by the failed machine). With three voting nodes in the cluster, split-brain can essentially be ruled out, and fencing ensures that resources on a failed machine are released, so reliability is quite high. With only two nodes, reliability drops somewhat and everything depends on fencing.

1) List the available resource classes

[root@hanode2 ~]# crm ra classes

lsb

ocf / heartbeat linbit pacemaker redhat tummy

service

stonith

upstart

2) List all resource agents in a given class

[root@hanode2 ~]# crm ra list ocf pacemaker

ClusterMon         Dummy              HealthCPU          HealthSMART

NodeUtilization    Stateful           SysInfo            SystemHealth

controld           o2cb               ping               pingd

remote 

3) Show the metadata of a resource agent

[root@hanode2 ~]# crm ra meta ocf:heartbeat:Filesystem

4) List the installed fence devices

[root@hanode2 ~]# stonith_admin -I

5) Show the metadata of a fence device

[root@hanode2 ~]# stonith_admin --metadata -a external/ssh
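
6) Manually fence a node

An additional, hedged example that is useful for confirming the external/ssh device actually works before running the failover tests:

[root@hanode1 ~]# stonith_admin --reboot hanode2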

<a href="http://www.linux-ha.org/wiki/Pacemaker">http://www.linux-ha.org/wiki/Pacemaker</a>

<a href="http://www.linux-ha.org/wiki/Resource_Agents">http://www.linux-ha.org/wiki/Resource_Agents</a>

<a href="http://clusterlabs.org/wiki/Main_Page">http://clusterlabs.org/wiki/Main_Page</a>

<a href="http://clusterlabs.org/doc/zh-CN/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html">http://clusterlabs.org/doc/zh-CN/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html</a>

<a href="http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html">http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html</a>

<a href="http://clusterlabs.org/doc/crm_fencing.html">http://clusterlabs.org/doc/crm_fencing.html</a>

<a href="https://github.com/ClusterLabs/pacemaker">https://github.com/ClusterLabs/pacemaker</a>
