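Building a highly available web cluster with corosync + pacemaker + crmsh

Network plan:

node1: eth0: 172.16.31.10/16

node2: eth0: 172.16.31.11/16

nfs:   eth0: 172.16.31.12/16

Notes:

Besides exporting the NFS share, the nfs host also acts as the NTP server that node1 and node2 synchronize their clocks against.

Heartbeat messages between node1 and node2 travel over eth0.

The VIP of the web service is 172.16.31.166/16.

Architecture diagram: the same as in the previous article; only the high-availability software installed on the nodes differs: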

<a href="http://s3.51cto.com/wyfs02/M02/58/05/wKiom1Snt9HSxaRCAAIlf-VMcmY056.jpg" target="_blank"></a>

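I. Prerequisites for building the high-availability cluster

1. The nodes must resolve each other's hostnames so that they can communicate by name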

[root@node1 ~]# vim /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

172.16.31.10 node1.stu31.com node1

172.16.31.11 node2.stu31.com node2

Copy the file to node2:

[root@node1 ~]# scp /etc/hosts root@node2:/etc/hosts
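
Before moving on, it may be worth confirming that the names really are present in the copied file. The following is a minimal sketch that only parses a hosts-format file; the helper name `hosts_has` is made up for illustration and is not part of any package used in this article.

```shell
#!/bin/sh
# hosts_has FILE NAME: succeed if NAME appears as a hostname or alias
# on a non-comment line of a hosts-format FILE. (Hypothetical helper.)
hosts_has() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                                   # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }                                  # 0 when found
    ' "$1"
}

# Example for the names used in this article (run on each node):
# for n in node1 node2 node1.stu31.com node2.stu31.com; do
#     hosts_has /etc/hosts "$n" || echo "missing: $n"
# done
```

The check skips the first field of each line (the IP address), so an address that happens to look like a name is never matched.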

2. Passwordless (key-based) ssh between the nodes

Node 1:

[root@node1 ~]# ssh-keygen -t rsa -P ""

[root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node2

Node 2:

[root@node2 ~]# ssh-keygen -t rsa -P ""

[root@node2 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node1

Test:

[root@node2 ~]# date ; ssh node1 'date'

Fri Jan  2 05:46:54 CST 2015

The clocks are in sync. Note that the time on both nodes must match.

For setting up the NTP server, see: http://sohudrgon.blog.51cto.com/3088108/1598314
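
Comparing the two `date` lines by eye works, but the comparison can also be scripted. This is a rough sketch, not part of the original setup: the function name `skew_ok` and the 2-second threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# skew_ok LOCAL_EPOCH REMOTE_EPOCH MAX_SECONDS
# Succeeds when the two epoch timestamps differ by at most MAX_SECONDS.
skew_ok() {
    diff=$(( $1 - $2 ))
    [ "$diff" -lt 0 ] && diff=$(( 0 - diff ))   # absolute value
    [ "$diff" -le "$3" ]
}

# Example (run on node2; relies on the key-based ssh set up above):
# skew_ok "$(date +%s)" "$(ssh node1 date +%s)" 2 || echo "clocks differ"
```

The ssh round trip itself takes some time, so a threshold of a second or two is more realistic than demanding an exact match.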

II. Installing and configuring the cluster software

1. Install the corosync and pacemaker packages on both node 1 and node 2

# yum install corosync pacemaker -y

2. Create the configuration file and edit it

[root@node1 ~]# cd /etc/corosync/

[root@node1 corosync]# cp corosync.conf.example corosync.conf

[root@node1 corosync]# cat corosync.conf

# Please read the corosync.conf.5 manual page

compatibility: whitetank

totem {

        version: 2

        # secauth: Enable mutual node authentication. If you choose to

        # enable this ("on"), then do remember to create a shared

        # secret with "corosync-keygen".

#Enable authentication

        secauth: on

        threads: 0

        # interface: define at least one interface to communicate

        # over. If you define more than one interface stanza, you must

        # also set rrp_mode.

        interface {

                # Rings must be consecutively numbered, starting at 0.

                ringnumber: 0

                # This is normally the *network* address of the

                # interface to bind to. This ensures that you can use

                # identical instances of this configuration file

                # across all your cluster nodes, without having to

                # modify this option.

#Network address to bind to

                bindnetaddr: 172.16.31.0

                # However, if you have multiple physical network

                # interfaces configured for the same subnet, then the

                # network address alone is not sufficient to identify

                # the interface Corosync should bind to. In that case,

                # configure the *host* address of the interface

                # instead:

                # bindnetaddr: 192.168.1.1

                # When selecting a multicast address, consider RFC

                # 2365 (which, among other things, specifies that

                # 239.255.x.x addresses are left to the discretion of

                # the network administrator). Do not reuse multicast

                # addresses across multiple Corosync clusters sharing

                # the same network.

#Multicast address

                mcastaddr: 239.31.131.12

                # Corosync uses the port you specify here for UDP

                # messaging, and also the immediately preceding

                # port. Thus if you set this to 5405, Corosync sends

                # messages over UDP ports 5405 and 5404.

#Port used for cluster messaging

                mcastport: 5405

                # Time-to-live for cluster communication packets. The

                # number of hops (routers) that this ring will allow

                # itself to pass. Note that multicast routing must be

                # specifically enabled on most network routers.

                ttl: 1

        }

}

logging {

        # Log the source file and line where messages are being

        # generated. When in doubt, leave off. Potentially useful for

        # debugging.

        fileline: off

        # Log to standard error. When in doubt, set to no. Useful when

        # running in the foreground (when invoking "corosync -f")

        to_stderr: no

        # Log to a log file. When set to "no", the "logfile" option

        # must not be set.

#Where the log is written

        to_logfile: yes

        logfile: /var/log/cluster/corosync.log

        # Log to the system log daemon. When in doubt, set to yes.

        #to_syslog: yes

        # Log debug messages (very verbose). When in doubt, leave off.

        debug: off

        # Log messages with time stamps. When in doubt, set to on

        # (unless you are only logging to syslog, where double

        # timestamps can be annoying).

        timestamp: on

        logger_subsys {

                subsys: AMF

                debug: off

        }

}

#Start pacemaker as a corosync plugin:

service {

        ver:    0

        name:   pacemaker

}
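
Sections in corosync.conf are delimited by braces, and a missing `}` is easy to overlook once the file is full of comments. The sketch below is a purely structural check written for this article (the helper name `braces_balanced` is an assumption); it does not validate any directive.

```shell
#!/bin/sh
# braces_balanced FILE: succeed if "{" and "}" pair up in FILE,
# ignoring anything after a "#" comment marker. (Hypothetical helper.)
braces_balanced() {
    awk '
        { sub(/#.*/, "") }
        {
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = substr($0, i, 1)
                if (c == "{") depth++
                else if (c == "}") depth--
                if (depth < 0) bad = 1      # a close with no matching open
            }
        }
        END { exit (bad || depth != 0) }    # 0 only when everything pairs up
    ' "$1"
}

# Example:
# braces_balanced /etc/corosync/corosync.conf || echo "unbalanced braces"
```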

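3. Generate the authentication key file. The key needs 1024 bits of entropy from /dev/random; if the entropy pool runs low, generating system activity (for example, downloading and installing a package) can help refill it.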

[root@node1 corosync]# corosync-keygen 

Corosync Cluster Engine Authentication key generator.

Gathering 1024 bits for key from /dev/random.

Press keys on your keyboard to generate entropy.

Press keys on your keyboard to generate entropy (bits = 152).

Press keys on your keyboard to generate entropy (bits = 216).

Press keys on your keyboard to generate entropy (bits = 280).

Press keys on your keyboard to generate entropy (bits = 344).

Press keys on your keyboard to generate entropy (bits = 408).

Press keys on your keyboard to generate entropy (bits = 472).

Press keys on your keyboard to generate entropy (bits = 536).

Press keys on your keyboard to generate entropy (bits = 600).

Press keys on your keyboard to generate entropy (bits = 664).

Press keys on your keyboard to generate entropy (bits = 728).

Press keys on your keyboard to generate entropy (bits = 792).

Press keys on your keyboard to generate entropy (bits = 856).

Press keys on your keyboard to generate entropy (bits = 920).

Press keys on your keyboard to generate entropy (bits = 984).

Writing corosync key to /etc/corosync/authkey.

When it finishes, copy the configuration file and the authentication key to node 2:

[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/

authkey                                       100%  128     0.1KB/s   00:00    

corosync.conf                                 100% 2703     2.6KB/s   00:00
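
Both nodes must hold the same authkey and corosync.conf, otherwise they will not form one membership. `scp -p` preserves the mode and timestamps, but it does no harm to compare checksums afterwards. A minimal sketch, with `same_content` as an illustrative helper name:

```shell
#!/bin/sh
# same_content FILE_A FILE_B: succeed if both files have the same checksum.
# The body runs in a subshell so its variables do not leak to the caller.
same_content() (
    sum_a=$(cksum < "$1") || exit 2
    sum_b=$(cksum < "$2") || exit 2
    [ "$sum_a" = "$sum_b" ]
)

# Across nodes the same idea can be written directly, for example:
# [ "$(cksum < /etc/corosync/authkey)" = \
#   "$(ssh node2 'cksum < /etc/corosync/authkey')" ] || echo "authkey differs"
```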

4. Start the corosync service:

[root@node1 corosync]# cd

[root@node1 ~]# service corosync start

Starting Corosync Cluster Engine (corosync):               [  OK  ]

[root@node2 ~]# service corosync start

5. Check the logs:

Check that the corosync engine started correctly:

Startup log on node 1:

[root@node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log 

Jan 02 08:28:13 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.

Jan 02 08:28:13 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Jan 02 08:32:48 corosync [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:2055.

Jan 02 08:38:42 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.

Jan 02 08:38:42 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Startup log on node 2:

[root@node2 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log 

Jan 02 08:38:56 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.

Jan 02 08:38:56 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Search for the TOTEM keyword to check that the membership initialization notifications were sent:

[root@node1 ~]# grep "TOTEM" /var/log/cluster/corosync.log

Jan 02 08:28:13 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

Jan 02 08:28:13 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Jan 02 08:28:14 corosync [TOTEM ] The network interface [172.16.31.11] is now up.

Jan 02 08:28:14 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

Jan 02 08:38:42 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).

Jan 02 08:38:42 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Jan 02 08:38:42 corosync [TOTEM ] The network interface [172.16.31.10] is now up.

Jan 02 08:38:42 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

Jan 02 08:38:51 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

Use the crm_mon command to check how many nodes are online:

[root@node1 ~]# crm_mon

Last updated: Fri Jan  2 08:42:23 2015

Last change: Fri Jan  2 08:38:52 2015

Stack: classic openais (with plugin)

Current DC: node1.stu31.com - partition with quorum

Version: 1.1.11-97629de

2 Nodes configured, 2 expected votes

0 Resources configured

Online: [ node1.stu31.com node2.stu31.com ]
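
For scripted checks, the `Online:` line can be parsed instead of read by eye. The sketch below assumes the output layout shown above and reads it from standard input; `online_count` is a made-up helper name.

```shell
#!/bin/sh
# online_count: read crm_mon-style text on stdin and print how many
# node names appear inside the "Online: [ ... ]" brackets.
online_count() {
    awk '/^Online:/ {
             n = 0
             for (i = 1; i <= NF; i++)
                 if ($i != "Online:" && $i != "[" && $i != "]") n++
             print n
             seen = 1
         }
         END { if (!seen) print 0 }'
}

# Example, using the one-shot mode of crm_mon:
# [ "$(crm_mon -1 | online_count)" -eq 2 ] || echo "not all nodes are online"
```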

Check that the listening port 5405 is open:

[root@node1 ~]# ss -tunl |grep 5405

udp    UNCONN     0      0           172.16.31.10:5405                  *:*     

udp    UNCONN     0      0          239.31.131.12:5405                  *:*

Check the log for errors:

[root@node1 ~]# grep ERROR /var/log/cluster/corosync.log 

#Warning that pacemaker is running as a corosync plugin; it can be ignored

Jan 02 08:28:14 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.

Jan 02 08:28:14 corosync [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

Jan 02 08:28:37 [29004] node1.stu31.com    pengine:   notice: process_pe_message:       Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.

Jan 02 08:32:47 [29004] node1.stu31.com    pengine:   notice: process_pe_message:       Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.

Jan 02 08:38:42 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.

Jan 02 08:38:42 corosync [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

Jan 02 08:39:05 [29300] node1.stu31.com    pengine:   notice: process_pe_message:       Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.

[root@node1 ~]# crm_verify -L -V

#No STONITH device is configured; this can be ignored for now

   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined

   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option

   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity

Errors found during check: config not valid

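III. Installing the cluster configuration tool: crmsh

1. Configure the yum repositories (a complete yum repository server is available in this environment)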

[root@node1 yum.repos.d]# vim centos6.6.repo 

[base]

name=CentOS $releasever $basearch on local server 172.16.0.1

baseurl=http://172.16.0.1/cobbler/ks_mirror/CentOS-6.6-$basearch/

gpgcheck=0

[extra]

name=CentOS $releasever $basearch extras 

baseurl=http://172.16.0.1/centos/$releasever/extras/$basearch/

[epel]

name=Fedora EPEL for CentOS$releasever $basearch on local server 172.16.0.1

baseurl=http://172.16.0.1/fedora-epel/$releasever/$basearch/

[corosync2]

name=corosync2

baseurl=ftp://172.16.0.1/pub/Sources/6.x86_64/corosync/

Copy it to node 2:

[root@node1 yum.repos.d]# scp centos6.6.repo node2:/etc/yum.repos.d/

centos6.6.repo                                100%  522     0.5KB/s   00:00

2. Install crmsh on both nodes

[root@node1 ~]# yum install -y crmsh

[root@node2 ~]# yum install -y crmsh

3. Clear the STONITH error reported above:

[root@node1 ~]# crm

crm(live)# configure

crm(live)configure# property stonith-enabled=false

crm(live)configure# verify

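#With only two nodes, losing one node also loses quorum; either add an arbitration mechanism or ignore the loss of quorum (which risks split-brain)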

crm(live)configure# property no-quorum-policy=ignore

crm(live)configure# commit

crm(live)configure# show

node node1.stu31.com

node node2.stu31.com

property cib-bootstrap-options: \

        dc-version=1.1.11-97629de \

        cluster-infrastructure="classic openais (with plugin)" \

        expected-quorum-votes=2 \

        stonith-enabled=false \

        no-quorum-policy=ignore

No error messages are reported any more:

[root@node1 ~]#
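
The same properties can also be set without entering the interactive shell, because crm accepts a command as arguments. The following is a sketch only: the `CRM` variable and the function name are illustrative and not part of crmsh; setting `CRM=echo` gives a dry run that just prints the commands.

```shell
#!/bin/sh
# CRM is the command used to talk to the cluster; set CRM=echo for a dry run.
# (The variable and the function name are illustrative, not part of crmsh.)
CRM=${CRM:-crm}

set_two_node_properties() {
    # No fencing device exists in this lab setup, so STONITH is disabled.
    $CRM configure property stonith-enabled=false || return 1
    # Keep resources running when quorum is lost in a two-node cluster.
    $CRM configure property no-quorum-policy=ignore || return 1
    # Re-check the resulting configuration.
    $CRM configure verify
}

# Dry run:  CRM=echo; set_two_node_properties
# For real: set_two_node_properties   (run on one node only)
```

Disabling STONITH and ignoring quorum loss are reasonable for a test cluster like this one; a cluster with shared data would normally keep fencing enabled, as the crm_verify message above points out.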

IV. Building a highly available web cluster with corosync + pacemaker + crmsh:

1. Sanity-testing the httpd service

Create the test pages:

[root@node1 ~]# echo "node1.stu31.com" > /var/www/html/index.html

[root@node2 ~]# echo "node2.stu31.com" > /var/www/html/index.html

Start the httpd service and run the test:

On node1:

[root@node1 ~]# service httpd start

Starting httpd:                                            [  OK  ]

[root@node1 ~]# curl http://172.16.31.10

node1.stu31.com

On node2:

[root@node2 ~]# service httpd start

[root@node2 ~]# curl http://172.16.31.11

node2.stu31.com

Stop the httpd service and disable it at boot, since the cluster will manage it from now on:

On node1:

[root@node1 ~]# service httpd stop

Stopping httpd:                                            [  OK  ]

[root@node1 ~]# chkconfig httpd off

On node2:

[root@node2 ~]# service httpd stop

[root@node2 ~]# chkconfig httpd off

2. Define the cluster VIP address

crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip='172.16.31.166' nic='eth0' cidr_netmask='16' broadcast='172.16.31.255'

The address can now be seen on node1:

[root@node1 ~]# ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

    inet6 ::1/128 scope host 

       valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000

    link/ether 08:00:27:16:bc:4a brd ff:ff:ff:ff:ff:ff

    inet 172.16.31.10/16 brd 172.16.255.255 scope global eth0

    inet 172.16.31.166/16 brd 172.16.31.255 scope global secondary eth0

    inet6 fe80::a00:27ff:fe16:bc4a/64 scope link 
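
The VIP above carries `brd 172.16.31.255` because that value was passed explicitly in the `broadcast=` parameter, while the primary address on the same /16 network uses 172.16.255.255. The small sketch below shows how the broadcast address follows from the prefix length; the function name `broadcast_of` is made up for this example.

```shell
#!/bin/sh
# broadcast_of IP PREFIX: print the IPv4 broadcast address implied by the
# prefix length. The body runs in a subshell so that changing IFS and the
# positional parameters does not affect the caller.
broadcast_of() (
    prefix=$2
    IFS=.
    set -- $1                                   # split the dotted quad
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    hostmask=$(( (1 << (32 - $prefix)) - 1 ))   # all host bits set
    bc=$(( $addr | $hostmask ))
    echo "$(( ($bc >> 24) & 255 )).$(( ($bc >> 16) & 255 )).$(( ($bc >> 8) & 255 )).$(( $bc & 255 ))"
)

broadcast_of 172.16.31.166 16   # prints 172.16.255.255
broadcast_of 172.16.31.166 24   # prints 172.16.31.255
```

In other words, 172.16.31.255 would be the broadcast address of a /24, not of the /16 used throughout this setup.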

Switch node1 to standby:

crm(live)configure# cd

crm(live)# node

#Put node 1 into standby

crm(live)node# standby

#Bring the standby node back online

crm(live)node# online

crm(live)node# cd

#Show the status of the nodes

crm(live)# status

Last updated: Fri Jan  2 11:11:47 2015

Last change: Fri Jan  2 11:11:38 2015

1 Resources configured

#Both nodes are online again, but the resource is now running on node2

 webip  (ocf::heartbeat:IPaddr):        Started node2.stu31.com

To add resource monitoring, the webip resource defined earlier has to be redefined:

crm(live)# resource

#Show the status of the webip resource

crm(live)resource# status webip

resource webip is running on: node2.stu31.com 

#Stop the webip resource

crm(live)resource# stop webip

crm(live)resource# cd

#Delete the webip resource

crm(live)configure# delete webip

#Redefine the webip resource, this time with a monitor operation

crm(live)configure# primitive webip IPaddr params ip=172.16.31.166 op monitor interval=10s timeout=20s

#Verify the configuration

#Commit the resource

3. Define the httpd service resource and its constraints:

#Define the httpd service resource

crm(live)configure# primitive webserver lsb:httpd op monitor interval=30s timeout=15s

#Colocation constraint: the httpd resource runs on the same node as the VIP

crm(live)configure# colocation webserver_with_webip inf: webserver webip

#Order constraint: start webip first, then webserver

crm(live)configure# order webip_before_webserver mandatory: webip webserver

#Location constraint: webip prefers node1, with a score of 100

crm(live)configure# location webip_prefer_node1 webip rule 100: uname eq node1.stu31.com 

#Commit once everything is defined

#Show the status of the cluster resources

Last updated: Fri Jan  2 11:27:16 2015

Last change: Fri Jan  2 11:27:07 2015

2 Resources configured

 webip  (ocf::heartbeat:IPaddr):        Started node1.stu31.com 

 webserver      (lsb:httpd):    Started node1.stu31.com

The resources are running, and they are running on node1. Now test whether the service works.

Check the VIP on node1:

    inet 172.16.31.166/16 brd 172.16.255.255 scope global secondary eth0

Check that the web server is listening:

[root@node1 ~]# ss -tunl |grep 80

tcp    LISTEN     0      128                   :::80                   :::*

Test access from another host:

[root@nfs ~]# curl http://172.16.31.166
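
When a failover is being tested, a single curl call can land in the short window while the resources are moving. The sketch below polls the VIP until the expected page shows up. The names `probe` and `wait_for_page`, the one-second interval, and the retry count are all assumptions made for this example.

```shell
#!/bin/sh
# probe URL: fetch a page quietly; fails on HTTP errors or after 2 seconds.
probe() { curl -fsS --max-time 2 "$1"; }

# wait_for_page URL EXPECTED TRIES: poll once per second until the body
# equals EXPECTED, giving up after TRIES attempts.
wait_for_page() {
    i=0
    while [ "$i" -lt "$3" ]; do
        body=$(probe "$1" 2>/dev/null) || body=
        [ "$body" = "$2" ] && return 0
        i=$(( i + 1 ))
        sleep 1
    done
    return 1
}

# Example (from the nfs host, after putting node1 into standby):
# wait_for_page http://172.16.31.166 node2.stu31.com 30 || echo "VIP did not move"
```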

Now switch node1 to standby:

crm(live)# node standby

Last updated: Fri Jan  2 11:30:13 2015

Last change: Fri Jan  2 11:30:11 2015

Node node1.stu31.com: standby

Online: [ node2.stu31.com ]

 webip  (ocf::heartbeat:IPaddr):        Started node2.stu31.com 

 webserver      (lsb:httpd):    Started node2.stu31.com

crm(live)#

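Access test:

The test succeeds.

4. Next, test resource stickiness, that is, a resource's preference for staying on the node it is currently running on: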

crm(live)configure# property default-resource-stickiness=100

crm(live)# node online

Last updated: Fri Jan  2 11:33:07 2015

Last change: Fri Jan  2 11:33:05 2015

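#The location constraint defined earlier makes the resources prefer node1, so one might expect them to move back to node1 as soon as it comes online again. Because default-resource-stickiness is set, they stay on node2 instead. With the scores used here, stickiness outweighs the location preference; the outcome depends on the relative scores, not on stickiness being an inherently stronger kind of constraint.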
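
The outcome can be reasoned about with simple arithmetic. The sketch below uses a deliberately simplified model, which is an assumption and not Pacemaker's full placement algorithm: the current node is credited with the stickiness once per colocated resource, and the preferred node with the location score. The scores Pacemaker actually computes can be inspected on a live cluster with `crm_simulate -sL`.

```shell
#!/bin/sh
# Simplified model (an assumption, not Pacemaker's full algorithm):
# stays_put STICKINESS RESOURCE_COUNT LOCATION_SCORE
# succeeds when the accumulated stickiness beats the location score.
stays_put() {
    stay=$(( $1 * $2 ))     # stickiness counted once per colocated resource
    move=$3                 # location score of the preferred node
    [ "$stay" -gt "$move" ]
}

# Values from this article: stickiness 100, two resources (webip and
# webserver), location score 100 for node1.
if stays_put 100 2 100; then
    echo "resources stay on the current node"
else
    echo "resources move back to the preferred node"
fi
```

Under this model, raising the location score above the accumulated stickiness would let node1 reclaim the resources when it returns.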

V. Defining a file system resource:

1. A shared file system must already exist

Configure the NFS server

[root@nfs ~]# mkdir /www/htdocs -pv

[root@nfs ~]# vim /etc/exports 

/www/htdocs   172.16.31.0/16(rw,no_root_squash)

[root@nfs ~]# service nfs start

[root@nfs ~]# showmount -e 172.16.31.12                               

Export list for 172.16.31.12:

/www/htdocs 172.16.31.0/16
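
The export list can also be checked from a script, for example on a cluster node before the Filesystem resource is defined. A minimal sketch that parses `showmount -e` output from standard input; `export_listed` is an illustrative helper name.

```shell
#!/bin/sh
# export_listed PATH: read `showmount -e` output on stdin and succeed if
# PATH is one of the exported directories (first column, header skipped).
export_listed() {
    awk -v path="$1" 'NR > 1 && $1 == path { found = 1 } END { exit !found }'
}

# Example:
# showmount -e 172.16.31.12 | export_listed /www/htdocs || echo "not exported"
```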

Create a test page:

[root@nfs ~]# echo "page from nfs filesystem" > /www/htdocs/index.html

2. Mount the NFS file system on a client:

[root@node1 ~]# mount -t nfs 172.16.31.12:/www/htdocs /var/www/html/

[root@node1 ~]# ls /var/www/html/

index.html

page from nfs filesystem

After a successful test, unmount the file system:

[root@node1 ~]# umount /var/www/html/

3. Define the Filesystem resource:

#Define the file system storage resource

crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.16.31.12:/www/htdocs" directory="/var/www/html" fstype="nfs" op monitor interval=20s timeout=40s

#Verification warns that the start and stop timeouts have not been set

WARNING: webstore: default timeout 20s for start is smaller than the advised 60

WARNING: webstore: default timeout 20s for stop is smaller than the advised 60

#Delete the resource and define it again

crm(live)configure# delete webstore

#Add timeouts for start and stop

crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.16.31.12:/www/htdocs" directory="/var/www/html" fstype="nfs" op monitor interval=20s timeout=40s op start timeout=60s op stop timeout=60s

#Put all the resources the web service needs into one group so they are easier to manage

crm(live)configure# group webservice webip webstore webserver

INFO: resource references in location:webip_prefer_node1 updated

INFO: resource references in colocation:webserver_with_webip updated

INFO: resource references in order:webip_before_webserver updated

#Commit when done, then check the resource status

Last updated: Fri Jan  2 11:52:51 2015

Last change: Fri Jan  2 11:52:44 2015

3 Resources configured

Node node2.stu31.com: standby

Online: [ node1.stu31.com ]

 Resource Group: webservice

     webip      (ocf::heartbeat:IPaddr):        Started node1.stu31.com 

     webstore   (ocf::heartbeat:Filesystem):    Started node1.stu31.com 

     webserver  (lsb:httpd):    Started node1.stu31.com

#Finally, define the start order: the storage first, then the httpd service:

crm(live)configure# order webstore_before_webserver mandatory: webstore webserver

Last updated: Fri Jan  2 11:55:00 2015

Last change: Fri Jan  2 11:54:10 2015

crm(live)# quit

bye

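The access test succeeds.

With that, a highly available web cluster built with corosync + pacemaker + crmsh is complete.

This article is reposted from the dengaosky blog on 51CTO; original link: http://blog.51cto.com/dengaosky/1964586. To republish it, please contact the original author.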
