Network plan:
node1: eth0: 172.16.31.10/16
node2: eth0: 172.16.31.11/16
nfs: eth0: 172.16.31.12/16
Notes:
Besides providing NFS, the nfs host also acts as an NTP server that node1 and node2 synchronize their clocks against.
Heartbeat traffic between node1 and node2 travels over eth0.
The web server's VIP is 172.16.31.166/16.
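The `bindnetaddr` value used later in corosync.conf is the *network* address of the heartbeat interface, not a host address. A minimal sketch of deriving it from a host address such as 172.16.31.10/16 (the `netaddr` helper name is my own, for illustration):

```shell
# Compute the network address for a host address in CIDR notation.
# Sanity-checks corosync's bindnetaddr: 172.16.31.10/16 -> 172.16.0.0
netaddr() {
  ip=${1%/*}; prefix=${1#*/}
  oldIFS=$IFS; IFS=.; set -- $ip; IFS=$oldIFS
  n=$(( ($1<<24 | $2<<16 | $3<<8 | $4) & (0xFFFFFFFF << (32 - prefix)) ))
  printf '%d.%d.%d.%d\n' $((n>>24 & 255)) $((n>>16 & 255)) $((n>>8 & 255)) $((n & 255))
}

netaddr 172.16.31.10/16   # prints 172.16.0.0
```

All three hosts in the plan share the /16, so every node gets the same bindnetaddr, which is exactly why corosync binds to a network address rather than a host address.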
Architecture diagram: the layout is the same as in the previous article; only the HA software installed on the nodes differs:
I. Prerequisites for building the HA cluster
1. Mutual hostname resolution, so the nodes can reach each other by name
[root@node1 ~]# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.31.10 node1.stu31.com node1
172.16.31.11 node2.stu31.com node2
Copy it to node2:
[root@node1 ~]# scp /etc/hosts [email protected]:/etc/hosts
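Before copying /etc/hosts around, it is worth checking that every cluster entry carries an IP, an FQDN, and a short alias, since the cluster identifies nodes by the full name. A small sketch (the `check_hosts` helper is hypothetical), fed with sample lines from this file:

```shell
# Exit 0 only when every non-comment, non-empty line has at least
# 3 fields (IP, FQDN, short alias).
check_hosts() {
  awk '!/^#/ && NF > 0 && NF < 3 { bad = 1 } END { exit bad }'
}

printf '%s\n' \
  '172.16.31.10 node1.stu31.com node1' \
  '172.16.31.11 node2.stu31.com node2' | check_hosts && echo "hosts entries OK"
```

On a real node you would run `check_hosts < /etc/hosts` before the scp.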
2. Passwordless SSH between the nodes
Node 1:
[root@node1 ~]# ssh-keygen -t rsa -P ""
[root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node2
Node 2:
[root@node2 ~]# ssh-keygen -t rsa -P ""
[root@node2 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node1
Test:
[root@node2 ~]# date ; ssh node1 'date'
Fri Jan 2 05:46:54 CST 2015
Time synchronization works. Note that the clocks must match!
For building the NTP server, see: http://sohudrgon.blog.51cto.com/3088108/1598314
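Cluster membership and log correlation both suffer when clocks drift, so the `date ; ssh node1 'date'` check above is worth scripting. A sketch comparing two epoch timestamps against a tolerance (the 1-second default is my assumption, not a corosync requirement):

```shell
# Succeeds when two epoch timestamps differ by at most $3 seconds (default 1).
drift_ok() {
  t1=$1; t2=$2; max=${3:-1}
  d=$(( t1 > t2 ? t1 - t2 : t2 - t1 ))
  [ "$d" -le "$max" ]
}

# On a real node: drift_ok "$(date +%s)" "$(ssh node1 'date +%s')" || echo "clock drift!"
drift_ok "$(date +%s)" "$(date +%s)" && echo "clocks in sync"
```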
II. Installing and configuring the cluster software
1. Install the corosync and pacemaker packages on both node1 and node2:
# yum install corosync pacemaker -y
2. Create the configuration file and edit it
[root@node1 ~]# cd /etc/corosync/
[root@node1 corosync]# cp corosync.conf.example corosync.conf
[root@node1 corosync]# cat corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
version: 2
# secauth: Enable mutual node authentication. If you choose to
# enable this ("on"), then do remember to create a shared
# secret with "corosync-keygen".
# enable authentication
secauth: on
threads: 0
# interface: define at least one interface to communicate
# over. If you define more than one interface stanza, you must
# also set rrp_mode.
interface {
# Rings must be consecutively numbered, starting at 0.
ringnumber: 0
# This is normally the *network* address of the
# interface to bind to. This ensures that you can use
# identical instances of this configuration file
# across all your cluster nodes, without having to
# modify this option.
# define the network address
bindnetaddr: 172.16.31.0
# However, if you have multiple physical network
# interfaces configured for the same subnet, then the
# network address alone is not sufficient to identify
# the interface Corosync should bind to. In that case,
# configure the *host* address of the interface
# instead:
# bindnetaddr: 192.168.1.1
# When selecting a multicast address, consider RFC
# 2365 (which, among other things, specifies that
# 239.255.x.x addresses are left to the discretion of
# the network administrator). Do not reuse multicast
# addresses across multiple Corosync clusters sharing
# the same network.
# define the multicast address
mcastaddr: 239.31.131.12
# Corosync uses the port you specify here for UDP
# messaging, and also the immediately preceding
# port. Thus if you set this to 5405, Corosync sends
# messages over UDP ports 5405 and 5404.
# messaging port
mcastport: 5405
# Time-to-live for cluster communication packets. The
# number of hops (routers) that this ring will allow
# itself to pass. Note that multicast routing must be
# specifically enabled on most network routers.
ttl: 1
}
}
logging {
# Log the source file and line where messages are being
# generated. When in doubt, leave off. Potentially useful for
# debugging.
fileline: off
# Log to standard error. When in doubt, set to no. Useful when
# running in the foreground (when invoking "corosync -f")
to_stderr: no
# Log to a log file. When set to "no", the "logfile" option
# must not be set.
# define where the log is stored
to_logfile: yes
logfile: /var/log/cluster/corosync.log
# Log to the system log daemon. When in doubt, set to yes.
#to_syslog: yes
# Log debug messages (very verbose). When in doubt, leave off.
debug: off
# Log messages with time stamps. When in doubt, set to on
# (unless you are only logging to syslog, where double
# timestamps can be annoying).
timestamp: on
logger_subsys {
subsys: AMF
debug: off
}
}
# run pacemaker as a corosync plugin:
service {
ver: 0
name: pacemaker
}
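Two details in the totem block are easy to get wrong: the multicast address should sit in the administratively scoped 239.0.0.0/8 range (RFC 2365), and mcastport actually claims a pair of UDP ports, the one configured and the one just below it. A quick sketch of checking both (the `mcast_check` helper is my own):

```shell
# Validate a corosync multicast address/port choice: the address should be in
# 239.0.0.0/8, and port N implies UDP ports N and N-1 are both used.
mcast_check() {
  addr=$1; port=$2
  [ "${addr%%.*}" -eq 239 ] || { echo "warning: $addr is outside 239.0.0.0/8"; return 1; }
  echo "$addr OK; corosync will use UDP ports $port and $((port - 1))"
}

mcast_check 239.31.131.12 5405
```

This also explains the later `ss -tunl` output, where both the node address and the multicast address show up bound on port 5405.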
3. Generate the authentication key file: the key needs 1024 bits of entropy, and one way to fill the kernel's entropy pool is to download some packages (the resulting disk activity generates entropy):
[root@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 152).
Press keys on your keyboard to generate entropy (bits = 216).
Press keys on your keyboard to generate entropy (bits = 280).
Press keys on your keyboard to generate entropy (bits = 344).
Press keys on your keyboard to generate entropy (bits = 408).
Press keys on your keyboard to generate entropy (bits = 472).
Press keys on your keyboard to generate entropy (bits = 536).
Press keys on your keyboard to generate entropy (bits = 600).
Press keys on your keyboard to generate entropy (bits = 664).
Press keys on your keyboard to generate entropy (bits = 728).
Press keys on your keyboard to generate entropy (bits = 792).
Press keys on your keyboard to generate entropy (bits = 856).
Press keys on your keyboard to generate entropy (bits = 920).
Press keys on your keyboard to generate entropy (bits = 984).
Writing corosync key to /etc/corosync/authkey.
When done, copy the configuration file and the authentication key to node2:
[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/
authkey 100% 128 0.1KB/s 00:00
corosync.conf 100% 2703 2.6KB/s 00:00
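The scp output above already shows the key is 128 bytes (the 1024 bits gathered earlier). A quick scripted size check, demonstrated here on a stand-in file since this sketch has no real cluster; on a node you would point it at /etc/corosync/authkey:

```shell
# The corosync authkey should be exactly 128 bytes (1024 bits).
keyfile=$(mktemp)
head -c 128 /dev/urandom > "$keyfile"   # stand-in for /etc/corosync/authkey

size=$(wc -c < "$keyfile")
[ "$size" -eq 128 ] && echo "authkey size OK: $size bytes"

rm -f "$keyfile"
```

Comparing checksums (`md5sum /etc/corosync/authkey`) on both nodes is an equally quick way to confirm the copy arrived intact.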
4. Start the corosync service:
[root@node1 corosync]# cd
[root@node1 ~]# service corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
[root@node2 ~]# service corosync start
5. Check the logs:
Check whether the corosync engine started up normally:
Startup log on node1:
[root@node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Jan 02 08:28:13 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Jan 02 08:28:13 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jan 02 08:32:48 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055.
Jan 02 08:38:42 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Jan 02 08:38:42 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Startup log on node2:
[root@node2 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Jan 02 08:38:56 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Jan 02 08:38:56 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Search for the keyword TOTEM to confirm that the initial membership notifications were sent:
[root@node1 ~]# grep "TOTEM" /var/log/cluster/corosync.log
Jan 02 08:28:13 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Jan 02 08:28:13 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jan 02 08:28:14 corosync [TOTEM ] The network interface [172.16.31.11] is now up.
Jan 02 08:28:14 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 02 08:38:42 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Jan 02 08:38:42 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jan 02 08:38:42 corosync [TOTEM ] The network interface [172.16.31.10] is now up.
Jan 02 08:38:42 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 02 08:38:51 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Use the crm_mon command to check the number of online nodes:
[root@node1 ~]# crm_mon
Last updated: Fri Jan 2 08:42:23 2015
Last change: Fri Jan 2 08:38:52 2015
Stack: classic openais (with plugin)
Current DC: node1.stu31.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1.stu31.com node2.stu31.com ]
Check that listening port 5405 is open:
[root@node1 ~]# ss -tunl |grep 5405
udp UNCONN 0 0 172.16.31.10:5405 *:*
udp UNCONN 0 0 239.31.131.12:5405 *:*
Check the error log:
[root@node1 ~]# grep ERROR /var/log/cluster/corosync.log
# warnings about running pacemaker as a plugin; just ignore them
Jan 02 08:28:14 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Jan 02 08:28:14 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
Jan 02 08:28:37 [29004] node1.stu31.com pengine: notice: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Jan 02 08:32:47 [29004] node1.stu31.com pengine: notice: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Jan 02 08:38:42 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Jan 02 08:38:42 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
Jan 02 08:39:05 [29300] node1.stu31.com pengine: notice: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
[root@node1 ~]# crm_verify -L -V
# no STONITH device is configured; this can be ignored for now
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
III. Installing the cluster configuration tool: crmsh
1. Configure the yum repository (I have a full yum mirror server here):
[root@node1 yum.repos.d]# vim centos6.6.repo
[base]
name=CentOS $releasever $basearch on local server 172.16.0.1
baseurl=http://172.16.0.1/cobbler/ks_mirror/CentOS-6.6-$basearch/
gpgcheck=0
[extra]
name=CentOS $releasever $basearch extras
baseurl=http://172.16.0.1/centos/$releasever/extras/$basearch/
[epel]
name=Fedora EPEL for CentOS$releasever $basearch on local server 172.16.0.1
baseurl=http://172.16.0.1/fedora-epel/$releasever/$basearch/
[corosync2]
name=corosync2
baseurl=ftp://172.16.0.1/pub/Sources/6.x86_64/corosync/
Copy it to node2:
[root@node1 yum.repos.d]# scp centos6.6.repo node2:/etc/yum.repos.d/
centos6.6.repo 100% 522 0.5KB/s 00:00
2. Install crmsh on both nodes
[root@node1 ~]# yum install -y crmsh
[root@node2 ~]# yum install -y crmsh
3. Clear the STONITH device warnings shown above:
[root@node1 ~]# crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# verify
# a two-node cluster loses quorum when one node fails, so ignore quorum loss (at the risk of split-brain)
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
crm(live)configure# show
node node1.stu31.com
node node2.stu31.com
property cib-bootstrap-options: \
dc-version=1.1.11-97629de \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
stonith-enabled=false \
no-quorum-policy=ignore
No more errors are reported:
[root@node1 ~]#
IV. Building a highly available web cluster with corosync+pacemaker+crmsh
1. Sanity-check the httpd service
Create the test pages:
[root@node1 ~]# echo "node1.stu31.com" > /var/www/html/index.html
[root@node2 ~]# echo "node2.stu31.com" > /var/www/html/index.html
Start httpd and run the tests:
On node1:
[root@node1 ~]# service httpd start
Starting httpd: [ OK ]
[root@node1 ~]# curl http://172.16.31.10
node1.stu31.com
On node2:
[root@node2 ~]# service httpd start
[root@node2 ~]# curl http://172.16.31.11
node2.stu31.com
Stop httpd and disable it at boot (the cluster will manage it from now on):
On node1:
[root@node1 ~]# service httpd stop
Stopping httpd: [ OK ]
[root@node1 ~]# chkconfig httpd off
On node2:
[root@node2 ~]# service httpd stop
[root@node2 ~]# chkconfig httpd off
2. Define the cluster VIP
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip='172.16.31.166' nic='eth0' cidr_netmask='16' broadcast='172.16.31.255'
Check the IP addresses on node1:
[root@node1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:16:bc:4a brd ff:ff:ff:ff:ff:ff
inet 172.16.31.10/16 brd 172.16.255.255 scope global eth0
inet 172.16.31.166/16 brd 172.16.31.255 scope global secondary eth0
inet6 fe80::a00:27ff:fe16:bc4a/64 scope link
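Rather than eyeballing the `ip addr` output, the VIP can be extracted with awk, since the address added by the IPaddr agent appears as a `secondary` inet entry. A sketch run against a sample of the output above:

```shell
# Pull secondary (cluster-managed) addresses out of `ip addr`-style output.
# On a node you would pipe in `ip addr show eth0` instead of the sample.
sample='    inet 172.16.31.10/16 brd 172.16.255.255 scope global eth0
    inet 172.16.31.166/16 brd 172.16.31.255 scope global secondary eth0'

vip=$(printf '%s\n' "$sample" | awk '$1 == "inet" && /secondary/ { sub(/\/.*/, "", $2); print $2 }')
echo "VIP: $vip"
```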
Switch node1 to standby:
crm(live)configure# cd
crm(live)# node
# put node1 in standby
crm(live)node# standby
# bring the standby node back online
crm(live)node# online
crm(live)node# cd
# check the status of each node
crm(live)# status
Last updated: Fri Jan 2 11:11:47 2015
Last change: Fri Jan 2 11:11:38 2015
1 Resources configured
# both nodes are online, but the resource is running on node2
webip (ocf::heartbeat:IPaddr): Started node2.stu31.com
We need to define resource monitoring, which means editing the webip resource defined earlier:
crm(live)# resource
# check the status of the webip resource
crm(live)resource# status webip
resource webip is running on: node2.stu31.com
# stop the webip resource
crm(live)resource# stop webip
crm(live)resource# cd
# delete the webip resource
crm(live)configure# delete webip
# redefine webip, this time with a monitor operation
crm(live)configure# primitive webip IPaddr params ip=172.16.31.166 op monitor interval=10s timeout=20s
# verify the configuration
crm(live)configure# verify
# commit the resource
crm(live)configure# commit
3. Define the httpd service resource and its constraints:
# define the httpd service resource
crm(live)configure# primitive webserver lsb:httpd op monitor interval=30s timeout=15s
# colocation constraint: the httpd service resource starts on the node holding the VIP
crm(live)configure# colocation webserver_with_webip inf: webserver webip
# order constraint: start the webip resource first, then webserver
crm(live)configure# order webip_before_webserver mandatory: webip webserver
# location constraint: how strongly a resource prefers a node; here it prefers node1.
crm(live)configure# location webip_prefer_node1 webip rule 100: #uname eq node1.stu31.com
# commit once the setup is complete
crm(live)configure# commit
# check the resources' startup status
Last updated: Fri Jan 2 11:27:16 2015
Last change: Fri Jan 2 11:27:07 2015
2 Resources configured
webip (ocf::heartbeat:IPaddr): Started node1.stu31.com
webserver (lsb:httpd): Started node1.stu31.com
The resources are started, and they are running on node1. Let's verify:
Check the VIP on node1:
inet 172.16.31.166/16 brd 172.16.255.255 scope global secondary eth0
Check that the web server's listening port is up:
[root@node1 ~]# ss -tunl |grep 80
tcp LISTEN 0 128 :::80 :::*
Test access from another host:
[root@nfs ~]# curl http://172.16.31.166
Now switch node1 to standby:
crm(live)# node standby
Last updated: Fri Jan 2 11:30:13 2015
Last change: Fri Jan 2 11:30:11 2015
Node node1.stu31.com: standby
Online: [ node2.stu31.com ]
webip (ocf::heartbeat:IPaddr): Started node2.stu31.com
webserver (lsb:httpd): Started node2.stu31.com
crm(live)#
Access test:
The test succeeded!
4. Next, test resource stickiness to the current node:
crm(live)configure# property default-resource-stickiness=100
crm(live)# node online
Last updated: Fri Jan 2 11:33:07 2015
Last change: Fri Jan 2 11:33:05 2015
# The location constraint above gave the resources a preference for node1, so the expectation was that node1, once back online, would take the resources over from node2. But because we also defined resource stickiness, node1 did not preempt node2 after coming online: with the stickiness at 100 and the location score also 100, staying on node2 wins, so here stickiness is a constraint at least as strong as the location preference.
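The observed behavior can be predicted with simple score arithmetic: roughly, a resource migrates only when the target node's location score beats the current node's score plus the stickiness. A sketch of that comparison (this flattens pacemaker's full scoring model, and the `should_move` helper is mine), using the 100/100 values from this setup:

```shell
# Decide whether a resource should migrate: it moves only when the target
# node's score strictly beats the current node's score plus stickiness.
should_move() {
  target_score=$1; current_score=$2; stickiness=$3
  [ "$target_score" -gt $(( current_score + stickiness )) ]
}

# node1 comes back: location score 100 vs node2's 0 + stickiness 100
should_move 100 0 100 && echo "resource moves" || echo "resource stays"
```

With both values at 100, staying put wins the tie, which matches what we saw: node1 came online and node2 kept the resources.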
V. Defining a filesystem resource
1. Prerequisite: a shared filesystem must exist
Configure the NFS server:
[root@nfs ~]# mkdir /www/htdocs -pv
[root@nfs ~]# vim /etc/exports
/www/htdocs 172.16.31.0/16(rw,no_root_squash)
[root@nfs ~]# service nfs start
[root@nfs ~]# showmount -e 172.16.31.12
Export list for 172.16.31.12:
/www/htdocs 172.16.31.0/16
Create a test page:
[root@nfs ~]# echo "page from nfs filesystem" > /www/htdocs/index.html
2. Mount the NFS filesystem from a client:
[root@node1 ~]# mount -t nfs 172.16.31.12:/www/htdocs /var/www/html/
[root@node1 ~]# ls /var/www/html/
index.html
[root@node1 ~]# cat /var/www/html/index.html
page from nfs filesystem
After the test succeeds, unmount the filesystem:
[root@node1 ~]# umount /var/www/html/
3. Now define the Filesystem resource:
# define the filesystem (storage) resource
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.16.31.12:/www/htdocs" directory="/var/www/html" fstype="nfs" op monitor interval=20s timeout=40s
# verify warns that our start and stop timeouts are not set
WARNING: webstore: default timeout 20s for start is smaller than the advised 60
WARNING: webstore: default timeout 20s for stop is smaller than the advised 60
# delete the resource and redefine it
crm(live)configure# delete webstore
# add start and stop timeouts
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.16.31.12:/www/htdocs" directory="/var/www/html" fstype="nfs" op monitor interval=20s timeout=40s op start timeout=60s op stop timeout=60s
# define a resource group gathering all the resources of the web service into one group, for easier management
crm(live)configure# group webservice webip webstore webserver
INFO: resource references in location:webip_prefer_node1 updated
INFO: resource references in colocation:webserver_with_webip updated
INFO: resource references in order:webip_before_webserver updated
# commit when the definition is complete, then check the resource status
Last updated: Fri Jan 2 11:52:51 2015
Last change: Fri Jan 2 11:52:44 2015
3 Resources configured
Node node2.stu31.com: standby
Online: [ node1.stu31.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node1.stu31.com
webstore (ocf::heartbeat:Filesystem): Started node1.stu31.com
webserver (lsb:httpd): Started node1.stu31.com
# finally, define the start order: the storage first, then the httpd service:
crm(live)configure# order webstore_before_webserver mandatory: webstore webserver
Last updated: Fri Jan 2 11:55:00 2015
Last change: Fri Jan 2 11:54:10 2015
crm(live)# quit
bye
The access test succeeded!
With that, a highly available web cluster built with corosync+pacemaker+crmsh is complete!
This article is reposted from dengaosky's 51CTO blog. Original link: http://blog.51cto.com/dengaosky/1964586. Please contact the original author for reprint permission.