天天看点

mysql空间满了MHA会切换吗_MySQL高可用方案MHA在线切换的步骤及原理

在日常工作中,会碰到如下的场景,如MySQL数据库升级,主服务器硬件升级等,这个时候就需要将写操作切换到另外一台服务器上,那么如何进行在线切换呢?同时,要求切换过程短,对业务的影响比较小。

MHA就提供了这样一种优雅的方式,只会堵塞业务0.5~2s的时间,在这段时间内,业务无法读取和写入。

集群信息

角色                             IP地址                 ServerID      类型

Master                         192.168.244.10   1                 写入

Candicate master          192.168.244.20   2                 读

Slave                           192.168.244.30   3                 读

Monitor host                 192.168.244.40                     监控集群组

MHA具体的搭建步骤和原理,可参考另外一篇文章:

在线切换的步骤

1. 关闭MHA监控

# masterha_stop --conf=/etc/masterha/app1.cnf

2. 在线切换

# /usr/local/bin/masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.244.20 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

其中,

--orig_master_is_new_slave是将原master切换为新主的slave,默认情况下,是不添加的。

--running_updates_limit默认为1s,即如果主从延迟时间(Seconds_Behind_Master),或master show processlist中dml操作大于1s,则不会执行切换。

在线切换的输出

Tue Apr 11 15:28:32 2017 - [info] MHA::MasterRotate version 0.56.

Tue Apr11 15:28:32 2017 - [info] Starting online master switch..

Tue Apr11 15:28:32 2017 - [info]

Tue Apr11 15:28:32 2017 - [info] * Phase 1: Configuration Check Phase..

Tue Apr11 15:28:32 2017 - [info]

Tue Apr11 15:28:32 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Tue Apr11 15:28:32 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..

Tue Apr11 15:28:32 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..

Tue Apr11 15:28:34 2017 - [info] GTID failover mode = 0Tue Apr11 15:28:34 2017 - [info] Current Alive Master: 192.168.244.10(192.168.244.10:3306)

Tue Apr11 15:28:34 2017 - [info] Alive Slaves:

Tue Apr11 15:28:34 2017 - [info]  192.168.244.20(192.168.244.20:3306)  Version=5.6.31-log (oldest major version between slaves) log-bin:enabledTue Apr 11 15:28:34 2017 - [info]    Replicating from 192.168.244.10(192.168.244.10:3306)

Tue Apr11 15:28:34 2017 - [info]    Primary candidate forthe new Master (candidate_master is set)

Tue Apr11 15:28:34 2017 - [info]  192.168.244.30(192.168.244.30:3306)  Version=5.6.31-log (oldest major version between slaves) log-bin:enabledTue Apr 11 15:28:34 2017 - [info]    Replicating from 192.168.244.10(192.168.244.10:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on192.168.244.10(192.168.244.10:3306)? (YES/no): yes

Tue Apr 11 15:28:47 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..

Tue Apr11 15:28:47 2017 - [info]  ok.

Tue Apr11 15:28:47 2017 - [info] Checking MHA is not monitoring or doing failover..

Tue Apr11 15:28:47 2017 - [info] Checking replication health on 192.168.244.20..

Tue Apr11 15:28:47 2017 - [info]  ok.

Tue Apr11 15:28:47 2017 - [info] Checking replication health on 192.168.244.30..

Tue Apr11 15:28:47 2017 - [info]  ok.

Tue Apr11 15:28:47 2017 - [info] 192.168.244.20can be new master.

Tue Apr11 15:28:47 2017 - [info]

From:192.168.244.10(192.168.244.10:3306) (current master)+--192.168.244.20(192.168.244.20:3306)+--192.168.244.30(192.168.244.30:3306)

To:192.168.244.20(192.168.244.20:3306) (new master)+--192.168.244.30(192.168.244.30:3306)+--192.168.244.10(192.168.244.10:3306)

Starting master switch from192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306)? (yes/NO): yes

Tue Apr11 15:29:00 2017 - [info] Checking whether 192.168.244.20(192.168.244.20:3306) is ok forthe new master..

Tue Apr11 15:29:00 2017 - [info]  ok.

Tue Apr11 15:29:00 2017 - [info] 192.168.244.10(192.168.244.10:3306): SHOW SLAVE STATUS returned empty result. To check replication

filtering rules, temporarily executing CHANGE MASTER to a dummy host.Tue Apr11 15:29:00 2017 - [info] 192.168.244.10(192.168.244.10:3306): Resetting slave pointing to the dummy host.

Tue Apr11 15:29:00 2017 - [info] ** Phase 1: Configuration Check Phase completed.

Tue Apr11 15:29:00 2017 - [info]

Tue Apr11 15:29:00 2017 - [info] * Phase 2: Rejecting updates Phase..

Tue Apr11 15:29:00 2017 - [info]

Tue Apr11 15:29:00 2017 - [info] Executing master ip online change script to disable writeon the current master:

Tue Apr11 15:29:00 2017 - [info]  /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.244.10 --orig_ma

ster_ip=192.168.244.10 --orig_master_port=3306 --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root  --orig_master_is_new_slaveTue Apr 11 15:29:00 2017 476501Set read_only on the new master.. ok.

Tue Apr11 15:29:00 2017 911951 Set read_only=1on the orig master.. ok.

Tue Apr11 15:29:00 2017 919517Killing all application threads..

Tue Apr11 15:29:00 2017 919552 done.

Disabling the VIP an old master:192.168.244.10SIOCSIFFLAGS: Cannot assign requested address

Tue Apr11 15:29:00 2017 - [info]  ok.

Tue Apr11 15:29:00 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):

Tue Apr11 15:29:00 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..

Tue Apr11 15:29:00 2017 - [info]  ok.

Tue Apr11 15:29:00 2017 - [info] Orig master binlog:pos is mysql-bin.000016:211.

Tue Apr11 15:29:00 2017 - [info]  Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..

Tue Apr11 15:29:01 2017 - [info]  master_pos_wait(mysql-bin.000016:211) completed on 192.168.244.20(192.168.244.20:3306). Executed 0events.Tue Apr11 15:29:01 2017 - [info]  done.

Tue Apr11 15:29:01 2017 - [info] Getting new master's binlog name and position..

Tue Apr 11 15:29:01 2017 - [info]  mysql-bin.000009:211Tue Apr11 15:29:01 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_

HOST='192.168.244.20', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=211, MASTER_USER='repl', MASTER_PASSWORD='xxx';Tue Apr 11 15:29:01 2017 - [info] Executing master ip online change script to allow writeon the new master:

Tue Apr11 15:29:01 2017 - [info]  /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.244.10 --orig_m

aster_ip=192.168.244.10 --orig_master_port=3306 --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root  --orig_master_is_new_slaveTue Apr 11 15:29:01 2017 109040 Set read_only=0on the new master.

Enabling the VIP192.168.244.188 on the new master: 192.168.244.20Tue Apr11 15:29:01 2017 - [info]  ok.

Tue Apr11 15:29:01 2017 - [info]

Tue Apr11 15:29:01 2017 - [info] * Switching slaves inparallel..

Tue Apr11 15:29:01 2017 - [info]

Tue Apr11 15:29:01 2017 - [info] -- Slave switch on host 192.168.244.30(192.168.244.30:3306) started, pid: 17651Tue Apr11 15:29:01 2017 - [info]

Tue Apr11 15:29:02 2017 - [info] Log messages from 192.168.244.30...

Tue Apr11 15:29:02 2017 - [info]

Tue Apr11 15:29:01 2017 - [info]  Waiting to execute all relay logs on 192.168.244.30(192.168.244.30:3306)..

Tue Apr11 15:29:01 2017 - [info]  master_pos_wait(mysql-bin.000016:211) completed on 192.168.244.30(192.168.244.30:3306). Executed 0events.Tue Apr11 15:29:01 2017 - [info]  done.

Tue Apr11 15:29:01 2017 - [info]  Resetting slave 192.168.244.30(192.168.244.30:3306) and starting replication from the new master 1

92.168.244.20(192.168.244.20:3306)..Tue Apr 11 15:29:01 2017 - [info]  Executed CHANGE MASTER.

Tue Apr11 15:29:01 2017 - [info]  Slave started.

Tue Apr11 15:29:02 2017 - [info] End of log messages from 192.168.244.30...

Tue Apr11 15:29:02 2017 - [info]

Tue Apr11 15:29:02 2017 - [info] -- Slave switch on host 192.168.244.30(192.168.244.30:3306) succeeded.

Tue Apr11 15:29:02 2017 - [info] Unlocking all tables on the orig master:

Tue Apr11 15:29:02 2017 - [info] Executing UNLOCK TABLES..

Tue Apr11 15:29:02 2017 - [info]  ok.

Tue Apr11 15:29:02 2017 - [info] Starting orig master as a new slave..

Tue Apr11 15:29:02 2017 - [info]  Resetting slave 192.168.244.10(192.168.244.10:3306) and starting replication from the new master 1

92.168.244.20(192.168.244.20:3306)..Tue Apr 11 15:29:02 2017 - [info]  Executed CHANGE MASTER.

Tue Apr11 15:29:02 2017 - [info]  Slave started.

Tue Apr11 15:29:02 2017 - [info] All new slave servers switched successfully.

Tue Apr11 15:29:02 2017 - [info]

Tue Apr11 15:29:02 2017 - [info] * Phase 5: New master cleanup phase..

Tue Apr11 15:29:02 2017 - [info]

Tue Apr11 15:29:02 2017 - [info]  192.168.244.20: Resetting slave infosucceeded.

Tue Apr11 15:29:02 2017 - [info] Switching master to 192.168.244.20(192.168.244.20:3306) completed successfully.

MHA在线切换的原理

1. 检查当前的配置信息及主从服务器的信息

包括读取MHA的配置文件/etc/masterha/app1.cnf及检查当前slave的健康状态

2. 阻止对当前master的更新

主要通过如下步骤:

1> 等待1.5s($time_until_kill_threads*100ms),等待当前连接断开。

2> 执行 read_only=1,阻止新的DML操作

3> 等待0.5s,等待当前DML操作完成。

4> kill掉所有连接。

5> FLUSH NO_WRITE_TO_BINLOG TABLES

6> FLUSH TABLES WITH READ LOCK

3. 等待新master执行完所有的relay log

Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..

4. 将新master的read_only设置为off,并添加VIP

5. slave切换到新master上。

1> 等待slave(192.168.244.30)应用完原主从复制产生的relay log,然后执行change master操作切换到新master上。

2> 释放原master上加的锁。

3> 因masterha_master_switch命令行中带有--orig_master_is_new_slave参数,故原master也切���为新master的从。

6. 清理新master的相关信息。

主要是执行了reset slave all操作,清除之前的复制信息。

MHA在线切换需满足的条件

MHA在执行在线切换之前,会判断当前的主从复制信息,只有满足了以下条件,才能执行切换动作:

1. 所有SLAVE的IO线程和SQL线程都在运行。

2. 所有slave的Seconds_Behind_Master小于或等于running_updates_limit的值,该参数如果没有显示指定的话,则默认为1s

3. 在master上,通过show processlist输出,没有一个DML操作的时间大于running_updates_limit的值。

在线切换时,打开general log,各个服务器的操作信息

注:在执行masterha_master_switch命令时,会有两次确认操作

1. It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.244.10(192.168

.244.10:3306)? (YES/no):

2. Starting master switch from 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306)? (yes/NO):

以下输出中间都有两次空白,其中第一次空白之前的输出对应第一次确认之前,第二次之前的输出对应第二次确认之前。

原master 192.168.244.10

170412 16:52:38    23Connect    [email protected] on23 Query    set autocommit=1

23Query    SELECT CONNECTION_ID() AS Value170412 16:52:39    24Connect    [email protected] on24 Query    set autocommit=1

24Query    SELECT CONNECTION_ID() AS Value24 Query    SET wait_timeout=86400

24Query    SELECT @@global.server_id As Value24Query    SELECT VERSION() AS Value24Query    SELECT @@global.gtid_mode As Value24 Query    SHOW GLOBAL VARIABLES LIKE 'log_bin'

24Query    SHOW MASTER STATUS24Query    SELECT @@global.datadir AS Value24Query    SELECT @@global.slave_parallel_workers AS Value24Query    SHOW SLAVE STATUS24Query    SELECT @@global.read_only As Value24Query    SELECT @@global.relay_log_purge As Value170412 16:54:06    24Query    FLUSH NO_WRITE_TO_BINLOG TABLES24 Query    SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '0') AS Value24Query    SHOW PROCESSLIST170412 16:55:51    24Query    SHOW SLAVE STATUS24 Query    CHANGE MASTER TO MASTER_HOST='dummy_host'

170412 16:55:52    24Query    SHOW SLAVE STATUS24 Query    RESET SLAVE

24 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Monitor') As Value24Quit25Connect    [email protected] on25 Query    set autocommit=1

25Query    SELECT CONNECTION_ID() AS Value25 Query    SET sql_log_bin=0

25Query    SHOW PROCESSLIST25Query    SELECT @@global.read_only As Value25 Query    SET GLOBAL read_only=1

25Query    SELECT @@global.read_only As Value25Query    SHOW PROCESSLIST25 Query    SET sql_log_bin=1

25Quit26Connect    [email protected] on26 Query    set autocommit=1

26Query    SELECT CONNECTION_ID() AS Value26 Query    SET wait_timeout=86400

26Query    FLUSH TABLES WITH READ LOCK26Query    SHOW MASTER STATUS170412 16:55:53    26Query    UNLOCK TABLES26 Query    CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD = MASTE

R_PORT= 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120          26 Query    SET GLOBAL relay_log_purge=0

26Query    START SLAVE27 Connect Out    [email protected]:3306

26Query    SHOW SLAVE STATUS26 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value26 Quit

新master 192.168.244.20

170412 16:52:38 23Connect [email protected] on23 Query set autocommit=1

23Query SELECT CONNECTION_ID() AS Value170412 16:52:39 24Connect [email protected] on24 Query set autocommit=1

24Query SELECT CONNECTION_ID() AS Value24 Query SET wait_timeout=86400

24Query SELECT @@global.server_id As Value24Query SELECT VERSION() AS Value24Query SELECT @@global.gtid_mode As Value24 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'

24Query SHOW MASTER STATUS24Query SELECT @@global.datadir AS Value24Query SELECT @@global.slave_parallel_workers AS Value24Query SHOW SLAVE STATUS24Query SELECT @@global.read_only As Value24Query SELECT @@global.relay_log_purge As Value24Query SELECT @@global.relay_log_info_repository AS Value24Query SELECT @@global.datadir AS Value24Query SELECT @@global.relay_log_info_file AS Value24Query SHOW SLAVE STATUS24 Query SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'

170412 16:54:06 24 Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value24Query SHOW SLAVE STATUS24Query SHOW SLAVE STATUS170412 16:55:52 24Query SHOW PROCESSLIST25Connect [email protected] on25 Query set autocommit=1

25Query SELECT CONNECTION_ID() AS Value25Query SELECT @@global.read_only As Value25Query SELECT @@global.read_only As Value25Quit24Query SHOW SLAVE STATUS24 Query SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result24Query STOP SLAVE SQL_THREAD24Query SHOW SLAVE STATUS24Query SHOW MASTER STATUS26Connect [email protected] on26 Query set autocommit=1

26Query SELECT CONNECTION_ID() AS Value26 Query SET sql_log_bin=0

26Query SELECT @@global.read_only As Value26 Query SET GLOBAL read_only=0

26 Query SET sql_log_bin=1

26Quit24Query SELECT @@global.read_only As Value27Connect [email protected] on27Query SELECT UNIX_TIMESTAMP()27 Query SHOW VARIABLES LIKE 'SERVER_ID'

27 Query SET @master_heartbeat_period= 1799999979520

27 Query SET @master_binlog_checksum=@@global.binlog_checksum27Query SELECT @master_binlog_checksum27Query SELECT @@GLOBAL.GTID_MODE27 Query SHOW VARIABLES LIKE 'SERVER_UUID'

27 Query SET @slave_uuid= '8a1093c8-1d00-11e7-954f-000c299a5715'

27 Binlog Dump Log: 'mysql-bin.000010' Pos: 120

170412 16:55:53 28Connect [email protected] on28Query SELECT UNIX_TIMESTAMP()28 Query SHOW VARIABLES LIKE 'SERVER_ID'

28 Query SET @master_heartbeat_period= 1799999979520

28 Query SET @master_binlog_checksum=@@global.binlog_checksum28Query SELECT @master_binlog_checksum28Query SELECT @@GLOBAL.GTID_MODE28 Query SHOW VARIABLES LIKE 'SERVER_UUID'

24Query STOP SLAVE28 Query SET @slave_uuid= '2a6365e0-1d05-11e7-956d-000c29c64704'

28 Binlog Dump Log: 'mysql-bin.000010' Pos: 120

24Query SHOW SLAVE STATUS24 Query RESET SLAVE

24Query SHOW SLAVE STATUS24 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value24 Quit

slave 192.168.244.30

170412 16:52:37 16Connect [email protected] on16 Query set autocommit=1

16Query SELECT CONNECTION_ID() AS Value170412 16:52:38 17Connect [email protected] on17 Query set autocommit=1

17Query SELECT CONNECTION_ID() AS Value17 Query SET wait_timeout=86400

17Query SELECT @@global.server_id As Value17Query SELECT VERSION() AS Value17Query SELECT @@global.gtid_mode As Value17 Query SHOW GLOBAL VARIABLES LIKE 'log_bin'

17Query SHOW MASTER STATUS17Query SELECT @@global.datadir AS Value17Query SELECT @@global.slave_parallel_workers AS Value17Query SHOW SLAVE STATUS17Query SELECT @@global.read_only As Value17Query SELECT @@global.relay_log_purge As Value17Query SELECT @@global.relay_log_info_repository AS Value17Query SELECT @@global.datadir AS Value17Query SELECT @@global.relay_log_info_file AS Value17Query SHOW SLAVE STATUS17 Query SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'

170412 16:54:05 17 Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value17Query SHOW SLAVE STATUS17Query SHOW SLAVE STATUS170412 16:55:50 17Query SHOW SLAVE STATUS170412 16:55:51 17Query SHOW SLAVE STATUS17 Query SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result17Query STOP SLAVE SQL_THREAD17Query SHOW SLAVE STATUS17Query STOP SLAVE17Query STOP SLAVE17Query SHOW SLAVE STATUS17Query RESET SLAVE17 Query CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD = MASTE

R_PORT= 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120 17 Query SET GLOBAL relay_log_purge=0

17Query START SLAVE18 Connect Out [email protected]:3306

17Query SHOW SLAVE STATUS170412 16:55:52 17 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value17 Quit

参考

mysql空间满了MHA会切换吗_MySQL高可用方案MHA在线切换的步骤及原理