今天同事在主库在给一个大表的字段新加了一个索引,因为是大表,所以用了pt-osc工具,在添加完索引没过多久,开发那边反应丢数据了。
这个表以前是写在主库的,后来不知道是什么原因改成了写从库,也就是说主库有部分以前的历史数据,但是新的数据都是写在从库上,同事用pt-osc加索引的时候是在主库加的,于是把主库的历史数据同步过来了,从库上的新数据丢了。
通过以下例子也以看出为什么会发生这种情况:
>select count(*) from goods;
+----------+
| count(*) |
| 1426200 |
CREATE TABLE `goods` (
`rec_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`user_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`goods_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
`add_time` int(11) unsigned NOT NULL DEFAULT '0',
`is_attention` tinyint(1) NOT NULL DEFAULT '0',
`wid` tinyint(3) unsigned NOT NULL DEFAULT '1' COMMENT '仓id,默认为1表示中国仓',
PRIMARY KEY (`rec_id`),
KEY `user_id` (`user_id`),
KEY `goods_id` (`goods_id`),
KEY `is_attention` (`is_attention`)
) ENGINE=InnoDB AUTO_INCREMENT=1721748 DEFAULT CHARSET=utf8
我们先把goods表上的user_id索引删除:
alter table goods drop key user_id;
在主库上删除部分数据:
>set sql_log_bin=off;
>delete from goods where rec_id>721747;
Query OK, 848254 rows affected (20.24 sec)
>set sql_log_bin=on;
>select count(*) from test;
| 577946 |
测试一:
先直接用alter table语句对goods表的user_id字段加索引:
>alter table test add key (user_id);
主库:
show create table test\G
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
KEY `is_attention` (`is_attention`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1721748 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
1 row in set (0.22 sec)
从库:
1 row in set (0.47 sec)
可以看到直接用alter table加索引的时候从库的数据没有变化。
测试二:
用pt-osc工作在主库goods表的user_id列加索引
先删除user_id列的索引,在主库上执行:
alter table test drop index user_id;
pt-osc加索引:
pt-online-schema-change --nocheck-replication-filters --recursion-method=none --charset=utf8 --alter "add index (user_id) " h=localhost,P=3306,u=root,p=123456,D=test,t=test --print --execute
Successfully altered `test`.`test`. 提示索引加成功。
可以看到,从库的数据被主库覆盖。
为什么会出现这个原因呢,我们可以具体看一下加索引的过程:
# pt-online-schema-change --nocheck-replication-filters --recursion-method=none --charset=utf8 --alter "add index (user_id) " h=localhost,P=3306,u=root,p=123456,D=test,t=test --print --execute
No slaves found. See --recursion-method if host master has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering `test`.`test`...
Creating new table...
CREATE TABLE `test`.`_test_new` (
Created new table test._test_new OK.
Altering new table...
ALTER TABLE `test`.`_test_new` add index (user_id)
Altered `test`.`_test_new` OK.
2016-02-01T18:11:48 Creating triggers...
CREATE TRIGGER `pt_osc_test_test_del` AFTER DELETE ON `test`.`test` FOR EACH ROW DELETE IGNORE FROM `test`.`_test_new` WHERE `test`.`_test_new`.`rec_id` <=> OLD.`rec_id`
CREATE TRIGGER `pt_osc_test_test_upd` AFTER UPDATE ON `test`.`test` FOR EACH ROW REPLACE INTO `test`.`_test_new` (`rec_id`, `user_id`, `goods_id`, `add_time`, `is_attention`, `wid`) VALUES (NEW.`rec_id`, NEW.`user_id`, NEW.`goods_id`, NEW.`add_time`, NEW.`is_attention`, NEW.`wid`)
CREATE TRIGGER `pt_osc_test_test_ins` AFTER INSERT ON `test`.`test` FOR EACH ROW REPLACE INTO `test`.`_test_new` (`rec_id`, `user_id`, `goods_id`, `add_time`, `is_attention`, `wid`) VALUES (NEW.`rec_id`, NEW.`user_id`, NEW.`goods_id`, NEW.`add_time`, NEW.`is_attention`, NEW.`wid`)
2016-02-01T18:11:48 Created triggers OK.
2016-02-01T18:11:48 Copying approximately 578405 rows...
INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_new` (`rec_id`, `user_id`, `goods_id`, `add_time`, `is_attention`, `wid`) SELECT `rec_id`, `user_id`, `goods_id`, `add_time`, `is_attention`, `wid` FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`rec_id` >= ?)) AND ((`rec_id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 30910 copy nibble*/
SELECT /*!40001 SQL_NO_CACHE */ `rec_id` FROM `test`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`rec_id` >= ?)) ORDER BY `rec_id` LIMIT ?, 2 /*next chunk boundary*/
2016-02-01T18:12:09 Copied rows OK.
2016-02-01T18:12:09 Swapping tables...
RENAME TABLE `test`.`test` TO `test`.`_test_old`, `test`.`_test_new` TO `test`.`test`
2016-02-01T18:12:09 Swapped original and new tables OK.
2016-02-01T18:12:09 Dropping old table...
DROP TABLE IF EXISTS `test`.`_test_old`
2016-02-01T18:12:09 Dropped old table `test`.`_test_old` OK.
2016-02-01T18:12:09 Dropping triggers...
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_del`;
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_upd`;
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_ins`;
2016-02-01T18:12:09 Dropped triggers OK.
Successfully altered `test`.`test`.
在建索引的过程中,pt-osc会新将原表的数据拷贝到一张临时表里面,创建三个存储过程来同步有变更的数据,先在临时表上面加索引,加完索引后再将临时表rename。
其实如果可以尽量只写主库不写从库,这样可以避免很多坑。
本文转自 emma_cql 51CTO博客,原文链接:http://blog.51cto.com/chenql/1740518