天天看点

主备不一致:Table definition on master and slave does not match

昨天一同事在线上做变更,为了保证主库的稳定性,先在备库把binlog关闭,然后在进行ddl变更,在通过切换ha,把备库切换为主库,在老的主库上做ddl变更

看上去这样做法没有太大的问题,但是当备库变更一做完,ha切换到备库,开始老主库变更的时候,备库就出现复制出现错误:

last_error: table definition on master and slave does not match: column 10 type mismatch – received type 3, dbname.table_name has type 8

skip_counter: 0

exec_master_log_pos: 1046252634

relay_log_space: 2910773181

until_condition: none

until_log_file:

until_log_pos: 0

master_ssl_allowed: no

master_ssl_ca_file:

master_ssl_ca_path:

master_ssl_cert:

master_ssl_cipher:

master_ssl_key:

seconds_behind_master: null

master_ssl_verify_server_cert: no

last_io_errno: 0

last_io_error:

last_sql_errno: 1535

last_sql_error: table definition on master and slave does not match: column 10 type mismatch – received type 3, dbname.table_name has type 8

1 row in set (0.00 sec)

<1>从这个错误上来看,是主备的表结构不一致导致的,但是之前的复制都是好好的,为什么做完变更后就会出现这个问题,应该是在ddl变更后导致的问题;

master:

mysql -uroot dbname -e “show create table table_name”>master.sql

slave:

mysql -uroot dbname -e “show create table table_name”>slave.sql

diff -u master.sql slave.sql没有找到两个表结构有什么大问题;

<2>查看出问题的数据字段:

[email protected] : dbname 17:46:35> desc table_name;

+—————-+———————+——+—–+——————-+—————————–+

| field | type | null | key | default | extra |

| url | varchar(333) | no | uni | null | |

| description | varchar(255) | yes | | null | |

| httpstatus | int(11) | yes | | null | |

| addtime | timestamp | no | | current_timestamp | on update current_timestamp |

| serverbanner | varchar(255) | yes | | null | |

| tasktag | varchar(255) | no | mul | null | |

| tasktag2 | varchar(255) | yes | | null | |

| domainname | varchar(255) | no | mul | null | |

| r_domainname | varchar(255) | yes | mul | null | |

| scantaskid | int(11) | yes | | null | |

| subtaskid | bigint(20) unsigned | no | pri | null | auto_increment |

| urlhash | varchar(32) | yes | uni | | |

| duplicatehash | varchar(32) | yes | | | |

| enable | tinyint(1) | yes | | 0 | |

| webappid | int(11) | yes | | null | |

| crc_domainname | int(10) unsigned | yes | mul | null | |

| wapscore | int(11) | yes | | 0 | |

| ip | varchar(45) | yes | | null | |

[email protected] : information_schema 14:59:40> select * from columns where table_schema=”dbname” and table_name=”table_name” and ordinal_position= 10\g;

*************************** 1. row ***************************

table_catalog: null

table_schema: dbname

table_name: table_name

column_name: scantaskid

ordinal_position: 10

column_default: null

is_nullable: yes

data_type: int

character_maximum_length: null

character_octet_length: null

numeric_precision: 10

numeric_scale: 0

character_set_name: null

collation_name: null

column_type: int(11)

column_key:

extra:

privileges: select,insert,update,references

column_comment:

[email protected] : information_schema 14:59:19> select * from information_schema.columns where table_schema=”dbname” and table_name=”table_name” and ordinal_position= 10\g;

查看数据字段,主备库还是一致的,这个时候好像到了穷途;

<3>查看同事昨天的ddl变更脚本,涉及到加字段,调整字段的长度,看上去很平常,

我们是先在备库做的变更,然后在到主库的变更,期间的binlog是关闭的,这时候,印风同学想到如果在备库变更的时候,主库的业务是没有停止的,

如果主库变更的数据同步到备库,备库的变更做完,主备已经不一致了,这样的话,就会造成复制失败了,看了看脚本中有字段长度调长的,这下就迎刃而解了;

问了一下b2b的plinux,他们只有加字段的时候,才放到备库上去做,其他的还是在主库上直接做的;

<4>刚才看到从 information_schema.columns 中查询有问题的列的时候,直接代入ordinal_position= 10得到的是scantaskid

字段,但出问题的字段是第11为字段(即我们调整长度的字段),所以binlog中是从0开始计算字段的位置的;

<5>5.5中报错显得更加人性了:

column 0 of table ‘test.t3’ cannot be converted from type ‘int’ to type ‘bigint(20)’;

<6>.那如何避免这样的问题喃,由于我们的库采用的是row模式,只要把slave的复制改为statement就可以了,将主库的binlog_format由row改为statement,这样达到备库的binlog就不会出现错误;