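MySQL 5.7 New Feature: Atomic Truncate

While testing MySQL 5.7 recently, I casually ran TRUNCATE on an empty table and, to my surprise, it triggered a checkpoint. Write throughput reached several hundred MB per second, and the redo log and dirty pages were flushed all the way down. That is clearly unacceptable in production.

The relevant call stack: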

ha_innobase::truncate->row_truncate_table_for_mysql->log_make_checkpoint_at

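Why would a tiny truncate trigger a full checkpoint? With that question in mind, let's look at how MySQL 5.7 changed the TRUNCATE TABLE logic.

0. Background

Starting with 5.7, TRUNCATE TABLE is atomic, which means a truncate can be rolled back and recovered.

However, atomic truncate may not be supported in the following cases:

Tables with full-text indexes

Tables with foreign key constraints

Partitioned tables

From the point of view of a master/slave setup it is not atomic, because the binlog cannot be rolled back.

The truncate implementation lives mainly in the new file row/row0trunc.cc. It is written entirely with C++ classes, which is a big change from 5.6 and earlier. In fact, 5.7 has been almost entirely refactored into C++, which is quite a challenge for someone like me who is used to C style…

It mainly consists of the following classes: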

include/row0trunc.h

truncate_t: records the truncate log information

|-> index_t // index class; read from the log during crash recovery and used to build the index information

TruncateLogParser: scans and parses truncate log records

row/row0trunc.cc

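IndexIterator: iterates over index records; does not support MVCC; used by the SysIndexIterator class

SysIndexIterator: SYS_INDEXES table iterator, used to look up the entries for a given table id in the system table SYS_INDEXES

class Callback: callback base class, with the following subclasses

|->TruncateLogger: creates the truncate log file and writes its records, ref: TruncateLogger::operator()

|->DropIndex: drops indexes while the table is being truncated, ref: DropIndex::operator()

|->CreateIndex: creates indexes while the table is being truncated, ref: CreateIndex::operator()

|->TableLocator: looks up the given table_id in the system tables, ref: TableLocator::operator()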
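
To make the relationship between Callback and SysIndexIterator concrete, here is a minimal, self-contained sketch of the pattern: an iterator walks the rows of a system table and invokes a functor for every row that belongs to the target table. This is not the InnoDB code. SysIndexRow, IndexCollector and the in-memory vector standing in for SYS_INDEXES are assumptions made for illustration; only the names Callback and SysIndexIterator and the for_each / operator() shape come from the list above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one row of SYS_INDEXES.
struct SysIndexRow {
    std::uint64_t table_id;
    std::uint64_t index_id;
    std::string   name;
};

// Callback base class: remembers which table it is interested in.
class Callback {
public:
    explicit Callback(std::uint64_t table_id) : m_table_id(table_id) {}
    virtual ~Callback() = default;

    // True if the row belongs to the table this callback was built for.
    bool match(const SysIndexRow& row) const {
        return row.table_id == m_table_id;
    }

    // Invoked once per matching row; return false to stop the scan.
    virtual bool operator()(const SysIndexRow& row) = 0;

private:
    std::uint64_t m_table_id;
};

// Toy subclass (illustrative only): collects the index ids it is shown.
class IndexCollector : public Callback {
public:
    using Callback::Callback;

    bool operator()(const SysIndexRow& row) override {
        index_ids.push_back(row.index_id);
        return true;
    }

    std::vector<std::uint64_t> index_ids;
};

// Iterator over the toy SYS_INDEXES; no MVCC, just a linear scan.
class SysIndexIterator {
public:
    explicit SysIndexIterator(const std::vector<SysIndexRow>& rows)
        : m_rows(rows) {}

    // Calls cb for every matching row; returns false if cb aborted the scan.
    bool for_each(Callback& cb) const {
        for (const SysIndexRow& row : m_rows) {
            if (cb.match(row) && !cb(row)) {
                return false;
            }
        }
        return true;
    }

private:
    const std::vector<SysIndexRow>& m_rows;
};
```

With rows for table ids 439 and 440, running for_each with an IndexCollector built for 439 leaves only the index ids of table 439 in index_ids. TruncateLogger, DropIndex and CreateIndex plug into the real iterator in the same way, each doing different work per index.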

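1. The truncate operation

Here we only consider the execution path for ordinary user tables.

Entry point:

ha_innobase::truncate -> row_truncate_table_for_mysql:

step 1: Sanity checks for the truncate: whether the table is corrupted, whether the .ibd file is missing, or whether the .ibd file has already been discarded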

row_truncate_sanity_checks

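Then a redo checkpoint is made (!!!!!!!). As it stands this is rather scary behavior: it flushes the redo log and dirty pages all the way down, which is the problem reported in bug#74312.

log_make_checkpoint_at(LSN_MAX, true);

According to the code comment, the reason for the checkpoint is: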

Log checkpoint is done before starting truncate table to ensure that previous redo log entries are not applied if current truncate crashes. Consider following use-case:

- create table …. insert/load table …. truncate table (crash)

- on restart table is restored …. truncate table (crash)

- on restart (assuming default log checkpoint is not done) will have 2 redo log entries for same table. (Note 2 redo log entries for different table is not an issue).
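
The effect of that checkpoint can be pictured with a toy model: recovery only replays redo records that are newer than the last checkpoint, so a checkpoint taken right before the truncate keeps older records for the same table out of the replay. RedoRecord and records_to_replay are hypothetical names, and real InnoDB recovery is far more involved than this filter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily simplified redo record: an LSN and the table it touches.
struct RedoRecord {
    std::uint64_t lsn;
    std::string   table;
};

// Toy recovery rule: only records written after the checkpoint are replayed.
std::vector<RedoRecord> records_to_replay(const std::vector<RedoRecord>& log,
                                          std::uint64_t checkpoint_lsn) {
    std::vector<RedoRecord> out;
    for (const RedoRecord& rec : log) {
        if (rec.lsn > checkpoint_lsn) {
            out.push_back(rec);
        }
    }
    return out;
}
```

In this model, a log holding two records for t1 at LSN 10 and 20 and a checkpoint at LSN 20 leaves nothing from t1 to replay, whereas a checkpoint at LSN 0 would replay both.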

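step 2: If the table is not a temporary table, start a transaction

trx_start_for_ddl(trx, TRX_DICT_OP_TABLE);

step 3: Lock the data dictionary

row_mysql_lock_data_dictionary(trx)

(this takes dict_operation_lock and dict_sys->mutex)

step 4: Wait for all background threads to stop using the table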

dict_stats_wait_bg_to_stop_using_table(table, trx);

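This is determined through the table->stats_bg_flag flag.

step 5: Check for foreign key constraints

err = row_truncate_foreign_key_checks(table, trx);

or whether memcached DML still references the table (table->memcached_sync_count)

If either is the case, the truncate fails.

Remove all record locks on the table (table locks excepted):

lock_remove_all_on_table(table, false); (Question: by the time a truncate reaches the InnoDB layer there should be no record locks left, since the MDL lock in the upper layer already guarantees that.)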

step 6: Assign a rollback segment to the truncate transaction

err = trx_undo_assign_undo(

trx, &trx->rsegs.m_redo, TRX_UNDO_UPDATE);

step 7: Allocate a new table id.

Why is a new table id needed? The source comment explains: purge and rollback: we assign a new table id for the table. Since purge and rollback look for the table based on the table id, they see the table as 'dropped' and discard their operations.

dict_hdr_get_new_id(&new_id, NULL, NULL, table, false);
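
The idea behind the new table id can be shown with a small sketch: anything that still refers to the old id fails its lookup after the truncate and is skipped. ToyDictionary and undo_still_applies are hypothetical names used only for this illustration; they do not mirror InnoDB's dictionary or purge code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical dictionary: maps table id -> table name.
struct ToyDictionary {
    std::unordered_map<std::uint64_t, std::string> tables;
    std::uint64_t next_id = 1;

    // Registers a table under a fresh id and returns that id.
    std::uint64_t create(const std::string& name) {
        const std::uint64_t id = next_id++;
        tables[id] = name;
        return id;
    }

    // Truncate keeps the name but moves the table to a fresh id;
    // the old id no longer resolves to anything.
    std::uint64_t truncate(std::uint64_t old_id) {
        const std::string name = tables.at(old_id);
        tables.erase(old_id);
        return create(name);
    }

    bool exists(std::uint64_t id) const {
        return tables.count(id) != 0;
    }
};

// A purge or rollback step keyed by table id: it only applies while the
// id still resolves; otherwise the table looks dropped and the step is skipped.
bool undo_still_applies(const ToyDictionary& dict, std::uint64_t table_id) {
    return dict.exists(table_id);
}
```

After dict.truncate(old_id), undo_still_applies(dict, old_id) is false while the same check with the returned id is true, which matches the "see the table as dropped" behavior the comment describes.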

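The code also checks whether the table has a full-text index. From here on we only consider ordinary user tables.

step 8.

a) X-lock all indexes on the table: dict_table_x_lock_indexes(table);

b) For a table that is not temporary, has no full-text index, and is not a system table, call row_truncate_prepare(table, &flags); to run the necessary checks and make sure there are no pending operations on the table, such as insert buffer merges (fil_ibuf_check_pending_ops) or pending I/O.

For full-text indexes, err = row_truncate_fts(table, new_id, trx); is called directly. We won't go into that here.

c) Generate the truncate log. This is the core of atomic truncate: after a crash, the truncate can be redone from this log. Logging happens roughly in the following steps:

logger = UT_NEW_NOKEY(TruncateLogger(table, flags, new_id));

err = logger->init();

err = SysIndexIterator().for_each(*logger);

err = logger->log();

The calls above create a separate log file that stores the information about the table being truncated, so that the table can be rebuilt after crash recovery.

For example: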

sudo cat /u01/my575/data/ib_469_439_trunc.log

The two numbers in the file name come from:

(gdb) p logger->m_table->space

$17 = 469

(gdb) p logger->m_table->id

$18 = 439

They are the tablespace id and the table id, respectively.
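
As a small illustration of that naming scheme, the sketch below builds a file name of the form ib_<space>_<table>_trunc.log from the two ids. The helper truncate_log_name is hypothetical, and the pattern is inferred only from the example file and the gdb output above, not copied from the InnoDB source.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: builds the truncate log path for a table,
// following the pattern seen in the example (ib_469_439_trunc.log).
std::string truncate_log_name(const std::string& dir,
                              std::uint64_t space_id,
                              std::uint64_t table_id) {
    return dir + "/ib_" + std::to_string(space_id) + "_"
               + std::to_string(table_id) + "_trunc.log";
}
```

Calling truncate_log_name("/u01/my575/data", 469, 439) reproduces the path shown in the example.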

step 9: Drop all indexes on the table and free the pages allocated to them

DropIndex dropIndex(table, no_redo);

err = SysIndexIterator().for_each(dropIndex);

Then reinitialize the tablespace header

if (!is_system_tablespace(table->space)

&& !dict_table_is_temporary(table)

&& flags != ULINT_UNDEFINED) {

fil_reinit_space_header(

table->space,

table->indexes.count + FIL_IBD_FILE_INITIAL_SIZE + 1);

}

Inside fil_reinit_space_header, the pages belonging to this tablespace are discarded (buf_LRU_flush_or_remove_pages), and the change buffer entries for it are discarded as well (ibuf_delete_for_discarded_space).

step 10: Rebuild the indexes

CreateIndex createIndex(table, no_redo);

err = SysIndexIterator().for_each(createIndex);

Then release all the index locks

dict_table_x_unlock_indexes(table);

step 11: Update the table id in the system table (SYS_TABLES) to the newly allocated table id.

err = row_truncate_update_system_tables(

table, new_id, has_internal_doc_id, no_redo, trx);

Call stack:

row_truncate_update_system_tables->row_truncate_update_table_id

Update the dictionary cache

dict_table_change_id_in_cache(table, new_id);

step 12: Cleanup: reset auto-inc to 1, commit the transaction, and release all locks

dict_table_autoinc_lock(table);

dict_table_autoinc_initialize(table, 1);

dict_table_autoinc_unlock(table);

if (trx_is_started(trx)) {

trx_commit_for_mysql(trx);

}

return(row_truncate_complete(table, trx, flags, logger, err));

row_truncate_complete does the final cleanup (it must be called after both commit and rollback):

…release the dict lock, row_mysql_unlock_data_dictionary(trx)

…checkpoint …

…reset stop_new_ops and is_being_truncated so that the table can resume I/O

dberr_t err2 = truncate_t::truncate(

table->data_dir_path,

table->name, flags, false);

…update the table statistics

dict_stats_update(table, DICT_STATS_EMPTY_TABLE);

2. Crash recovery for truncate

If truncate log files exist during crash recovery, they are scanned and parsed

innobase_start_or_create_for_mysql

err = TruncateLogParser::scan_and_parse(srv_log_group_home_dir)

|->truncate->parse (truncate_t::parse())

|->truncate_t::add(truncate): each parsed truncate_t is stored in the static variable truncate_t::s_tables

/* after the usual series of crash recovery steps */

err = truncate_t::fixup_tables(); // use the information parsed earlier to resume and complete the truncate

The detailed truncate recovery flow is not covered here.
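
The scan step at the start of recovery can be sketched as follows: go through the file names in the log directory, keep those that look like truncate logs, and extract the two ids from each. This is only an illustration under assumptions. PendingTruncate, parse_truncate_log_name and find_pending_truncates are hypothetical names, the ib_<space>_<table>_trunc.log pattern is taken from the example earlier in this post, and the real TruncateLogParser also reads and validates the file contents, which is not modeled here.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record of a truncate that was in flight at crash time.
struct PendingTruncate {
    unsigned long space_id;
    unsigned long table_id;
};

// Returns true and fills *out if file_name is exactly
// ib_<space>_<table>_trunc.log; otherwise returns false.
bool parse_truncate_log_name(const std::string& file_name,
                             PendingTruncate* out) {
    unsigned long space_id = 0;
    unsigned long table_id = 0;
    int consumed = 0;

    // %n records how many characters matched; it stays 0 if the
    // trailing "_trunc.log" literal did not match.
    const int fields = std::sscanf(file_name.c_str(),
                                   "ib_%lu_%lu_trunc.log%n",
                                   &space_id, &table_id, &consumed);
    if (fields != 2 || consumed != static_cast<int>(file_name.size())) {
        return false;
    }
    out->space_id = space_id;
    out->table_id = table_id;
    return true;
}

// Filters a directory listing down to the pending truncates.
std::vector<PendingTruncate> find_pending_truncates(
        const std::vector<std::string>& file_names) {
    std::vector<PendingTruncate> pending;
    for (const std::string& name : file_names) {
        PendingTruncate entry{};
        if (parse_truncate_log_name(name, &entry)) {
            pending.push_back(entry);
        }
    }
    return pending;
}
```

Given a listing that contains ib_469_439_trunc.log next to files such as ib_logfile0 and ibdata1, only the first is picked up, with space id 469 and table id 439. In the flow above, this corresponds to filling truncate_t::s_tables before the fixup pass runs.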

Worklog:

(Note that most of what this worklog describes is accurate, but the truncate redo log it mentions was later replaced by a separate log file with a specific naming scheme.)
