MySQL 5.7 New Feature: Atomic Truncate

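While testing MySQL 5.7 recently, I casually ran TRUNCATE on an empty table and, to my surprise, it triggered a checkpoint. Write throughput reached several hundred MB per second, and the redo log and dirty pages were flushed all the way through. That is clearly unacceptable in a production environment.

The relevant call stack is:

ha_innobase::truncate -> row_truncate_table_for_mysql -> log_make_checkpoint_at

Why would a small TRUNCATE trigger a full checkpoint? With that question in mind, let's look at how MySQL 5.7 changed the TRUNCATE TABLE logic.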

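0. Background

Starting with 5.7, TRUNCATE TABLE is atomic, which means the operation can be rolled back and can be recovered after a crash.

However, atomic truncate may not be supported in the following cases:

Tables with full-text indexes

Tables with foreign key constraints

Partitioned tables

Master-slave setups: seen across the whole topology the operation is not atomic, because the binlog cannot be rolled back.

The main implementation of truncate lives in the new file row/row0trunc.cc. It is written entirely with C++ classes, which is a big change from 5.6 and earlier. In fact, 5.7 is being refactored almost completely into C++, which is quite a challenge for someone like me who is used to C-style code…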

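The main classes are:

include/row0trunc.h

truncate_t: records the truncate log information

|-> index_t // index class; during crash recovery it is read from the log and used to rebuild the index information

TruncateLogParser: scans and parses truncate log records

row/row0trunc.cc

IndexIterator: iterates over index records; does not support MVCC; used by the SysIndexIterator class

SysIndexIterator: SYS_INDEXES table iterator, used to look up the entries for a given table id in the system table SYS_INDEXES

class Callback: callback base class, with the following subclasses

|-> TruncateLogger: creates the truncate log file and its records, ref: TruncateLogger::operator()

|-> DropIndex: drops indexes while the table is being truncated, ref: DropIndex::operator()

|-> CreateIndex: creates indexes while the table is being truncated, ref: CreateIndex::operator()

|-> TableLocator: looks up the matching table_id in the system tables, ref: TableLocator::operator()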
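To make the relationship between the iterator and the callback classes concrete, here is a minimal sketch of the pattern. It is not the InnoDB code: IndexRow, ToyLogger, ToyDropper and ToySysIndexIterator are hypothetical stand-ins, and an in-memory vector plays the role of SYS_INDEXES. Only the idea (an iterator that feeds each matching index row to a Callback subclass through operator()) is taken from the class list above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one row of SYS_INDEXES.
struct IndexRow {
    std::uint64_t table_id;
    std::uint64_t index_id;
    std::string   name;
};

// Callback base class: invoked once per index row of the target table.
class Callback {
public:
    explicit Callback(std::uint64_t table_id) : m_table_id(table_id) {}
    virtual ~Callback() = default;

    // True if the row belongs to the table this callback works on.
    bool match(const IndexRow& row) const { return row.table_id == m_table_id; }

    // Returns false to abort the iteration.
    virtual bool operator()(const IndexRow& row) = 0;

private:
    std::uint64_t m_table_id;
};

// Collects the index ids, as a logger would do before writing its log.
class ToyLogger : public Callback {
public:
    using Callback::Callback;
    bool operator()(const IndexRow& row) override {
        m_logged.push_back(row.index_id);
        return true;
    }
    std::vector<std::uint64_t> m_logged;
};

// Counts the indexes it would drop.
class ToyDropper : public Callback {
public:
    using Callback::Callback;
    bool operator()(const IndexRow&) override {
        ++m_dropped;
        return true;
    }
    std::size_t m_dropped = 0;
};

// Walks the toy system table and hands matching rows to the callback.
class ToySysIndexIterator {
public:
    explicit ToySysIndexIterator(const std::vector<IndexRow>& rows) : m_rows(rows) {}

    bool for_each(Callback& callback) const {
        for (const IndexRow& row : m_rows) {
            if (callback.match(row) && !callback(row)) {
                return false;  // the callback asked to stop
            }
        }
        return true;
    }

private:
    const std::vector<IndexRow>& m_rows;
};
```

The same iterator can be reused with any of the callbacks, which is how one scan loop can serve logging, dropping and recreating indexes.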

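1. The truncate operation

Here we only consider the execution path for ordinary user tables.

Entry point:

ha_innobase::truncate -> row_truncate_table_for_mysql:

step 1: Sanity checks for the truncate: whether the table is corrupted, the .ibd file is missing, or the .ibd has already been discarded

row_truncate_sanity_checks

Then a redo checkpoint is made (!!!!!!!). For now this looks like rather scary behavior: it flushes the redo log and dirty pages all the way through, which is the problem reported in bug#74312.

log_make_checkpoint_at(LSN_MAX, TRUE);

According to the code comment, the reason for the checkpoint is: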

- Log checkpoint is done before starting truncate table to ensure
  that previous REDO log entries are not applied if current truncate
  crashes. Consider following use-case:
  - create table …. insert/load table …. truncate table (crash)
  - on restart table is restored …. truncate table (crash)
  - on restart (assuming default log checkpoint is not done) will have
    2 REDO log entries for same table. (Note 2 REDO log entries
    for different table is not an issue).
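The general property the comment relies on is that recovery only replays redo written after the last checkpoint. Below is a toy model of that property, not InnoDB's recovery code: RedoRecord and records_to_replay are hypothetical names, and the LSN values are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, highly simplified redo record.
struct RedoRecord {
    std::uint64_t lsn;
    std::string   what;
};

// Toy recovery rule: only records written after the last checkpoint are replayed.
std::vector<RedoRecord> records_to_replay(const std::vector<RedoRecord>& log,
                                          std::uint64_t checkpoint_lsn) {
    std::vector<RedoRecord> out;
    for (const RedoRecord& rec : log) {
        if (rec.lsn > checkpoint_lsn) {
            out.push_back(rec);
        }
    }
    return out;
}
```

With a log of two inserts (LSN 10 and 20) followed by a truncate (LSN 30), a checkpoint at LSN 20 taken just before the truncate leaves only the truncate record to replay, while without it all three records would be applied again on restart.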

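step 2: If the table is not a temporary table, start a transaction

trx_start_for_ddl(trx, TRX_DICT_OP_TABLE);

step 3: Lock the data dictionary

row_mysql_lock_data_dictionary(trx)

This takes dict_operation_lock and dict_sys->mutex.

step 4: Wait for all background threads to stop using the table

dict_stats_wait_bg_to_stop_using_table(table, trx);

This is determined through the table->stats_bg_flag flag.

step 5: Check whether foreign key constraints exist

err = row_truncate_foreign_key_checks(table, trx);

and whether any memcached DML references the table (table->memcached_sync_count).

If either is the case, the truncate fails.

Remove all record locks on the table (table locks excepted):

lock_remove_all_on_table(table, false); (Question: once a truncate has reached the InnoDB layer there should be no record locks left, since the MDL lock taken in the upper layer already guarantees that.)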

step 6: Assign a rollback segment to the truncate transaction

err = trx_undo_assign_undo(
    trx, &trx->rsegs.m_redo, TRX_UNDO_UPDATE);

step 7: Allocate a new table id.

Why is a new table id needed? "Purge and rollback: we assign a new table id for the table. Since purge and rollback look for the table based on the table id, they see the table as 'dropped' and discard their operations."

dict_hdr_get_new_id(&new_id, NULL, NULL, table, false);

At the same time, check whether the table has a full-text index. From here on we only consider ordinary user tables.
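The effect of switching the table id can be shown with a toy dictionary cache. This is only a sketch of the idea quoted in step 7: ToyDictCache and purge_should_apply are hypothetical names, and the real purge and rollback code is far more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical dictionary cache keyed by table id.
class ToyDictCache {
public:
    void add(std::uint64_t id, const std::string& name) { m_by_id[id] = name; }

    // Re-key the table under its new id; the old id no longer resolves.
    void change_id(std::uint64_t old_id, std::uint64_t new_id) {
        auto it = m_by_id.find(old_id);
        assert(it != m_by_id.end());
        std::string name = it->second;
        m_by_id.erase(it);
        m_by_id[new_id] = name;
    }

    // Returns nullptr when no table with this id exists.
    const std::string* find(std::uint64_t id) const {
        auto it = m_by_id.find(id);
        return it == m_by_id.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::uint64_t, std::string> m_by_id;
};

// An old undo record is only processed if its table id still resolves;
// otherwise the table looks 'dropped' and the operation is discarded.
bool purge_should_apply(const ToyDictCache& cache, std::uint64_t undo_table_id) {
    return cache.find(undo_table_id) != nullptr;
}
```

After change_id(439, 440), an undo record that still carries table id 439 no longer finds a table, so it is skipped rather than applied to the freshly truncated table.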

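step 8:

a) X-lock all indexes on the table: dict_table_x_lock_indexes(table);

b) For a table that is not temporary, has no full-text index, and is not a system table, call row_truncate_prepare(table, &flags); to do the necessary checks and make sure there are no pending operations on the table, such as insert buffer merges (fil_ibuf_check_pending_ops) or pending I/O.

For tables with a full-text index, err = row_truncate_fts(table, new_id, trx); is called directly. I won't go into that here.

c) Write the truncate log. This is the core of atomic truncate: it is what allows the operation to be completed during crash recovery. The logging is done roughly in the following steps:

logger = UT_NEW_NOKEY(TruncateLogger(table, flags, new_id));
err = logger->init();
err = SysIndexIterator().for_each(*logger);
err = logger->log();

The calls above create a separate log file that stores information about the table being truncated, so that it can be rebuilt after crash recovery.

For example:

sudo cat /u01/my575/data/ib_469_439_trunc.log

The two numbers in the file name come from:

(gdb) p logger->m_table->space
$17 = 469
(gdb) p logger->m_table->id
$18 = 439

They are the tablespace id and the table id, respectively.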
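The naming scheme seen in the example file, ib_469_439_trunc.log, can be written down as a pair of helpers. This is a sketch derived only from that one file name and the gdb output; trunc_log_name and parse_trunc_log_name are hypothetical helpers, not functions from the InnoDB source.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build "ib_<space id>_<table id>_trunc.log", matching the example file name.
std::string trunc_log_name(unsigned long long space_id, unsigned long long table_id) {
    return "ib_" + std::to_string(space_id) + "_" + std::to_string(table_id) + "_trunc.log";
}

// Parse a file name of that shape. Returns false if the name does not match exactly.
bool parse_trunc_log_name(const std::string& name,
                          unsigned long long* space_id,
                          unsigned long long* table_id) {
    int consumed = 0;
    // %n records how many characters were consumed once the whole pattern matched.
    int fields = std::sscanf(name.c_str(), "ib_%llu_%llu_trunc.log%n",
                             space_id, table_id, &consumed);
    return fields == 2 && consumed == static_cast<int>(name.size());
}
```

Given space id 469 and table id 439, trunc_log_name produces the file name shown above, and parse_trunc_log_name recovers the two ids from it while rejecting other files in the data directory such as ib_logfile0.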

step 9: Drop all indexes on the table, along with the pages allocated to them

DropIndex dropIndex(table, no_redo);
err = SysIndexIterator().for_each(dropIndex);

and reinitialize the tablespace header

if (!is_system_tablespace(table->space)
    && !dict_table_is_temporary(table)
    && flags != ULINT_UNDEFINED) {

    fil_reinit_space_header(
        table->space,
        table->indexes.count + FIL_IBD_FILE_INITIAL_SIZE + 1);
}

Inside fil_reinit_space_header, the pages belonging to the tablespace are discarded (buf_LRU_flush_or_remove_pages), and the change buffer records for it are discarded as well (ibuf_delete_for_discarded_space).
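As a quick worked example of the size passed to fil_reinit_space_header: the sketch below assumes FIL_IBD_FILE_INITIAL_SIZE is 4 pages, which is what I believe the 5.7 headers define, but please check it against your source tree. reinit_size_in_pages is a hypothetical helper that only restates the expression from the snippet above.

```cpp
#include <cassert>
#include <cstddef>

// ASSUMPTION: initial size of a single-table tablespace, in pages.
// Believed to be 4 in 5.7; verify against fil0fil.h in your tree.
constexpr std::size_t kIbdInitialSizePages = 4;

// Same expression as in the call above:
// table->indexes.count + FIL_IBD_FILE_INITIAL_SIZE + 1
constexpr std::size_t reinit_size_in_pages(std::size_t n_indexes) {
    return n_indexes + kIbdInitialSizePages + 1;
}

// A table with a primary key and one secondary index.
static_assert(reinit_size_in_pages(2) == 7, "2 indexes -> 7 pages under the assumption");
```

Under that assumption, a table with only a clustered index is reinitialized to 6 pages.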

step 10: Recreate the indexes

CreateIndex createIndex(table, no_redo);
err = SysIndexIterator().for_each(createIndex);

Then release all the index locks

dict_table_x_unlock_indexes(table);

step 11: Update the table id in the system table (SYS_TABLES) to the newly allocated table id.

err = row_truncate_update_system_tables(
    table, new_id, has_internal_doc_id, no_redo, trx);

Call stack:

row_truncate_update_system_tables -> row_truncate_update_table_id

Update the dict cache:

dict_table_change_id_in_cache(table, new_id);

step 12: Cleanup phase: reset auto-inc to 1, commit the transaction, and release all locks

dict_table_autoinc_lock(table);
dict_table_autoinc_initialize(table, 1);
dict_table_autoinc_unlock(table);

if (trx_is_started(trx)) {
    trx_commit_for_mysql(trx);
}

return(row_truncate_complete(table, trx, flags, logger, err));

The function row_truncate_complete does the final cleanup (it has to be called after both commit and rollback):

… release the dict lock: row_mysql_unlock_data_dictionary(trx)

… checkpoint …

… reset stop_new_ops and is_being_truncated so that I/O on the table can resume

dberr_t err2 = truncate_t::truncate(
    table->data_dir_path,
    table->name, flags, false);

… update the table statistics

dict_stats_update(table, DICT_STATS_EMPTY_TABLE);

2. Crash recovery for truncate

If truncate log files exist at crash recovery time, they are scanned and parsed:

innobase_start_or_create_for_mysql

err = TruncateLogParser::scan_and_parse(srv_log_group_home_dir)

|-> truncate->parse (truncate_t::parse())

|-> truncate_t::add(truncate): the parsed and constructed truncate_t is stored in the static variable truncate_t::s_tables

/* after the usual crash recovery steps */

err = truncate_t::fixup_tables(); // use the information parsed earlier to resume and complete the truncate.

I won't go into the details of the truncate recovery flow here.
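The two-phase shape of the recovery (register first, fix up later) can be sketched as follows. This only mirrors the structure described above: ToyTruncate, ToyTruncateRegistry and the finish callback are hypothetical, and the real truncate_t::fixup_tables() does the actual work of dropping and recreating indexes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical summary of one parsed truncate log file.
struct ToyTruncate {
    std::uint64_t space_id;
    std::uint64_t old_table_id;
};

class ToyTruncateRegistry {
public:
    // Phase 1 (scan and parse): remember every truncate that left a log file behind.
    static void add(const ToyTruncate& truncate) { s_tables.push_back(truncate); }

    // Phase 2 (fix up): after normal crash recovery, finish each pending truncate.
    // Stops at the first failure and returns how many tables were fixed.
    static std::size_t fixup_tables(const std::function<bool(const ToyTruncate&)>& finish) {
        std::size_t fixed = 0;
        for (const ToyTruncate& truncate : s_tables) {
            if (!finish(truncate)) {
                break;
            }
            ++fixed;
        }
        s_tables.clear();
        return fixed;
    }

    static std::size_t pending() { return s_tables.size(); }

private:
    // Static list, playing the role of truncate_t::s_tables.
    static std::vector<ToyTruncate> s_tables;
};

std::vector<ToyTruncate> ToyTruncateRegistry::s_tables;
```

Keeping the parsed entries in a static list lets the scan run early in startup while the fix-up waits until the rest of crash recovery has finished.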

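worklog:

(Note that most of what this worklog describes is accurate, but the truncate redo log it mentions was later replaced by a separate log file with its own naming scheme.)

Main rev:

And also:

Related revs:

8723, 8566, 7912, 7755, 7530, 7247, 7245, 6221, 6207, 6198, 6196, 6193, 6171, 6102, 6096, 6094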
