Source-Code Implementation of Incremental DML Logging and Replay in MySQL Online DDL

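The previous article analyzed and verified that, when MySQL creates an index online, replaying the incremental DML does not hold locks long enough to seriously affect the workload, even when the DDL runs for a long time or the workload is heavy. This article starts from the MySQL 8.0.19 source code and analyzes how that is achieved. It also checks whether replaying the DML can report a duplicate key error.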

Core processing flow and objects

Incremental DML handling is implemented mainly in row0log.cc:

/** @file row/row0log.cc
 Modification log for online index creation and online table rebuild

 Created 2011-05-26 Marko Makela
 */

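Reading this file shows that the logging and replay of incremental DML for secondary index creation is kept separate from the path used by other types of DDL, because creating a secondary index does not require rebuilding the table.

Flow for creating a secondary index

Buffering incremental DML operations

row_log_online_op handles the incremental DML for secondary index creation: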

/** Logs an operation to a secondary index that is (or was) being created. */
void row_log_online_op(
    dict_index_t *index,   /*!< in/out: index, S or X latched */
    const dtuple_t *tuple, /*!< in: index tuple */
    trx_id_t trx_id)       /*!< in: transaction ID for insert,
                           or 0 for delete */
{

This function calls os_file_write_int_fd directly to write the log to a temporary file:

err = os_file_write_int_fd(request, "(modification log)", log->fd,
                           log->tail.block, byte_offset, srv_sort_buf_size);

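In this scenario, only records of the secondary index being created are written to the temporary file; clustered (primary key) index records are not needed. This greatly reduces the amount of data written to the temporary file. The secondary index record is built as follows:

rec_convert_dtuple_to_temp(b + extra_size, index, tuple->fields,
                           tuple->n_fields, NULL);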

Replaying incremental DML operations

The replay entry point for the incremental DML of secondary index creation is row_log_apply:

/** Apply the row log to the index upon completing index creation.
@param[in] trx transaction (for checking if the operation was
interrupted)
@param[in,out] index secondary index
@param[in,out] table MySQL table (for reporting duplicates)
@param[in,out] stage performance schema accounting object, used by
ALTER TABLE. stage->begin_phase_log_index() will be called initially and then
stage->inc() will be called for each block of log that is applied.
@return DB_SUCCESS, or error code on failure */
dberr_t row_log_apply(const trx_t *trx, dict_index_t *index,
                      struct TABLE *table, ut_stage_alter_t *stage) {
  ...
  rw_lock_x_lock(dict_index_get_lock(index));

  if (!index->table->is_corrupted()) {
    error = row_log_apply_ops(trx, index, &dup, stage);
  } else {
    error = DB_SUCCESS;
  }

  ...
  } else {
    ut_ad(dup.n_dup == 0);
    dict_index_set_online_status(index, ONLINE_INDEX_COMPLETE);
  }

  log = index->online_log;
  index->online_log = NULL;
  rw_lock_x_unlock(dict_index_get_lock(index));

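This code shows that the exclusive (X) lock on the secondary index is taken before row_log_apply_ops is called to actually replay the incremental DML. Once replay completes, the index status is set to ONLINE_INDEX_COMPLETE, and finally the lock is released.

Our tests had already shown that incremental replay does not hold a lock for long, which seems to contradict this code. The section on row_log_apply_ops below resolves this.

Flow for table-rebuild scenarios

This scenario is not the focus of this article, so only a brief description follows.

For DDL that rebuilds the table, DML operations are logged by row_log_table_insert, row_log_table_update, and row_log_table_delete. The comment on each function names the function that replays the corresponding operation: row_log_table_apply_insert, row_log_table_apply_update, and row_log_table_apply_delete: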

/** Logs an insert to a table that is being rebuilt.
This will be merged in row_log_table_apply_insert(). */
void row_log_table_insert(
    const rec_t *rec,       /*!< in: clustered index leaf page record,
                            page X-latched */
    const dtuple_t *ventry, /*!< in: dtuple holding virtual column info */
    dict_index_t *index,    /*!< in/out: clustered index, S-latched
                            or X-latched */
    const ulint *offsets)   /*!< in: rec_get_offsets(rec,index) */
{
  row_log_table_low(rec, ventry, NULL, index, offsets, true, NULL);
}

/** Logs an update to a table that is being rebuilt.
This will be merged in row_log_table_apply_update(). */
void row_log_table_update(
    const rec_t *rec,          /*!< in: clustered index leaf page record,
                               page X-latched */
    dict_index_t *index,       /*!< in/out: clustered index, S-latched
                               or X-latched */
    const ulint *offsets,      /*!< in: rec_get_offsets(rec,index) */
    const dtuple_t *old_pk,    /*!< in: row_log_table_get_pk()
                               before the update */
    const dtuple_t *new_v_row, /*!< in: dtuple contains the new virtual
                               columns */
    const dtuple_t *old_v_row) /*!< in: dtuple contains the old virtual
                               columns */
{
  row_log_table_low(rec, new_v_row, old_v_row, index, offsets, false, old_pk);
}

/** Logs a delete operation to a table that is being rebuilt.
This will be merged in row_log_table_apply_delete(). */
void row_log_table_delete(
    trx_t *trx,             /*!< in: current transaction */
    const rec_t *rec,       /*!< in: clustered index leaf page record,
                            page X-latched */
    const dtuple_t *ventry, /*!< in: dtuple holding virtual column info */
    dict_index_t *index,    /*!< in/out: clustered index, S-latched
                            or X-latched */
    const ulint *offsets,   /*!< in: rec_get_offsets(rec,index) */
    const byte *sys)        /*!< in: DB_TRX_ID,DB_ROLL_PTR that should
                            be logged, or NULL to use those in rec */

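All three functions call row_log_table_close to format the incremental DML log record and write it out:

/** Stops logging an operation to a table that is being rebuilt. */
static void row_log_table_close_func(
    row_log_t *log, /*!< in/out: online rebuild log */
#ifdef UNIV_DEBUG
    const byte *b, /*!< in: end of log record */
#endif /* UNIV_DEBUG */
    ulint size,    /*!< in: size of log record */
    ulint avail)   /*!< in: available size for log record */

At the end, this function calls os_file_write_int_fd:

err = os_file_write_int_fd(request, "(modification log)", log->fd,
                           log->tail.block, byte_offset, srv_sort_buf_size);

In short, both secondary index creation and table rebuild must handle incremental DML, but they do so differently. Secondary index creation is the simpler case, because only the records of the new secondary index need to be handled.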

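Core objects for incremental DML logging and replay

The row_log_t object

Whether a secondary index is being created or a table is being rebuilt, the core object for handling incremental DML is row_log_t. It is defined as follows: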

/** @brief Buffer for logging modifications during online index creation

All modifications to an index that is being created will be logged by
row_log_online_op() to this buffer.

All modifications to a table that is being rebuilt will be logged by
row_log_table_delete(), row_log_table_update(), row_log_table_insert()
to this buffer.

When head.blocks == tail.blocks, the reader will access tail.block
directly. When also head.bytes == tail.bytes, both counts will be
reset to 0 and the file will be truncated. */
struct row_log_t {
  int fd;               /*!< file descriptor */
  ib_mutex_t mutex;     /*!< mutex protecting error,
                        max_trx and tail */
  page_no_map *blobs;   /*!< map of page numbers of off-page columns
                        that have been freed during table-rebuilding
                        ALTER TABLE (row_log_table_*); protected by
                        index->lock X-latch only */
  dict_table_t *table;  /*!< table that is being rebuilt,
                        or NULL when this is a secondary
                        index that is being created online */
  bool same_pk;         /*!< whether the definition of the PRIMARY KEY
                        has remained the same */
  const dtuple_t *add_cols;
                        /*!< default values of added columns, or NULL */
  const ulint *col_map; /*!< mapping of old column numbers to
                        new ones, or NULL if !table */
  dberr_t error;        /*!< error that occurred during online
                        table rebuild */
  trx_id_t max_trx;     /*!< biggest observed trx_id in
                        row_log_online_op();
                        protected by mutex and index->lock S-latch,
                        or by index->lock X-latch only */
  row_log_buf_t tail;   /*!< writer context;
                        protected by mutex and index->lock S-latch,
                        or by index->lock X-latch only */
  row_log_buf_t head;   /*!< reader context; protected by MDL only;
                        modifiable by row_log_apply_ops() */
  ulint n_old_col;      /*!< number of non-virtual column in
                        old table */
  ulint n_old_vcol;     /*!< number of virtual column in old table */
  const char *path;     /*!< where to create temporary file during
                        log operation */
};

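Here we only analyze the secondary index creation scenario, where the relevant fields are fd, tail, head, and path.

fd is the file descriptor of the file that buffers the incremental DML, and path is the directory where that file is created. The directory is the one given by innodb_tmpdir; if that variable is empty, the directory given by tmpdir is used.

tail and head are row_log_buf_t objects, used for buffering and for replaying the incremental DML respectively. They are described in a separate section below.

Temporary file for buffering incremental DML

The temporary file is created and opened by row_log_tmpfile: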

/** Create the file or online log if it does not exist.
@param[in,out] log online rebuild log
@return true if success, false if not */
static MY_ATTRIBUTE((warn_unused_result)) int row_log_tmpfile(row_log_t *log) {
  DBUG_TRACE;
  if (log->fd < 0) {
    log->fd = row_merge_file_create_low(log->path);
    DBUG_EXECUTE_IF("row_log_tmpfile_fail",
                    if (log->fd > 0) row_merge_file_destroy_low(log->fd);
                    log->fd = -1;);
    if (log->fd >= 0) {
      MONITOR_ATOMIC_INC(MONITOR_ALTER_TABLE_LOG_FILES);
    }
  }

  return log->fd;
}

/** Create temporary merge files in the given paramater path, and if
UNIV_PFS_IO defined, register the file descriptor with Performance Schema.
@param[in] path location for creating temporary merge files.
@return File descriptor */
int row_merge_file_create_low(const char *path) {
  int fd;
  if (path == NULL) {
    path = innobase_mysql_tmpdir();
  }
#ifdef UNIV_PFS_IO
  /* This temp file open does not go through normal
  file APIs, add instrumentation to register with
  performance schema */
  Datafile df;

  df.make_filepath(path, "Innodb Merge Temp File", NO_EXT);

  struct PSI_file_locker *locker = NULL;
  PSI_file_locker_state state;

  locker = PSI_FILE_CALL(get_thread_file_name_locker)(
      &state, innodb_temp_file_key.m_value, PSI_FILE_OPEN, df.filepath(),
      &locker);

  if (locker != NULL) {
    PSI_FILE_CALL(start_file_open_wait)(locker, __FILE__, __LINE__);
  }
#endif /* UNIV_PFS_IO */
  fd = innobase_mysql_tmpfile(path);
#ifdef UNIV_PFS_IO
  PSI_FILE_CALL(end_file_open_wait_and_bind_to_descriptor)(locker, fd);
#endif /* UNIV_PFS_IO */

  if (fd < 0) {
    ib::error(ER_IB_MSG_967) << "Cannot create temporary merge file";
    return (-1);
  }
  return (fd);
}

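This shows that the file is created under the name "Innodb Merge Temp File". Its location and related statistics can be viewed through Performance Schema tables such as performance_schema.file_instances, as shown below: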

node1-performance_schema>select * from file_instances where FILE_NAME like "%%Innodb Merge Temp File%%"\G
*************************** 1. row ***************************
 FILE_NAME: /tmp/Innodb Merge Temp File
EVENT_NAME: wait/io/file/innodb/innodb_temp_file
OPEN_COUNT: 2
1 row in set (0.00 sec)

node1-performance_schema>select * from file_summary_by_instance where FILE_NAME like "%%Innodb Merge Temp File%%"\G
*************************** 1. row ***************************
                FILE_NAME: /tmp/Innodb Merge Temp File
               EVENT_NAME: wait/io/file/innodb/innodb_temp_file
    OBJECT_INSTANCE_BEGIN: 140548089243840
               COUNT_STAR: 18484
           SUM_TIMER_WAIT: 7393902183975
           MIN_TIMER_WAIT: 76528245
           AVG_TIMER_WAIT: 400015995
           MAX_TIMER_WAIT: 27160453440
               COUNT_READ: 9240
           SUM_TIMER_READ: 2499001980465
           MIN_TIMER_READ: 183015375
           AVG_TIMER_READ: 270454725
           MAX_TIMER_READ: 27160453440
 SUM_NUMBER_OF_BYTES_READ: 9688842240
              COUNT_WRITE: 9240
          SUM_TIMER_WRITE: 4894539195270
          MIN_TIMER_WRITE: 385078965
          AVG_TIMER_WRITE: 529711680
          MAX_TIMER_WRITE: 1293598650
SUM_NUMBER_OF_BYTES_WRITE: 9688842240
               COUNT_MISC: 4
           SUM_TIMER_MISC: 361008240
           MIN_TIMER_MISC: 76528245
           AVG_TIMER_MISC: 90252060
           MAX_TIMER_MISC: 106280070

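The maximum size of the temporary file is set by the innodb_online_alter_log_max_size variable.

node1-sbtest>show variables like "%%innodb_online_alter_log_max_size%%";
+----------------------------------+-----------+
| Variable_name                    | Value     |
+----------------------------------+-----------+
| innodb_online_alter_log_max_size | 134217728 |
+----------------------------------+-----------+
1 row in set (0.01 sec)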

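The default is 128 MiB, and the variable can be changed online. If it is lowered while the DDL is running, or is simply not large enough, the DDL fails, as in this example:

node1-performance_schema>show variables like "%%innodb_online_alter_log_max_size%%";
+----------------------------------+-------+
| Variable_name                    | Value |
+----------------------------------+-------+
| innodb_online_alter_log_max_size | 65536 |
+----------------------------------+-------+

node1-sbtest>alter table sbtest1 add index idx_d(wzh);
ERROR 1799 (HY000): Creating index 'idx_d' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again.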

row_log_buf_t

row_log_buf_t is the other important object. It is defined as follows:

/** Log block for modifications during online ALTER TABLE */
struct row_log_buf_t {
  byte *block;            /*!< file block buffer */
  ut_new_pfx_t block_pfx; /*!< opaque descriptor of "block". Set
                          by ut_allocator::allocate_large() and fed to
                          ut_allocator::deallocate_large(). */
  mrec_buf_t buf;         /*!< buffer for accessing a record
                          that spans two blocks */
  ulint blocks;           /*!< current position in blocks */
  ulint bytes;            /*!< current position within block */
  ulonglong total;        /*!< logical position, in bytes from
                          the start of the row_log_table log;
                          0 for row_log_online_op() and
                          row_log_apply(). */
};

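From the definition and the processing flow, it follows that incremental DML log records are buffered (written to the temporary file) and replayed (read from the temporary file) in units of blocks. A block can hold one or more log records, and a single log record may span two blocks.

In a row_log_buf_t object, block is the last, not yet full block currently being worked on, bytes is the number of bytes already used in that block, and blocks is the number of blocks already written to the temporary file. buf handles the case where a log record spans two blocks.

The block size is set by the innodb_sort_buffer_size variable:

node1-performance_schema>show variables like "%%innodb_sort_buffer_size%%";
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| innodb_sort_buffer_size | 1048576 |
+-------------------------+---------+

The default is 1 MiB. The variable is read-only and cannot be changed at runtime.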

Implementation of incremental DML logging

First, where is row_log_online_op called? A search finds roughly two call sites, row_log_online_op_try and row_upd_sec_index_entry_low:

/** Try to log an operation to a secondary index that is
(or was) being created.
@retval true if the operation was logged or can be ignored
@retval false if online index creation is not taking place */
UNIV_INLINE
bool row_log_online_op_try(
    dict_index_t *index,   /*!< in/out: index, S or X latched */
    const dtuple_t *tuple, /*!< in: index tuple */
    trx_id_t trx_id)       /*!< in: transaction ID for insert,
                           or 0 for delete */
{
  ut_ad(rw_lock_own_flagged(dict_index_get_lock(index),
                            RW_LOCK_FLAG_S | RW_LOCK_FLAG_X | RW_LOCK_FLAG_SX));

  switch (dict_index_get_online_status(index)) {
    case ONLINE_INDEX_COMPLETE:
      /* This is a normal index. Do not log anything.
      The caller must perform the operation on the
      index tree directly. */
      return (false);
    case ONLINE_INDEX_CREATION:
      /* The index is being created online. Log the
      operation. */
      row_log_online_op(index, tuple, trx_id);
      break;
    case ONLINE_INDEX_ABORTED:
    case ONLINE_INDEX_ABORTED_DROPPED:
      /* The index was created online, but the operation was
      aborted. Do not log the operation and tell the caller
      to skip the operation. */
      break;
  }

  return (true);
}

/** Updates a secondary index entry of a row.
@param[in] node row update node
@param[in] old_entry the old entry to search, or nullptr then it
has to be created in this function
@param[in] thr query thread
@return DB_SUCCESS if operation successfully completed, else error
code or DB_LOCK_WAIT */
static MY_ATTRIBUTE((warn_unused_result)) dberr_t
row_upd_sec_index_entry_low(upd_node_t *node, dtuple_t *old_entry,
                            que_thr_t *thr) {
  ...
  mtr_s_lock(dict_index_get_lock(index), &mtr);

  switch (dict_index_get_online_status(index)) {
    case ONLINE_INDEX_COMPLETE:
      /* This is a normal index. Do not log anything.
      Perform the update on the index tree directly. */
      break;
    case ONLINE_INDEX_CREATION:
      /* Log a DELETE and optionally INSERT. */
      row_log_online_op(index, entry, 0);

      if (!node->is_delete) {
        mem_heap_empty(heap);
        entry =
            row_build_index_entry(node->upd_row, node->upd_ext, index, heap);
        ut_a(entry);
        row_log_online_op(index, entry, trx->id);
      }
      /* fall through */
  ...

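row_upd_sec_index_entry_low is the update path for a secondary index. Tracing further back shows that row_log_online_op_try is called by the functions that handle inserts into and deletes from a secondary index, which is as expected and is not analyzed further here. What matters is that, whichever path is taken, the caller holds the lock on the secondary index. That is also as expected, but it appears to conflict with the lock taken when the DML log is replayed. We raise the question here and come back to it later.

The code above also shows that, on the update path, a DELETE record is logged first (the third argument of row_log_online_op is 0) and, if the operation is not a delete, an INSERT record is logged next. In other words, a delete stays a delete, while an update is rewritten as a delete followed by an insert. A plain insert goes through row_log_online_op_try with the transaction ID and is logged as a single INSERT record.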
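The update-path logging described above can be sketched as follows. This is a minimal illustration, not InnoDB code: LogOp and log_secondary_index_update are hypothetical names, and a plain string stands in for the secondary index tuple.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one row-log record.
struct LogOp {
  bool is_insert;        // false = DELETE record
  std::uint64_t trx_id;  // 0 for DELETE, the writer's transaction ID for INSERT
  std::string key;       // stand-in for the secondary index tuple
};

// Mirrors the ONLINE_INDEX_CREATION branch of row_upd_sec_index_entry_low:
// always log a DELETE of the old entry, then log an INSERT of the new entry
// unless the operation is itself a delete.
std::vector<LogOp> log_secondary_index_update(const std::string &old_key,
                                              const std::string &new_key,
                                              bool is_delete,
                                              std::uint64_t trx_id) {
  std::vector<LogOp> ops;
  ops.push_back({false, 0, old_key});
  if (!is_delete) {
    ops.push_back({true, trx_id, new_key});
  }
  return ops;
}
```

Calling log_secondary_index_update("a", "b", false, 42) yields two records (DELETE "a", then INSERT "b" with transaction ID 42), while passing is_delete = true yields only the DELETE record.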

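(2020-3-23: handled this way, replay should not produce a duplicate key error; discussion is welcome.)

(2020-3-23 12:32:55: after another look at the code, a unique index can still run into trouble, as shown below: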

  /* Ensure that we acquire index->lock when inserting into an
  index with index->online_status == ONLINE_INDEX_COMPLETE, but
  could still be subject to rollback_inplace_alter_table().
  This prevents a concurrent change of index->online_status.
  The memory object cannot be freed as long as we have an open
  reference to the table, or index->table->n_ref_count > 0. */
  bool check = !index->is_committed();

  DBUG_EXECUTE_IF("idx_mimic_not_committed", {
    check = true;
    mode = BTR_MODIFY_TREE;
  });

  if (check) {
    DEBUG_SYNC_C("row_ins_sec_index_enter");
    if (mode == BTR_MODIFY_LEAF) {
      search_mode |= BTR_ALREADY_S_LATCHED;
      mtr_s_lock(dict_index_get_lock(index), &mtr);
    } else {
      mtr_sx_lock(dict_index_get_lock(index), &mtr);
    }

    if (row_log_online_op_try(index, entry, thr_get_trx(thr)->id)) {
      goto func_exit;
    }
  }
  ...
    err = row_ins_scan_sec_index_for_duplicate(flags, index, entry, thr, check,
                                               &mtr, offsets_heap);

    mtr_commit(&mtr);

    switch (err) {
      case DB_SUCCESS:
        break;
      case DB_DUPLICATE_KEY:
        if (!index->is_committed()) {
          ut_ad(!thr_get_trx(thr)->dict_operation_lock_mode);

          dict_set_corrupted(index);
          /* Do not return any error to the
          caller. The duplicate will be reported
          by ALTER TABLE or CREATE UNIQUE INDEX.
          Unfortunately we cannot report the
          duplicate key value to the DDL thread,
          because the altered_table object is
          private to its call stack. */
          err = DB_SUCCESS;
        }
        /* fall through */

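The incremental DML record is logged first and the uniqueness check runs afterwards. Although err is reset to DB_SUCCESS, the index is marked as corrupted, so operations on the index will fail.)

In row_log_online_op, the following code checks whether the incremental DML log has exceeded innodb_online_alter_log_max_size (srv_online_max_size):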

const os_offset_t byte_offset =
    (os_offset_t)log->tail.blocks * srv_sort_buf_size;

if (byte_offset + srv_sort_buf_size >= srv_online_max_size) {
  goto write_failed;
}

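This raises a question. In the test from the previous article, under a sysbench OLTP load of more than 3k TPS, the secondary index creation ran for about 30 minutes, yet the incremental log file never exceeded its default maximum of 128 MiB. This suggests that the amount of log recorded is fairly limited.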
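To make the size check concrete, here is a minimal sketch that applies the same comparison with the two globals passed in as parameters. The helper names log_would_exceed_limit and max_flushable_blocks are hypothetical; only the comparison itself comes from the excerpt of row_log_online_op.

```cpp
#include <cstdint>

// Same comparison as in row_log_online_op: the write is refused when the
// offset of the block about to be flushed, plus one block, reaches the limit.
bool log_would_exceed_limit(std::uint64_t blocks_written,
                            std::uint64_t block_size,
                            std::uint64_t max_size) {
  const std::uint64_t byte_offset = blocks_written * block_size;
  return byte_offset + block_size >= max_size;
}

// Number of blocks that can be flushed before the check starts failing.
// Returns 0 for a zero block size to avoid looping forever.
std::uint64_t max_flushable_blocks(std::uint64_t block_size,
                                   std::uint64_t max_size) {
  if (block_size == 0) {
    return 0;
  }
  std::uint64_t blocks = 0;
  while (!log_would_exceed_limit(blocks, block_size, max_size)) {
    ++blocks;
  }
  return blocks;
}
```

With the values shown in this article (a block size of 1048576 and a limit of 134217728), the sketch allows 127 block flushes. With the limit lowered to 65536, it allows none, so under this model the first full block already trips the check, which is consistent with the ERROR 1799 example.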

Continuing with the implementation of row_log_online_op:

  avail_size = srv_sort_buf_size - log->tail.bytes;

  if (mrec_size > avail_size) {
    b = log->tail.buf;
  } else {
    b = log->tail.block + log->tail.bytes;
  }
  ...
  if (mrec_size >= avail_size) {
    dberr_t err;
    IORequest request(IORequest::WRITE);
    const os_offset_t byte_offset =
        (os_offset_t)log->tail.blocks * srv_sort_buf_size;

    if (byte_offset + srv_sort_buf_size >= srv_online_max_size) {
      goto write_failed;
    }

    if (mrec_size == avail_size) {
      ut_ad(b == &log->tail.block[srv_sort_buf_size]);
    } else {
      ut_ad(b == log->tail.buf + mrec_size);
      memcpy(log->tail.block + log->tail.bytes, log->tail.buf, avail_size);
    }

    UNIV_MEM_ASSERT_RW(log->tail.block, srv_sort_buf_size);

    if (row_log_tmpfile(log) < 0) {
      log->error = DB_OUT_OF_MEMORY;
      goto err_exit;
    }

    err = os_file_write_int_fd(request, "(modification log)", log->fd,
                               log->tail.block, byte_offset, srv_sort_buf_size);

    log->tail.blocks++;
    if (err != DB_SUCCESS) {
    write_failed:
      /* We set the flag directly instead of
      invoking dict_set_corrupted() here,
      because the index is not "public" yet. */
      index->type |= DICT_CORRUPT;
    }
    UNIV_MEM_INVALID(log->tail.block, srv_sort_buf_size);
    memcpy(log->tail.block, log->tail.buf + avail_size, mrec_size - avail_size);
    log->tail.bytes = mrec_size - avail_size;
  } else {
    log->tail.bytes += mrec_size;
    ut_ad(b == log->tail.block + log->tail.bytes);
  }

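When the size of the log record to be buffered, mrec_size, is greater than or equal to the space left in the current block, avail_size, the block is written to the temporary file. If mrec_size equals avail_size, the record is written directly into the current block.

If mrec_size is greater than avail_size, the record is first written to tail.buf, and its leading part is copied into the current block to fill it. os_file_write_int_fd is then called to write the block to the temporary file.

After the block has been written to the temporary file, the rest of the record is copied into tail.block, which is now free.

This confirms that the DML log is not kept entirely in memory. It is written to the temporary file, and only the last block stays in memory, so a long-running DDL does not lead to excessive memory use. Disk space for the temporary file is much less of a concern, and, as noted above, for a DDL that creates a secondary index the incremental log is far smaller than what a full copy of the table data would produce.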
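The write path above can be modeled in a few lines. The sketch below is a simplified illustration, not the InnoDB implementation: TailBuffer is a hypothetical type, an in-memory vector stands in for the temporary file, the record is copied straight from the caller's buffer instead of going through tail.buf, and a record is assumed to be no larger than one block.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified model of the writer side (row_log_t::tail).
struct TailBuffer {
  std::size_t block_size;
  std::vector<unsigned char> block;              // the single in-memory block
  std::size_t bytes = 0;                         // bytes used in `block`
  std::size_t blocks = 0;                        // blocks flushed so far
  std::vector<std::vector<unsigned char>> file;  // stand-in for the temp file

  explicit TailBuffer(std::size_t size) : block_size(size), block(size) {}

  // Append one record and flush the block once it becomes full.
  // Assumes rec_size <= block_size.
  void append(const unsigned char *rec, std::size_t rec_size) {
    const std::size_t avail = block_size - bytes;
    if (rec_size < avail) {
      // The record fits with room to spare: keep it in memory only.
      std::memcpy(block.data() + bytes, rec, rec_size);
      bytes += rec_size;
      return;
    }
    // Fill the block with the leading part of the record, then flush it.
    std::memcpy(block.data() + bytes, rec, avail);
    file.push_back(block);
    ++blocks;
    // Carry the remainder (possibly empty) into the now-free block.
    std::memcpy(block.data(), rec + avail, rec_size - avail);
    bytes = rec_size - avail;
  }
};
```

However many records are appended, the model keeps exactly one block in memory; everything else lives in the (simulated) file, which is the property the paragraph above relies on.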

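Implementation of incremental DML replay

As mentioned earlier, row_log_apply is the entry point for log replay, and it takes the lock on the secondary index. That looks as if DML would be blocked during replay, so let us see how the source code handles it.

The actual replay is done by row_log_apply_ops. As in published analyses of online index creation in MySQL 5.6, the index lock is held on entry to this function, but when the block being processed is not the last one, the lock is released, and the corresponding log block is then read from the file and replayed: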

  ut_ad(has_index_lock);
  has_index_lock = false;
  rw_lock_x_unlock(dict_index_get_lock(index));

  log_free_check();

  if (!row_log_block_allocate(index->online_log->head)) {
    error = DB_OUT_OF_MEMORY;
    goto func_exit;
  }

  IORequest request;
  dberr_t err = os_file_read_no_error_handling_int_fd(
      request, index->online_log->path, index->online_log->fd,
      index->online_log->head.block, ofs, srv_sort_buf_size, NULL);

  ...

  while (!trx_is_interrupted(trx)) {
    mrec = next_mrec;
    ut_ad(mrec < mrec_end);

    if (!has_index_lock) {
      /* We are applying operations from a different
      block than the one that is being written to.
      We do not hold index->lock in order to
      allow other threads to concurrently buffer
      modifications. */
      ut_ad(mrec >= index->online_log->head.block);
      ut_ad(mrec_end == index->online_log->head.block + srv_sort_buf_size);
      ut_ad(index->online_log->head.bytes < srv_sort_buf_size);

      /* Take the opportunity to do a redo log
      checkpoint if needed. */
      log_free_check();
    } else {
      /* We are applying operations from the last block.
      Do not allow other threads to buffer anything,
      so that we can finally catch up and synchronize. */
      ut_ad(index->online_log->head.blocks == 0);
      ut_ad(index->online_log->tail.blocks == 0);
      ut_ad(mrec_end ==
            index->online_log->tail.block + index->online_log->tail.bytes);
      ut_ad(mrec >= index->online_log->tail.block);
    }

    next_mrec = row_log_apply_op(index, dup, &error, offsets_heap, heap,
                                 has_index_lock, mrec, mrec_end, offsets);
    ...
  }

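Replay uses the head member of the row_log_t object. Its block field holds the log block read from the temporary file. row_log_apply_op replays one DML log record and returns the position of the next one, so the while loop works through the whole block.

After every log record in the block has been replayed, the X-lock on the secondary index is taken again and the progress fields are updated: head.blocks is incremented and head.bytes is reset to 0. A jump to the next_block label then continues with the next block: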

process_next_block:
  rw_lock_x_lock(dict_index_get_lock(index));
  has_index_lock = true;

  index->online_log->head.bytes = 0;
  index->online_log->head.blocks++;
  goto next_block;

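After the update, the state is first checked for validity. The code then determines whether the next block is the last one (the test is that head.blocks equals tail.blocks). If it is, the X-lock already held on the secondary index is kept.

The last block does not need to be read from the log file, because it is still buffered in memory. So before processing it, the temporary file used to buffer the incremental DML log is truncated to avoid wasting storage. Once all log records have been processed, the return value is set to DB_SUCCESS and control jumps to the code at the func_exit label for the final cleanup before row_log_apply_ops returns.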

next_block:
  ut_ad(has_index_lock);
  ut_ad(rw_lock_own(dict_index_get_lock(index), RW_LOCK_X));
  ut_ad(index->online_log->head.bytes == 0);

  stage->inc(row_log_progress_inc_per_block());

  if (trx_is_interrupted(trx)) {
    goto interrupted;
  }

  if (index->online_log->head.blocks == index->online_log->tail.blocks) {
    if (index->online_log->head.blocks) {
#ifdef HAVE_FTRUNCATE
      /* Truncate the file in order to save space. */
      if (index->online_log->fd > 0 &&
          ftruncate(index->online_log->fd, 0) == -1) {
        perror("ftruncate");
      }
#endif /* HAVE_FTRUNCATE */
      index->online_log->head.blocks = index->online_log->tail.blocks = 0;
    }

    next_mrec = index->online_log->tail.block;
    next_mrec_end = next_mrec + index->online_log->tail.bytes;

    if (next_mrec_end == next_mrec) {
      /* End of log reached. */
    all_done:
      ut_ad(has_index_lock);
      ut_ad(index->online_log->head.blocks == 0);
      ut_ad(index->online_log->tail.blocks == 0);
      error = DB_SUCCESS;
      goto func_exit;
    }
    ...

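The code above also shows that secondary index creation uses trx_is_interrupted to check whether the operation has been interrupted, which means it can be stopped with KILL or similar means.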
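The lock handling in the replay loop can be summarized with a small single-threaded model. This is a sketch under simplifying assumptions, not InnoDB code: LogState, ReplayStats, and replay are hypothetical names, every flushed block is assumed to hold the same number of records, and a callback stands in for concurrent DML that appends to the log while the replay thread has released the index lock.

```cpp
#include <cstddef>
#include <functional>

struct ReplayStats {
  std::size_t applied_unlocked = 0;  // records applied with the lock released
  std::size_t applied_locked = 0;    // records applied while holding the lock
};

// Hypothetical model of the log: each flushed block holds `recs_per_block`
// records, and `tail_recs` records sit in the in-memory tail block.
struct LogState {
  std::size_t head_blocks = 0;
  std::size_t tail_blocks = 0;
  std::size_t recs_per_block = 0;
  std::size_t tail_recs = 0;
};

// `while_unlocked` stands in for writers that may run whenever the replay
// thread has released the index lock.
ReplayStats replay(LogState &log,
                   const std::function<void(LogState &)> &while_unlocked) {
  ReplayStats stats;
  for (;;) {
    // Corresponds to next_block: the index lock is held at this point.
    if (log.head_blocks == log.tail_blocks) {
      // Last block: still in memory, applied without releasing the lock.
      stats.applied_locked += log.tail_recs;
      log.tail_recs = 0;
      log.head_blocks = log.tail_blocks = 0;
      return stats;
    }
    // Not the last block: release the lock, read one block, apply it.
    while_unlocked(log);
    stats.applied_unlocked += log.recs_per_block;
    // Re-acquire the lock before advancing the read position.
    ++log.head_blocks;
  }
}
```

In this model the work done under the lock is bounded by the contents of one in-memory block, regardless of how many blocks were flushed or how many more are appended during replay, which matches the observation that lock time does not grow with the amount of incremental DML.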

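Summary

This article analyzed how incremental DML is handled when a secondary index is created with Online DDL in MySQL 8.0, and confirmed at the source level that the time the lock is held does not grow with the amount of incremental DML. Online DDL is a large feature set; later articles will analyze the phase of index creation that builds the full set of index records.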