Starting with version 5.5, MySQL introduced the metadata lock, or MDL for short.
The main purpose of MDL is to protect table metadata from being modified while concurrent requests are using it. For example, if session 1 is running a SELECT and session 2 tries to add a column, session 2 gets blocked, so that session 1 does not see an inconsistent table structure before and after.
MDL was later extended: it no longer covers only TABLE objects, but also protects databases (SCHEMA), tablespaces (TABLESPACE), stored programs (procedures, functions, triggers, events), and other objects. In addition, MDL can block global COMMIT activity: after FTWRL (FLUSH TABLES WITH READ LOCK) is issued, a global COMMIT shared lock is held, which prevents other sessions from committing their transactions.
The details of MDL can be seen in performance_schema.metadata_locks, and starting with MySQL 5.7 the MDL blocking/waiting situation can also be inspected via sys.schema_table_lock_waits. Note in particular that the MDL lock-wait timeout is controlled by the lock_wait_timeout option, whose default is 31536000 seconds, i.e. one year, one year, one year (important things deserve saying three times). It is advisable to lower it, say to 5-10 minutes, and at most no more than 1 hour (think about it: if an MDL wait runs past 1 hour without raising an alert, the DBA should probably be fired, right?).
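Lowering it is a one-liner; a minimal sketch (600 seconds is just an illustrative value, and the variable exists at both GLOBAL and SESSION scope):

```sql
-- Cap MDL waits at 10 minutes for sessions opened after this change
SET GLOBAL lock_wait_timeout = 600;
-- Or only for the current session
SET SESSION lock_wait_timeout = 600;
```

Remember that SET GLOBAL only affects new connections; existing sessions keep the value they started with.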
Also, to see MDL details in PFS (performance_schema) and the sys schema, the relevant settings must be enabled first:
mysql> use performance_schema;
mysql> UPDATE setup_consumers
       SET ENABLED = 'YES' WHERE NAME = 'global_instrumentation';
mysql> UPDATE setup_instruments
       SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';
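As a side note, on MySQL 8.0 the wait/lock/metadata/sql/mdl instrument and the global_instrumentation consumer should already be ON by default (on 5.7 the instrument is off), so a quick check before updating anything does no harm:

```sql
-- Confirm the MDL instrument and the global consumer are both enabled
SELECT NAME, ENABLED FROM performance_schema.setup_instruments
 WHERE NAME = 'wait/lock/metadata/sql/mdl';
SELECT NAME, ENABLED FROM performance_schema.setup_consumers
 WHERE NAME = 'global_instrumentation';
```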
Then you can check whether any MDL locks are currently held:
# session1 starts a read-only transaction and leaves it open
mysql> begin; select * from test.t1 limit 1;
# session2 inspects the MDL details
mysql> SELECT * FROM metadata_locks\G
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: t1
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140334142005184
LOCK_TYPE: SHARED_READ -- the MDL request type is shared read
LOCK_DURATION: TRANSACTION -- the MDL is held for the duration of the transaction
LOCK_STATUS: GRANTED -- the MDL request has been granted
SOURCE: sql_parse.cc:5929
OWNER_THREAD_ID: 1892 -- thread ID
OWNER_EVENT_ID: 13
Plenty of experts have already dissected MDL internals in detail, so this article will not repeat that; I prefer writing something more hands-on. So let's walk through a very classic MDL-wait case together, and use it to see how to analyze, diagnose, and handle this kind of problem when it hits.
| session1 | session2 | session3 |
| --- | --- | --- |
| begin; select * from t1 limit 1; | | |
| | alter table t1 add cx int; # blocked here | |
| | | select * from t1 limit 1; # also blocked |
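The timeline above can be reproduced directly; run each piece in its own connection (statements taken verbatim from the scenario):

```sql
-- session1: open a transaction, read t1, and do NOT commit
begin; select * from t1 limit 1;
-- session2: the DDL blocks, waiting for session1's SHARED_READ MDL on t1
alter table t1 add cx int;
-- session3: even a plain read now queues behind session2's pending EXCLUSIVE MDL
select * from t1 limit 1;
```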
If we run SHOW PROCESSLIST at this point, we see a scene like the following (some irrelevant columns removed):
+------+------+---------------------------------+--------------------------+
| Id | Time | State | Info |
+------+------+---------------------------------+--------------------------+
| 1853 | 0 | starting | show processlist |
| 1854 | 134 | Waiting for table metadata lock | alter table t1 add cx int|
| 1855 | 83 | Waiting for table metadata lock | select * from t1 limit 1 |
+------+------+---------------------------------+--------------------------+
Looking at this scene alone, there is no way to tell which thread is causing the MDL blocking. That was exactly the pain of MySQL versions before 5.7: when an MDL wait hit, analysis relied entirely on the DBA's experience and gut feeling, which was not always reliable.
From 5.7 on, however, we can trace and analyze it with PFS and the sys schema:
# 1. Check the current MDL request status (some information trimmed)
mysql> select * from performance_schema.metadata_locks;
+-------------+-------------+---------------------+---------------+-------------+------+
| OBJECT_TYPE | OBJECT_NAME | LOCK_TYPE           | LOCK_DURATION | LOCK_STATUS | TID  |
+-------------+-------------+---------------------+---------------+-------------+------+
| TABLE       | t1          | SHARED_READ         | TRANSACTION   | GRANTED     | 1892 |
| GLOBAL      | NULL        | INTENTION_EXCLUSIVE | STATEMENT     | GRANTED     | 1893 |
| SCHEMA      | NULL        | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | 1893 |
| TABLE       | t1          | SHARED_UPGRADABLE   | TRANSACTION   | GRANTED     | 1893 |
| BACKUP LOCK | NULL        | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | 1893 |
| TABLESPACE  | test/t1     | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | 1893 |
| TABLE       | #sql-e953_  | EXCLUSIVE           | STATEMENT     | GRANTED     | 1893 |
| TABLE       | t1          | EXCLUSIVE           | TRANSACTION   | PENDING     | 1893 |
| TABLE       | t1          | SHARED_READ         | TRANSACTION   | PENDING     | 1894 |
+-------------+-------------+---------------------+---------------+-------------+------+
# 2. Check the current MDL wait status
mysql> select * from sys.schema_table_lock_waits\G
*************************** 1. row ***************************
object_schema: test
object_name: t1
waiting_thread_id: 1893 -- thread ID of the waiter
waiting_pid: 1854 -- connection PID of the waiter
waiting_account: root@localhost
waiting_lock_type: EXCLUSIVE -- lock type being waited for
waiting_lock_duration: TRANSACTION
waiting_query: alter table t1 add cx int
waiting_query_secs: 134 -- how long the lock wait has lasted
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 1892 -- this thread ID matches the TID column above
blocking_pid: 1853 -- connection PID, matching trx_mysql_thread_id below
blocking_account: root@localhost
blocking_lock_type: SHARED_READ -- lock type held by the blocker
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 1853
sql_kill_blocking_connection: KILL 1853
*************************** 2. row ***************************
object_schema: test
object_name: t1
waiting_thread_id: 1894
waiting_pid: 1855
waiting_account: root@localhost
waiting_lock_type: SHARED_READ
waiting_lock_duration: TRANSACTION
waiting_query: select * from t1 limit 1
waiting_query_secs: 83
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 1892
blocking_pid: 1853
blocking_account: root@localhost
blocking_lock_type: SHARED_READ
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 1853
sql_kill_blocking_connection: KILL 1853
*************************** 3. row ***************************
object_schema: test
object_name: t1
waiting_thread_id: 1893
waiting_pid: 1854
waiting_account: root@localhost
waiting_lock_type: EXCLUSIVE
waiting_lock_duration: TRANSACTION
waiting_query: alter table t1 add cx int
waiting_query_secs: 83
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 1893
blocking_pid: 1854
blocking_account: root@localhost
blocking_lock_type: SHARED_UPGRADABLE
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 1854
sql_kill_blocking_connection: KILL 1854
*************************** 4. row ***************************
object_schema: test
object_name: t1
waiting_thread_id: 1894
waiting_pid: 1855
waiting_account: root@localhost
waiting_lock_type: SHARED_READ
waiting_lock_duration: TRANSACTION
waiting_query: select * from t1 limit 1
waiting_query_secs: 83
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 1893
blocking_pid: 1854
blocking_account: root@localhost
blocking_lock_type: SHARED_UPGRADABLE
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 1854
sql_kill_blocking_connection: KILL 1854
# 3. While at it, check whether there are active uncommitted transactions; the longest-running one is usually the root cause of widespread MDL blocking
mysql> select trx_id,trx_state,time_to_sec(timediff(now(), trx_started))
as trx_active_secs, trx_mysql_thread_id as pid, trx_query
from information_schema.innodb_trx
order by trx_active_secs desc \G
*************************** 1. row ***************************
trx_id: 281479749621520
trx_state: RUNNING
trx_active_secs: 226
pid: 1853
trx_query: select trx_id,trx_state...
The environment for the test above:
mysql>
...
Server version: 8.0.16 MySQL Community Server - GPL
...
Problem analysis
From the results above we can conclude:
- The session with connection PID=1854 is blocked by the session with PID=1853
- The session with connection PID=1855 is blocked by the session with PID=1854
- The blocking of PID=1855 can also be traced back to the session with PID=1853
- In other words: session1 blocks session2, and session2 in turn blocks session3
How to resolve it:
- To unblock both session2 and session3, it is enough to release the lock held by session1
- Although the output above suggests running KILL QUERY 1853, that actually does not work, because the SQL in PID=1853 that caused the MDL wait has already finished executing; only its transaction has not been committed yet. The correct approach is to run KILL 1853 to terminate that connection outright, roll back its transaction, and release the MDL lock
One last note: MDL is a lock at the MySQL server layer, while InnoDB also has its own table-level IS/IX locks; the two are not the same thing. Enjoy MySQL :)
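If you want to see those InnoDB-level table locks side by side with MDL, in MySQL 8.0 they are exposed through performance_schema.data_locks (a quick sketch; LOCK_MODE will show values like IX or IS for table-level intention locks):

```sql
-- InnoDB table-level IS/IX locks live in a different layer than MDL;
-- in 8.0 they can be inspected via performance_schema.data_locks
SELECT OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_MODE
  FROM performance_schema.data_locks
 WHERE LOCK_TYPE = 'TABLE';
```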