
What to do about a pile-up of "Waiting for table metadata lock" in MySQL

Starting with version 5.5, MySQL introduced the metadata lock, or MDL for short.

The main purpose of MDL is to keep table metadata from being modified while it is being accessed concurrently. For example, if session 1 is running a SELECT against a table and session 2 tries to add a column to it, session 2 is blocked, so that session 1 does not see an inconsistent table structure within the same transaction.

Of course, MDL was later extended: it is not limited to tables (TABLE). Schemas (SCHEMA), tablespaces (TABLESPACE), stored programs (procedures, functions, triggers, events) and other objects are protected by MDL as well. In addition, MDL can block COMMIT globally: for example, FTWRL (FLUSH TABLES WITH READ LOCK) acquires a global COMMIT lock in shared mode, which prevents other sessions from committing their transactions.
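
A minimal sketch of that COMMIT-blocking behavior (assuming an InnoDB table t1 with columns id and c1; the session split is only illustrative):

# session A: a transaction with uncommitted changes
begin;
update t1 set c1 = c1 + 1 where id = 1;

# session B: take a global read lock, e.g. at the start of a backup
flush tables with read lock;   -- also acquires the global COMMIT lock in shared mode

# session A: now try to commit
commit;   -- blocks (state "Waiting for commit lock") until session B runs UNLOCK TABLES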

The details of MDL can be seen in performance_schema.metadata_locks, and starting with MySQL 5.7 the view sys.schema_table_lock_waits shows the MDL blocking/waiting relationships. Pay special attention to the MDL wait timeout, controlled by the lock_wait_timeout option. Its default is 31536000 seconds, i.e. one year, one year, one year (important things are worth saying three times). It is strongly recommended to lower it, say to 5-10 minutes, and no more than 1 hour at most (if an MDL wait can last over an hour without raising an alert, the DBA probably deserves to be fired).
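
For example, to check the current value and lower it (600 seconds here is just an illustrative choice):

mysql> select @@global.lock_wait_timeout, @@session.lock_wait_timeout;
mysql> set global lock_wait_timeout = 600;    -- takes effect for new connections
mysql> set session lock_wait_timeout = 600;   -- current session only
-- to survive a restart, also set lock_wait_timeout = 600 under [mysqld] in my.cnf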

Also, to see MDL details in PFS (performance_schema) and the sys schema, the relevant instrumentation must be enabled first:

mysql> use performance_schema;
mysql> UPDATE setup_consumers
    SET ENABLED = 'YES' WHERE NAME = 'global_instrumentation';
mysql> UPDATE setup_instruments
    SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';
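
The two UPDATE statements above only last until the next restart. To keep the settings permanently, the corresponding startup options can go into my.cnf instead; a sketch (in MySQL 8.0 the mdl instrument may already be enabled by default):

[mysqld]
performance_schema = ON
performance-schema-instrument = 'wait/lock/metadata/sql/mdl=ON'
performance-schema-consumer-global-instrumentation = ON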

Then we can check whether there are any MDL locks at the moment:

# session1: start a read-only transaction and leave it open (uncommitted)
mysql> begin; select * from test.t1 limit 1;

# session2: check the MDL details
mysql> SELECT * FROM metadata_locks\G
*************************** 1. row ***************************
          OBJECT_TYPE: TABLE
        OBJECT_SCHEMA: test
          OBJECT_NAME: t1
          COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140334142005184
            LOCK_TYPE: SHARED_READ  -- the MDL request type: shared read
        LOCK_DURATION: TRANSACTION  -- the MDL is held for the duration of the transaction
          LOCK_STATUS: GRANTED  -- the MDL request has been granted
               SOURCE: sql_parse.cc:5929
      OWNER_THREAD_ID: 1892  -- thread ID
       OWNER_EVENT_ID: 13              
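
Note that OWNER_THREAD_ID is performance_schema's internal thread ID, not the connection ID you would pass to KILL; it can be mapped to the connection via performance_schema.threads, for example:

mysql> select thread_id, processlist_id, processlist_user, processlist_command
    from performance_schema.threads
    where thread_id = 1892;   -- the OWNER_THREAD_ID seen above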

Plenty of experts have already written detailed analyses of MDL internals, so I won't repeat them here; I prefer something more hands-on. So let's walk through a very classic MDL wait case together and look at how to analyze, diagnose and handle this kind of problem.

session1                      session2                      session3
----------------------------  ----------------------------  ----------------------------
begin;
select * from t1 limit 1;
                              alter table t1 add cx int;
                              -- this request is blocked
                                                            select * from t1 limit 1;
                                                            -- this one is blocked too

If we run SHOW PROCESSLIST at this point, we can see a scene like the one below (some useless output columns removed):

+------+------+---------------------------------+--------------------------+
| Id   | Time | State                           | Info                     |
+------+------+---------------------------------+--------------------------+
| 1853 |    0 | starting                        | show processlist         |
| 1854 |  134 | Waiting for table metadata lock | alter table t1 add cx int|
| 1855 |   83 | Waiting for table metadata lock | select * from t1 limit 1 |
+------+------+---------------------------------+--------------------------+      
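
When a lot of sessions pile up like this, it helps to first gauge the scale of the backlog; a quick filter (a sketch using information_schema.processlist):

mysql> select id, time, state, info
    from information_schema.processlist
    where state = 'Waiting for table metadata lock'
    order by time desc;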

From this snapshot alone there is no way to tell which thread is actually causing the MDL blocking. That is exactly the pain of versions before MySQL 5.7: when an MDL wait happened, you could only rely on the DBA's experience and intuition, and even then not with full confidence.

From 5.7 onward, however, we can use PFS and the sys schema to trace and analyze it:

# 1. Check the current MDL requests (some columns removed)
mysql> select * from performance_schema.metadata_locks;
+-------------+-----------+---------------------+---------------+-------------+------+
| OBJECT_TYPE | OBJECT_NAM| LOCK_TYPE           | LOCK_DURATION | LOCK_STATUS |  TID |
+-------------+-----------+---------------------+---------------+-------------+------+
| TABLE       | t1        | SHARED_READ         | TRANSACTION   | GRANTED     | 1892 |
| GLOBAL      | NULL      | INTENTION_EXCLUSIVE | STATEMENT     | GRANTED     | 1893 |
| SCHEMA      | NULL      | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | 1893 |
| TABLE       | t1        | SHARED_UPGRADABLE   | TRANSACTION   | GRANTED     | 1893 |
| BACKUP LOCK | NULL      | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | 1893 |
| TABLESPACE  | test/t1   | INTENTION_EXCLUSIVE | TRANSACTION   | GRANTED     | 1893 |
| TABLE       | #sql-e953_| EXCLUSIVE           | STATEMENT     | GRANTED     | 1893 |
| TABLE       | t1        | EXCLUSIVE           | TRANSACTION   | PENDING     | 1893 |
| TABLE       | t1        | SHARED_READ         | TRANSACTION   | PENDING     | 1894 |
+-------------+-----------+---------------------+---------------+-------------+------+
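
# If the sys schema is not available, a rough substitute is to pair PENDING requests with
# the GRANTED locks on the same table directly in metadata_locks. A sketch (like
# sys.schema_table_lock_waits, it lists every granted/pending pair, not only true conflicts):
mysql> select g.owner_thread_id as blocking_tid, g.lock_type as granted_lock,
           p.owner_thread_id as waiting_tid, p.lock_type as requested_lock,
           g.object_schema, g.object_name
    from performance_schema.metadata_locks g
    join performance_schema.metadata_locks p
      on g.object_type = p.object_type
     and g.object_schema = p.object_schema
     and g.object_name = p.object_name
    where g.lock_status = 'GRANTED'
      and p.lock_status = 'PENDING'
      and g.owner_thread_id <> p.owner_thread_id;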

# 2. Check the current MDL wait relationships
mysql> select * from sys.schema_table_lock_waits\G
*************************** 1. row ***************************
               object_schema: test
                 object_name: t1
           waiting_thread_id: 1893  -- thread ID of the waiting session
                 waiting_pid: 1854  -- connection (processlist) ID of the waiting session
             waiting_account: root@localhost
           waiting_lock_type: EXCLUSIVE  -- lock type being waited for
       waiting_lock_duration: TRANSACTION
               waiting_query: alter table t1 add cx int
          waiting_query_secs: 134  -- how long it has been waiting, in seconds
 waiting_query_rows_affected: 0
 waiting_query_rows_examined: 0
          blocking_thread_id: 1892  -- thread ID of the blocker; matches the TID shown above
                blocking_pid: 1853  -- connection ID of the blocker; matches trx_mysql_thread_id below
            blocking_account: root@localhost
          blocking_lock_type: SHARED_READ  -- lock type held by the blocker
      blocking_lock_duration: TRANSACTION
     sql_kill_blocking_query: KILL QUERY 1853
sql_kill_blocking_connection: KILL 1853
*************************** 2. row ***************************
               object_schema: test
                 object_name: t1
           waiting_thread_id: 1894
                 waiting_pid: 1855
             waiting_account: root@localhost
           waiting_lock_type: SHARED_READ
       waiting_lock_duration: TRANSACTION
               waiting_query: select * from t1 limit 1
          waiting_query_secs: 83
 waiting_query_rows_affected: 0
 waiting_query_rows_examined: 0
          blocking_thread_id: 1892
                blocking_pid: 1853
            blocking_account: root@localhost
          blocking_lock_type: SHARED_READ
      blocking_lock_duration: TRANSACTION
     sql_kill_blocking_query: KILL QUERY 1853
sql_kill_blocking_connection: KILL 1853
*************************** 3. row ***************************
               object_schema: test
                 object_name: t1
           waiting_thread_id: 1893
                 waiting_pid: 1854
             waiting_account: root@localhost
           waiting_lock_type: EXCLUSIVE
       waiting_lock_duration: TRANSACTION
               waiting_query: alter table t1 add cx int
          waiting_query_secs: 83
 waiting_query_rows_affected: 0
 waiting_query_rows_examined: 0
          blocking_thread_id: 1893
                blocking_pid: 1854
            blocking_account: root@localhost
          blocking_lock_type: SHARED_UPGRADABLE
      blocking_lock_duration: TRANSACTION
     sql_kill_blocking_query: KILL QUERY 1854
sql_kill_blocking_connection: KILL 1854
*************************** 4. row ***************************
               object_schema: test
                 object_name: t1
           waiting_thread_id: 1894
                 waiting_pid: 1855
             waiting_account: root@localhost
           waiting_lock_type: SHARED_READ
       waiting_lock_duration: TRANSACTION
               waiting_query: select * from t1 limit 1
          waiting_query_secs: 83
 waiting_query_rows_affected: 0
 waiting_query_rows_examined: 0
          blocking_thread_id: 1893
                blocking_pid: 1854
            blocking_account: root@localhost
          blocking_lock_type: SHARED_UPGRADABLE
      blocking_lock_duration: TRANSACTION
     sql_kill_blocking_query: KILL QUERY 1854
sql_kill_blocking_connection: KILL 1854

# 3. Also check for active uncommitted transactions; the longest-running one is usually the root cause of widespread MDL blocking
mysql> select trx_id,trx_state,time_to_sec(timediff(now(), trx_started))
    as trx_active_secs, trx_mysql_thread_id as pid, trx_query
    from information_schema.innodb_trx
    order by trx_active_secs desc \G
*************************** 1. row ***************************
         trx_id: 281479749621520
      trx_state: RUNNING
trx_active_secs: 226
            pid: 1853
      trx_query: select trx_id,trx_state...      
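
A variant I find handy is to join innodb_trx with the processlist to spot transactions that are open but sitting idle (Command = Sleep), since an idle uncommitted transaction holding an MDL is the most common culprit. A sketch:

mysql> select p.id as pid, p.user, p.host, p.time as idle_secs, t.trx_started
    from information_schema.innodb_trx t
    join information_schema.processlist p on p.id = t.trx_mysql_thread_id
    where p.command = 'Sleep'
    order by t.trx_started;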

The environment used for the tests above:

mysql>
...
Server version:     8.0.16 MySQL Community Server - GPL
...      

Problem analysis

From the output above we can draw the following conclusions:

  1. The session with connection ID 1854 is blocked by the session with connection ID 1853
  2. The session with connection ID 1855 is blocked by the session with connection ID 1854
  3. The blocking of connection 1855 can ultimately be traced back to connection 1853 as well
  4. In other words: session1 blocks session2, and session2 in turn blocks session3

How to resolve it:

  1. To unblock both session2 and session3, we only need session1 to release the lock it holds
  2. Although the output above suggests running KILL QUERY 1853, that actually does not help, because the SQL in connection 1853 that caused the MDL wait has already finished executing; it is the transaction that is still open. The correct fix is to run KILL 1853 to terminate that connection, which rolls back its transaction and releases the MDL
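
One preventive habit worth adding, building on the lock_wait_timeout advice above: before running DDL, lower the MDL wait timeout for the current session only, so a blocked ALTER gives up quickly instead of queueing every later query behind it. A sketch:

mysql> set session lock_wait_timeout = 30;   -- illustrative: give up after 30 seconds
mysql> alter table t1 add cx int;            -- returns a lock-wait-timeout error if the MDL cannot be acquired in time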

One last point: MDL is a lock at the MySQL server layer, while the InnoDB layer has its own table-level IS/IX locks; the two are not the same thing. Enjoy MySQL :)