title: mysql頁面crash問題複現&恢複方法
date: 2019-07-25 11:33:06
categories: Mysql
看到一個MYSQL單節點CRASH的案例,這裡用MYSQL5.7嘗試複現一個頁面損壞的場景,并嘗試恢複資料!
構造資料
sysbench構造10測試表
/home/mingjie.gmj/bin/sysbench-1.0.16/bin/sysbench oltp_common --threads=64 --events=0 --mysql-socket=/home/mingjie.gmj/databases/data/mydata5405/mysql5405.sock --mysql-user=root --tables=10 --table_size=1000 prepare
sysbench 1.0.16 (using bundled LuaJIT 2.1.0-beta2)
Initializing worker threads...
Creating table 'sbtest4'...
Creating table 'sbtest8'...
Creating table 'sbtest6'...
Creating table 'sbtest2'...
Creating table 'sbtest10'...
Creating table 'sbtest7'...
Creating table 'sbtest1'...
Creating table 'sbtest5'...
Creating table 'sbtest9'...
Creating table 'sbtest3'...
Inserting 1000 records into 'sbtest10'
Inserting 1000 records into 'sbtest7'
Inserting 1000 records into 'sbtest1'
Inserting 1000 records into 'sbtest2'
Inserting 1000 records into 'sbtest4'
Inserting 1000 records into 'sbtest6'
Inserting 1000 records into 'sbtest3'
Inserting 1000 records into 'sbtest9'
Inserting 1000 records into 'sbtest8'
Inserting 1000 records into 'sbtest5'
Creating a secondary index on 'sbtest10'...
Creating a secondary index on 'sbtest6'...
Creating a secondary index on 'sbtest1'...
Creating a secondary index on 'sbtest5'...
Creating a secondary index on 'sbtest7'...
Creating a secondary index on 'sbtest3'...
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest9'...
Creating a secondary index on 'sbtest8'...
Creating a secondary index on 'sbtest4'...
檢視表資料
mysql> show tables;
+------------------+
| Tables_in_sbtest |
+------------------+
| sbtest1 |
| sbtest10 |
| sbtest2 |
| sbtest3 |
| sbtest4 |
| sbtest5 |
| sbtest6 |
| sbtest7 |
| sbtest8 |
| sbtest9 |
+------------------+
10 rows in set (0.00 sec)
mysql> select * from sbtest9 limit 1\G
*************************** 1. row ***************************
id: 1
k: 505
c: 92419600644-86829681637-42100581414-80298414140-81768158898-74430369956-50895721992-62087272403-75473465539-28369755814
pad: 65092491791-76928308446-68130154933-07155890946-00453047346
1 row in set (0.00 sec)
手動破壞表檔案
對表
sbtest9
,模拟頁面損壞,首先檢視資料檔案位置
mysql> show global variables like '%datadir%';
+---------------+----------------------------------------------+
| Variable_name | Value |
+---------------+----------------------------------------------+
| datadir | /home/mingjie.gmj/databases/data/mydata5405/ |
+---------------+----------------------------------------------+
1 row in set (0.01 sec)
vim直接删除前十個字元!
vi sbtest9.ibd
删除後查詢還可以成功(表檔案删了也能查)因為現在查的還是緩存。
重新開機資料庫觸發問題
正常重新開機
/home/mingjie.gmj/databases/mysql5405/bin/mysqladmin -S /home/mingjie.gmj/databases/data/mydata5405/mysql5405.sock -uroot -p shutdown
/bin/sh /home/mingjie.gmj/databases/mysql5405/bin/mysqld_safe --defaults-file=/home/mingjie.gmj/databases/data/mydata5405/my.cnf &
切換資料庫時,資料庫crash!
mysql> use sbtest
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> 190725 14:32:01 mysqld_safe Number of processes running now: 0
190725 14:32:01 mysqld_safe mysqld restarted
190725 14:32:02 mysqld_safe mysqld from pid file /home/mingjie.gmj/databases/data/mydata5405/iZbp1d4tisi44j6vxze02fZ.pid ended
mysql> use sbtest
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/home/mingjie.gmj/databases/data/mydata5405/mysql5405.sock' (2)
ERROR:
Can't connect to the server
先看下重新開機啟動時的日志,已經有一些問題了
提示pageheader錯誤(因為我前面删的是檔案頭部的資料)
2019-07-25 14:32:02 6933 [Note] InnoDB: Database was not shutdown normally!
2019-07-25 14:32:02 6933 [Note] InnoDB: Starting crash recovery.
2019-07-25 14:32:02 6933 [Note] InnoDB: Reading tablespace information from the .ibd files...
2019-07-25 14:32:02 6933 [ERROR] InnoDB: Space id in fsp header 1441792,but in the page header 0
2019-07-25 14:32:02 6933 [ERROR] InnoDB: innodb-page-size mismatch in tablespace ./sbtest/sbtest9.ibd (table sbtest/sbtest9)
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size:1024 Pages to analyze:64
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size: 1024, Possible space_id count:0
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size:2048 Pages to analyze:64
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size: 2048, Possible space_id count:0
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size:4096 Pages to analyze:64
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size: 4096, Possible space_id count:0
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size:8192 Pages to analyze:43
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size: 8192, Possible space_id count:0
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size:16384 Pages to analyze:21
2019-07-25 14:32:02 6933 [Note] InnoDB: Page size: 16384, Possible space_id count:0
2019-07-25 14:32:02 7f4e65962740 InnoDB: Operating system error number 2 in a file operation.
在看一下crash的報錯
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
InnoDB: Error: could not open single-table tablespace file ./sbtest/sbtest9.ibd
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
InnoDB: To fix the problem and start mysqld:
InnoDB: 1) If there is a permission problem in the file and mysqld cannot
InnoDB: open the file, you should modify the permissions.
InnoDB: 2) If the table is not needed, or you can restore it from a backup,
InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
InnoDB: crash recovery and ignore that table.
InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.
190725 14:32:02 mysqld_safe mysqld from pid file /home/mingjie.gmj/databases/data/mydata5405/iZbp1d4tisi44j6vxze02fZ.pid ended
注意看下3),已經建議怎麼操作了!
InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.
開始修複
嘗試再次重新開機資料庫報一樣的錯誤
現在資料庫已經不可服務了!需要盡快恢複!
增加參數force拉起資料庫
innodb_force_recovery = 1
2019-07-25 14:42:33 8563 [Note] InnoDB: innodb_force_recovery was set to 1. Continuing crash recovery even though we cannot access the .ibd file of this table.
2019-07-25 14:42:33 8563 [Note] InnoDB: Restoring possible half-written data pages
2019-07-25 14:42:33 8563 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Last MySQL binlog file position 0 1914387, file name mysql-bin.000013
2019-07-25 14:42:33 8563 [ERROR] InnoDB: Table sbtest/sbtest9 in the InnoDB data dictionary has tablespace id 43, but tablespace with that id or name does not exist. Have you deleted or moved .ibd files? This may also be a table created with CREATE TEMPORARY TABLE whose .ibd and .frm files MySQL automatically removed, but the table still exists in the InnoDB internal data dictionary.
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html
InnoDB: for how to resolve the issue.
2019-07-25 14:42:33 8563 [Note] InnoDB: 128 rollback segment(s) are active.
2019-07-25 14:42:33 8563 [Note] InnoDB: Waiting for purge to start
2019-07-25 14:42:33 8563 [Note] InnoDB: 5.6.44 started; log sequence number 4247920472
2019-07-25 14:42:33 8563 [Note] InnoDB: !!! innodb_force_recovery is set to 1 !!!
2019-07-25 14:42:33 8563 [Note] Recovering after a crash using /home/mingjie.gmj/databases/data/mydata5405/mysql-bin
2019-07-25 14:42:33 8563 [Note] Starting crash recovery...
2019-07-25 14:42:33 8563 [Note] Crash recovery finished.
2019-07-25 14:42:33 8563 [Note] Server hostname (bind-address): '*'; port: 5405
2019-07-25 14:42:33 8563 [Note] IPv6 is available.
2019-07-25 14:42:33 8563 [Note] - '::' resolves to '::';
2019-07-25 14:42:33 8563 [Note] Server socket created on IP: '::'.
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so thatInnoDB: innodb_force_... is removed.
2019-07-25 14:42:33 8563 [ERROR] Error writing master configuration.
2019-07-25 14:42:33 8563 [ERROR] Error reading master configuration.
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so thatInnoDB: innodb_force_... is removed.
2019-07-25 14:42:33 8563 [ERROR] Error writing relay log configuration.
2019-07-25 14:42:33 8563 [ERROR] Error reading relay log configuration.
2019-07-25 14:42:33 8563 [ERROR] Failed to initialize the master info structure
2019-07-25 14:42:33 8563 [Note] Check error log for additional messages. You will not be able to start replication until the issue is resolved and the server restarted.
2019-07-25 14:42:33 8563 [Note] Event Scheduler: Loaded 0 events
2019-07-25 14:42:33 8563 [Note] /home/mingjie.gmj/databases/mysql5405/bin/mysqld: ready for connections.
Version: '5.6.44-log' socket: '/home/mingjie.gmj/databases/data/mydata5405/mysql5405.sock' port: 5405 Source distribution
連上去嘗試修複資料
mysql> create table sbtest9_recovery like sbtest9;
ERROR 1146 (42S02): Table 'sbtest.sbtest9' doesn't exist
檔案頭被損壞了,資料庫已經讀不到這個表了!隻能幹掉這個表了
mysql> drop table sbtest9;
Query OK, 0 rows affected (0.01 sec)
注釋參數重新開機後恢複正常,但是資料已經無法恢複了!
啟動起來之後,需要嘗試查詢有問題的表,如果隻是某個頁面損壞了應該大部分資料是可以查出來的,可以用這樣的SQL來試
insert ignore into test_recovery select * from test limit 10;
insert ignore into test_recovery select * from test limit 20;
insert ignore into test_recovery select * from test limit 30;
頁面crash可以出現各種個樣的情況,我在測試的過程中出現了很多種情況,有的情況改參數也拉不起來資料庫,還有COREDUMP等等,具體問題需要具體分析!
附:參數說明
innodb_force_recovery
1 (SRV_FORCE_IGNORE_CORRUPT): 忽略檢查到的 corrupt 頁。盡管檢測到了損壞的 page 仍強制服務運作。一般設定為該值即可,然後 dump 出庫表進行重建。
2 (SRV_FORCE_NO_BACKGROUND): 阻止主線程的運作,如主線程需要執行 full purge 操作,會導緻 crash。 阻止 master thread 和任何 purge thread 運作。若 crash 發生在 purge 環節則使用該值。
3 (SRV_FORCE_NO_TRX_UNDO): 不執行事務復原操作。
4 (SRV_FORCE_NO_IBUF_MERGE): 不執行插入緩沖的合并操作。如果可能導緻崩潰則不要做這些操作。不要進行統計操作。該值可能永久損壞資料檔案。若使用了該值,則将來要删除和重建輔助索引。
5 (SRV_FORCE_NO_UNDO_LOG_SCAN): 不檢視重做日志,InnoDB 存儲引擎會将未送出的事務視為已送出。此時 InnoDB 甚至把未完成的事務按照送出處理。該值可能永久性的損壞資料檔案。
6 (SRV_FORCE_NO_LOG_REDO): 不執行前滾的操作。恢複時不做 redo log roll-forward。使資料庫頁處于廢止狀态,繼而可能引起 B 樹或者其他資料庫結構更多的損壞。
官方文檔:
Forcing InnoDB Recovery-
(1
SRV_FORCE_IGNORE_CORRUPT
)
Lets the server run even if it detects a corrupt
page . Tries to make
jump over corrupt index records and pages, which helps in dumping tables.SELECT * FROM *tbl_name*
-
2
Prevents the master thread and any purge threads from running. If a crash would occur during the purge operation, this recovery value prevents it.SRV_FORCE_NO_BACKGROUND
-
3
Does not run transaction rollbacks after crash recovery .SRV_FORCE_NO_TRX_UNDO
-
4
Prevents insert buffer merge operations. If they would cause a crash, does not do them. Does not calculate table statistics . This value can permanently corrupt data files. After using this value, be prepared to drop and recreate all secondary indexes. SetsSRV_FORCE_NO_IBUF_MERGE
to read-only.InnoDB
-
5
Does not look at undo logs when starting the database:SRV_FORCE_NO_UNDO_LOG_SCAN
treats even incomplete transactions as committed. This value can permanently corrupt data files. SetsInnoDB
InnoDB
-
6
Does not do the redo log roll-forward in connection with recovery. This value can permanently corrupt data files. Leaves database pages in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures. SetsSRV_FORCE_NO_LOG_REDO
InnoDB