Demystifying MySQL 8.0 multi-valued indexes

What is a multi-valued index

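Starting with MySQL 8.0.17, the InnoDB engine supports multi-valued indexes on the JSON data type. They target JSON columns in which a single row holds several values: once the index is in place, a query on any one of those values can be resolved to that same row.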

Suppose we have this record:

{"user":"Bob","zipcode":[94477,94536]}

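It says that the user Bob has several zip codes, 94477 and 94536. If we want to index the zipcode attribute, we can now use a multi-valued index, which was not possible in earlier versions. The index can be created like this: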

mysql> CREATE INDEX zips ON t1((
CAST(data->'$.zipcode' AS UNSIGNED ARRAY)));

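The multi-valued index in this example is really a functional index built on CAST(). The target type chosen for CAST() can be any supported type except BINARY and JSON. For now, multi-valued indexes only apply to JSON columns in InnoDB tables; other scenarios are not supported yet.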

How to use a multi-valued index

Let's see how to create a multi-valued index on a JSON column.

# Create the test table
mysql> CREATE TABLE customers (
 id INT NOT NULL AUTO_INCREMENT,
 custinfo JSON,
 primary key(id)
)engine=innodb;

# Insert 5 test rows
mysql> INSERT INTO customers(custinfo) VALUES
('{"user":"Jack","user_id":37,"zipcode":[94582,94536]}'),
('{"user":"Jill","user_id":22,"zipcode":[94568,94507,94582]}'),
('{"user":"Bob","user_id":31,"zipcode":[94477,94507]}'),
('{"user":"Mary","user_id":72,"zipcode":[94536]}'),
('{"user":"Ted","user_id":56,"zipcode":[94507,94582]}');

# Run the query; there is no index yet, so it needs a full table scan
mysql> DESC SELECT * FROM customers WHERE
JSON_CONTAINS(custinfo->'$.zipcode',
CAST('[94507,94582]' AS JSON))\G
****************** 1. row ******************
...
         type: ALL
possible_keys: NULL
          key: NULL
...
         rows: 5
     filtered: 100.00
        Extra: Using where

# Create the multi-valued index
mysql> ALTER TABLE customers ADD INDEX
zips((CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)));

# Check the new execution plan; the index is now used
mysql> DESC SELECT * FROM customers WHERE
JSON_CONTAINS(custinfo->'$.zipcode',
CAST('[94507,94582]' AS JSON))\G
****************** 1. row ******************
...
         type: range
possible_keys: zips
          key: zips
      key_len: 9
          ref: NULL
         rows: 6
     filtered: 100.00
        Extra: Using where; Using MRR
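
To see why this query can be answered from the index, here is a minimal Python sketch that models the zips index as a plain dictionary from zip code to row ids. The `index` dictionary and the `json_contains_via_index` helper are invented for illustration and say nothing about how the optimizer actually works. The point is only that looking up 94507 and 94582 touches 3 + 3 = 6 index entries, which is consistent with the rows: 6 estimate above, and that the candidate rows still need a final filter (Using where).

```python
import json
from collections import defaultdict

# The five custinfo documents inserted above, keyed by id (AUTO_INCREMENT starts at 1).
rows = {
    1: '{"user":"Jack","user_id":37,"zipcode":[94582,94536]}',
    2: '{"user":"Jill","user_id":22,"zipcode":[94568,94507,94582]}',
    3: '{"user":"Bob","user_id":31,"zipcode":[94477,94507]}',
    4: '{"user":"Mary","user_id":72,"zipcode":[94536]}',
    5: '{"user":"Ted","user_id":56,"zipcode":[94507,94582]}',
}

# Toy stand-in for the zips index (an assumption, not InnoDB's structure):
# zipcode -> ids of the rows whose array contains it.
index = defaultdict(list)
for row_id, doc in rows.items():
    for zipcode in json.loads(doc)["zipcode"]:
        index[zipcode].append(row_id)


def json_contains_via_index(wanted):
    """Return (ids of rows containing every wanted value, index entries touched)."""
    touched = sum(len(index[z]) for z in wanted)
    candidates = {i for z in wanted for i in index[z]}
    # Re-check each candidate row, playing the role of the "Using where" filter.
    matches = sorted(
        i for i in candidates
        if set(wanted) <= set(json.loads(rows[i])["zipcode"])
    )
    return matches, touched


matches, touched = json_contains_via_index([94507, 94582])
print(matches, touched)
```

Under these assumptions the sketch prints `[2, 5] 6`: only Jill and Ted have both zip codes, while six index entries are visited to find them.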

How a multi-valued index is stored internally

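Now that we know how to use a multi-valued index, let's look at how its index data is stored internally. Taking the customers table above as the example, we use the innblock and bcview tools to check how InnoDB stores it on disk.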

1. Find the secondary index page

First use innblock to find out which page holds the secondary index zips.

shell> innblock customers.ibd scan 16
...
===INDEX_ID:56555
level0 total block is (1)
block_no:         4,level:   0|*|
===INDEX_ID:56556
level0 total block is (1)
block_no:         5,level:   0|*|

Because the data set is tiny, each of the two indexes fits in a single page, and the secondary index keys are stored on page 5.

2. Scan the secondary index data

Next, scan the secondary index page with innblock to see how many records it holds.

shell> innblock customers.ibd 5 16
...
-----Total used rows:12 used rows list(logic):
(1) INFIMUM record offset:99 heapno:0 n_owned 1,delflag:N minflag:0 rectype:2
(2) normal record offset:216 heapno:7 n_owned 0,delflag:N minflag:0 rectype:0
(3) normal record offset:162 heapno:4 n_owned 0,delflag:N minflag:0 rectype:0
(4) normal record offset:234 heapno:8 n_owned 0,delflag:N minflag:0 rectype:0
(5) normal record offset:270 heapno:10 n_owned 0,delflag:N minflag:0 rectype:0
(6) normal record offset:126 heapno:2 n_owned 5,delflag:N minflag:0 rectype:0
(7) normal record offset:252 heapno:9 n_owned 0,delflag:N minflag:0 rectype:0
(8) normal record offset:180 heapno:5 n_owned 0,delflag:N minflag:0 rectype:0
(9) normal record offset:144 heapno:3 n_owned 0,delflag:N minflag:0 rectype:0
(10) normal record offset:198 heapno:6 n_owned 0,delflag:N minflag:0 rectype:0
(11) normal record offset:288 heapno:11 n_owned 0,delflag:N minflag:0 rectype:0
(12) SUPREMUM record offset:112 heapno:1 n_owned 6,delflag:N minflag:0 rectype:3
-----Total used rows:12 used rows list(phy):
(1) INFIMUM record offset:99 heapno:0 n_owned 1,delflag:N minflag:0 rectype:2
(2) SUPREMUM record offset:112 heapno:1 n_owned 6,delflag:N minflag:0 rectype:3
(3) normal record offset:126 heapno:2 n_owned 5,delflag:N minflag:0 rectype:0
(4) normal record offset:144 heapno:3 n_owned 0,delflag:N minflag:0 rectype:0
(5) normal record offset:162 heapno:4 n_owned 0,delflag:N minflag:0 rectype:0
(6) normal record offset:180 heapno:5 n_owned 0,delflag:N minflag:0 rectype:0
(7) normal record offset:198 heapno:6 n_owned 0,delflag:N minflag:0 rectype:0
(8) normal record offset:216 heapno:7 n_owned 0,delflag:N minflag:0 rectype:0
(9) normal record offset:234 heapno:8 n_owned 0,delflag:N minflag:0 rectype:0
(10) normal record offset:252 heapno:9 n_owned 0,delflag:N minflag:0 rectype:0
(11) normal record offset:270 heapno:10 n_owned 0,delflag:N minflag:0 rectype:0
(12) normal record offset:288 heapno:11 n_owned 0,delflag:N minflag:0 rectype:0
...

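There are 12 records in total. Leaving out the two pseudo-records INFIMUM and SUPREMUM, that makes 10 user records. Why 10 rather than 5? Because a multi-valued index treats every single zipcode value as its own index record. Take another look at the table data: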

mysql> select id, custinfo->'$.zipcode' from customers;
+----+-----------------------+
| id | custinfo->'$.zipcode' |
+----+-----------------------+
|  1 | [94582, 94536]        |
|  2 | [94568, 94507, 94582] |
|  3 | [94477, 94507]        |
|  4 | [94536]               |
|  5 | [94507, 94582]        |
+----+-----------------------+

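The 5 rows inserted above contain 10 zipcode values in total. Some of the values repeat, but they belong to different ids, so each one needs its own index record. In other words, for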

"zipcode":[94582,94536]

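the two integers are really two separate records in the index tree; they simply both point to the row with id=1. The index should therefore be stored in the following order: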

+---------+------+
| zipcode | id   |
+---------+------+
|   94477 |    3 |
|   94507 |    2 |
|   94507 |    3 |
|   94507 |    5 |
|   94536 |    1 |
|   94536 |    4 |
|   94568 |    2 |
|   94582 |    1 |
|   94582 |    2 |
|   94582 |    5 |
+---------+------+

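A reminder: because of InnoDB's index extensions feature, a secondary index always stores the clustered index column values as well, and records with the same secondary key value are ordered by their clustered index value. Of course, all of this is only our guess and not yet proven, and checking it directly against the source code looks rather hard. Fortunately, another handy tool, bcview, lets us inspect the underlying data. We do not use the innodb_space tool here because its compatibility with MySQL 5.7 and later is not good enough, and in some cases it may parse the data incorrectly.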
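
The guessed ordering can be reproduced with a short Python sketch, and lining it up with the logical record list from innblock predicts which offset should hold which entry. The `zipcodes` dictionary and the `logical_offsets` list are simply copied from the outputs above; treating the index as a sorted list of (zipcode, id) tuples is a simplification for illustration, not a description of InnoDB's code.

```python
# id -> zipcode array, as shown by the SELECT above.
zipcodes = {
    1: [94582, 94536],
    2: [94568, 94507, 94582],
    3: [94477, 94507],
    4: [94536],
    5: [94507, 94582],
}

# One index entry per (value, primary key); tuple sorting breaks ties on id,
# mirroring the "same key value, order by clustered index value" rule.
entries = sorted((z, row_id) for row_id, zs in zipcodes.items() for z in zs)

# Offsets of the 10 normal records in logical order, copied from the innblock output.
logical_offsets = [216, 162, 234, 270, 126, 252, 180, 144, 198, 288]

for offset, (zipcode, row_id) in zip(logical_offsets, entries):
    print(f"offset {offset:3d} -> zipcode {zipcode}, id {row_id}")
```

If the guess holds, offset 216 should contain zipcode 94477 with id 3, and offsets 126 and 144 should contain the two entries of the row with id=1.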

3. Confirm the conclusion with bcview

According to our guess, the first record of the zips index in logical order should be

[94477,3]

and the scan above shows that the first logical record sits at offset 216, so let's look there.

# From the scan above, each record takes up 18 bytes in total
bcview customers.ibd 16 216 18
...
# The output is wrapped by hand here for readability
current block:00000005 -- corresponds to pageno=5
--Offset:00216 -- offset 216
--cnt bytes:18 -- 18 bytes read
--data is:000000000001710d80000003000000400024

Let's analyze this data by splitting it into segments.

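000000000001710d: 8 bytes (BIGINT); converted from hexadecimal to decimal this is 94477
80000003: 4 bytes (INT) with the sign bit flipped, as InnoDB does for signed integers; this is decimal 3, i.e. id=3
000000400024: 6 bytes of record header; since they come after the field data they appear to belong to the physically next record, and can be ignored here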
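
The same split can be done mechanically. The following Python sketch decodes the 18 bytes printed by bcview, assuming the layout described above: an 8-byte big-endian unsigned key followed by a 4-byte primary key stored with its sign bit flipped. The variable names are illustrative only.

```python
import struct

# The 18 bytes bcview printed for offset 216 on page 5.
raw = bytes.fromhex("000000000001710d80000003000000400024")

# 8-byte big-endian unsigned key, then the 4-byte big-endian primary key.
zipcode, stored_id = struct.unpack(">QI", raw[:12])

# InnoDB stores signed integers with the sign bit flipped so that they
# compare correctly as raw bytes; subtracting 2**31 undoes the flip.
row_id = stored_id - 0x80000000

# The remaining 6 bytes are header data, not part of the key.
trailing = raw[12:]

print(zipcode, row_id, trailing.hex())
```

This prints `94477 3 000000400024`, matching the manual conversion.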

This shows that the guess was correct.

In addition, going by physical write order, the first row is the one with id=1:

+----+-----------------------+
| id | custinfo->'$.zipcode' |
+----+-----------------------+
|  1 | [94582, 94536]        |
+----+-----------------------+

This row produces two secondary index records, which we dump in one go (36 bytes):

bcview customers.ibd 16 126 36
...
current block:00000005
--Offset:00126
--cnt bytes:36
--data is:000000000001714880000001000000180036000000000001717680000001000000200048
...

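As before, the decoded result is shown below (note that the stored order is the reverse of the order in the JSON array):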

0000000000017148 => 94536
80000001 => id=1
000000180036
0000000000017176 => 94582
80000001 => id=1
000000200048
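
The 36-byte dump can be decoded with the same approach, two records at a time. The sketch below also interprets the 6 trailing bytes of each chunk, and this part is an assumption rather than something the tools report: it treats them as a 1-byte NULL bitmap plus the 5-byte COMPACT record header of the physically following record, whose last two bytes are a relative pointer to the next record in key order. The `decode` helper and `RECORD_SIZE` constant are hypothetical names.

```python
import struct

RECORD_SIZE = 18  # bytes per index record, as used in the bcview calls above


def decode(chunk):
    """Split one 18-byte chunk into (zipcode, id, trailing 6 bytes)."""
    zipcode, stored_id = struct.unpack(">QI", chunk[:12])
    return zipcode, stored_id - 0x80000000, chunk[12:]


dump = bytes.fromhex(
    "000000000001714880000001000000180036"
    "000000000001717680000001000000200048"
)
start = 126  # page offset the dump was read from

decoded = []
for pos in range(0, len(dump), RECORD_SIZE):
    zipcode, row_id, tail = decode(dump[pos:pos + RECORD_SIZE])
    # Assumption: tail = NULL bitmap (1 byte) + header of the record that
    # physically follows. Bytes 2-3 hold heap_no in the upper 13 bits,
    # bytes 4-5 a relative next-record pointer (read as unsigned here,
    # which is enough for these two forward pointers).
    heap_no = int.from_bytes(tail[2:4], "big") >> 3
    next_rel = int.from_bytes(tail[4:6], "big")
    following = start + pos + RECORD_SIZE
    decoded.append((zipcode, row_id, heap_no, following + next_rel))

for item in decoded:
    print(item)
```

Under that assumption the sketch yields `(94536, 1, 3, 198)` and `(94582, 1, 4, 234)`. The heap numbers and next offsets agree with the innblock listing, where heapno 3 at offset 144 is followed in logical order by offset 198, and heapno 4 at offset 162 by offset 234.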

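As we can see, the multiple values in the JSON array are indeed split apart, and each one is stored as its own key together with the clustered index value. With that, we have fully worked out the underlying storage structure of a multi-valued index.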

