mysql索引之字首索引

有時候需要索引很長的字元列，這會讓索引變得大且慢。通常可以索引開始的部分字元，這樣可以大大節約索引空間，進而提高索引效率。但這樣也會降低索引的選擇性。索引的選擇性是指不重複的索引值（也稱為基數，cardinality)和資料表的記錄總數的比值，範圍從1/T到1之間。索引的選擇性越高則查詢效率越高，因為選擇性高的索引可以讓MySQL在查找時過濾掉更多的行。唯一索引的選擇性是1，這是最好的索引選擇性，性能也是最好的。

一般情況下某個字首的選擇性也是足夠高的，足以滿足查詢性能。對于BLOB，TEXT，或者很長的VARCHAR類型的列，必須使用字首索引，因為MySQL不允許索引這些列的完整長度。訣竅在于要選擇足夠長的字首以保證較高的選擇性，同時又不能太長（以便節約空間）。字首應該足夠長，以使得字首索引的選擇性接近于索引的整個列。換句話說，字首的”基數“應該接近于完整的列的”基數“。

為了決定字首的合适長度，需要找到最常見的值的清單，然後和最常見的字首清單進行比較。下面的示例是mysql官方提供的示例資料庫，下載下傳位址如下：

http://downloads.mysql.com/docs/sakila-db.zip

在示例資料庫sakila中并沒有合适的例子，是以從表city中生成一個示例表，這樣就有足夠資料進行示範：

1.解壓下載下傳的sakila-db.zip檔案

2.使用source指令以及sakila-schema.sql和sakila-data.sql檔案來初始化sakila庫以及相關表格

mysql> select database();

+------------+

| database() |

+------------+

| sakila |

+------------+

1 row in set (0.00 sec)

mysql> create table city_demo (city varchar(50) not null);

mysql> insert into city_demo (city) select city from city; --執行兩次

Query OK, 600 rows affected (0.02 sec)

Records: 600 Duplicates: 0 Warnings: 0

mysql> update city_demo set city = ( select city from city order by rand() limit 1);

Query OK, 1198 rows affected (0.42 sec)

Rows matched: 1200 Changed: 1198 Warnings: 0

注：因為上述sql語句使用了rand函數，是以每個人的執行結果可以都不一樣！

首先找到最常見的城市清單：

mysql> select count(*) as cnt,city from city_demo group by city order by cnt desc limit 10;

+-----+---------------+

| cnt | city |

+-----+---------------+

| 8 | Dongying |

| 7 | Omdurman |

| 6 | Etawah |

| 6 | Okara |

| 6 | Tsuyama |

| 6 | Brindisi |

| 6 | Kuwana |

| 6 | Grand Prairie |

| 5 | Fuyu |

| 5 | Siegen |

+-----+---------------+

10 rows in set (0.00 sec)

現在查找到頻繁出現的城市字首。先從3個字首字母開始，然後4個，5個，6個：

mysql> select count(*) as cnt,left(city,3) as pref from city_demo group by pref order by cnt desc limit 10;

+-----+------+

| cnt | pref |

+-----+------+

| 23 | San |

| 15 | Hal |

| 14 | Cha |

| 14 | al- |

| 12 | Bat |

| 12 | Kor |

| 11 | Don |

| 11 | Shi |

| 10 | La |

| 10 | El |

+-----+------+

10 rows in set (0.00 sec)

可以看到3位元組檢索到的結果與全文檢索相差很大，繼續增加到4個位元組

mysql> select count(*) as cnt,left(city,4) as pref from city_demo group by pref order by cnt desc limit 10;

+-----+------+

| cnt | pref |

+-----+------+

| 14 | San |

| 8 | Dong |

| 7 | Iwak |

| 7 | al-Q |

| 7 | Omdu |

| 6 | Kuwa |

| 6 | Tsuy |

| 6 | Brin |

| 6 | Etaw |

| 6 | Okar |

+-----+------+

10 rows in set (0.00 sec)

mysql> select count(*) as cnt,left(city,5) as pref from city_demo group by pref order by cnt desc limit 10;

+-----+-------+

| cnt | pref |

+-----+-------+

| 8 | Dongy |

| 7 | al-Qa |

| 7 | Omdur |

| 6 | Okara |

| 6 | Valle |

| 6 | Grand |

| 6 | Tsuya |

| 6 | Etawa |

| 6 | South |

| 6 | Kuwan |

+-----+-------+

10 rows in set (0.00 sec)

mysql> select count(*) as cnt,left(city,6) as pref from city_demo group by pref order by cnt desc limit 10;

+-----+--------+

| cnt | pref |

+-----+--------+

| 8 | Dongyi |

| 7 | Omdurm |

| 6 | Okara |

| 6 | Tsuyam |

| 6 | Valle |

| 6 | Grand |

| 6 | Etawah |

| 6 | Brindi |

| 6 | Kuwana |

| 5 | Haldia |

+-----+--------+

10 rows in set (0.01 sec)

通過上面改變不同字首長度發現，當字首長度為6時，這個字首的選擇性就接近完整列的選擇性了。

當然還有另外更友善的方法，那就是計算完整列的選擇性，并使其字首的選擇性接近于完整列的選擇性。下面顯示如何計算完整列的選擇性：

mysql> select count(distinct city)/count(*) from city_demo;

+-------------------------------+

| count(distinct city)/count(*) |

+-------------------------------+

| 0.4333 |

+-------------------------------+

1 row in set (0.00 sec)

可以在一個查詢中針對不同字首長度的選擇性進行計算，這對于大表非常有用，下面給出如何在同一個查詢中計算不同字首長度的選擇性：

mysql> select count(distinct left(city,3))/count(*) as sel3,count(distinct left(city,4))/count(*) as sel4,count(distinct left(city,5))/count(*) as sel5, count(distinct left(city,6))/count(*) as sel6 from city_demo;

+--------+--------+--------+--------+

| sel3 | sel4 | sel5 | sel6 |

+--------+--------+--------+--------+

| 0.3408 | 0.4100 | 0.4225 | 0.4300 |

+--------+--------+--------+--------+

1 row in set (0.00 sec)

可以看見當索引字首為6時的基數是0.4300，已經接近完整列選擇性0.4333。

下面根據找到的索引字首長度建立字首索引：

mysql> alter table city_demo add key (city(6));

Query OK, 0 rows affected (0.19 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> explain select * from city_demo where city like 'Jin%';

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: city_demo

partitions: NULL

type: range

possible_keys: city

key: city

key_len: 8

ref: NULL

rows: 4

filtered: 100.00

Extra: Using where

1 row in set, 1 warning (0.00 sec)

可以看見正确使用剛建立的索引。

優點：字首索引是一種能使索引更小，更快的有效辦法

缺點：mysql無法使用其字首索引做ORDER BY和GROUP BY，也無法使用字首索引做覆寫掃描。