有時候需要索引很長的字元列,這會讓索引變得大且慢。通常可以索引開始的部分字元,這樣可以大大節約索引空間,進而提高索引效率。但這樣也會降低索引的選擇性。索引的選擇性是指不重複的索引值(也稱為基數,cardinality)和資料表的記錄總數的比值,範圍從1/T到1之間。索引的選擇性越高則查詢效率越高,因為選擇性高的索引可以讓MySQL在查找時過濾掉更多的行。唯一索引的選擇性是1,這是最好的索引選擇性,性能也是最好的。
一般情況下某個字首的選擇性也是足夠高的,足以滿足查詢性能。對于BLOB,TEXT,或者很長的VARCHAR類型的列,必須使用字首索引,因為MySQL不允許索引這些列的完整長度。訣竅在于要選擇足夠長的字首以保證較高的選擇性,同時又不能太長(以便節約空間)。字首應該足夠長,以使得字首索引的選擇性接近于索引的整個列。換句話說,字首的”基數“應該接近于完整的列的”基數“。
為了決定字首的合适長度,需要找到最常見的值的清單,然後和最常見的字首清單進行比較。下面的示例是mysql官方提供的示例資料庫,下載下傳位址如下:
http://downloads.mysql.com/docs/sakila-db.zip
在示例資料庫sakila中并沒有合适的例子,是以從表city中生成一個示例表,這樣就有足夠資料進行示範:
1.解壓下載下傳的sakila-db.zip檔案
2.使用source指令以及sakila-schema.sql和sakila-data.sql檔案來初始化sakila庫以及相關表格
mysql> select database();
+------------+
| database() |
+------------+
| sakila |
+------------+
1 row in set (0.00 sec)
mysql> create table city_demo (city varchar(50) not null);
mysql> insert into city_demo (city) select city from city; --執行兩次
Query OK, 600 rows affected (0.02 sec)
Records: 600 Duplicates: 0 Warnings: 0
mysql> update city_demo set city = ( select city from city order by rand() limit 1);
Query OK, 1198 rows affected (0.42 sec)
Rows matched: 1200 Changed: 1198 Warnings: 0
注:因為上述sql語句使用了rand函數,是以每個人的執行結果可以都不一樣!
首先找到最常見的城市清單:
mysql> select count(*) as cnt,city from city_demo group by city order by cnt desc limit 10;
+-----+---------------+
| cnt | city |
+-----+---------------+
| 8 | Dongying |
| 7 | Omdurman |
| 6 | Etawah |
| 6 | Okara |
| 6 | Tsuyama |
| 6 | Brindisi |
| 6 | Kuwana |
| 6 | Grand Prairie |
| 5 | Fuyu |
| 5 | Siegen |
+-----+---------------+
10 rows in set (0.00 sec)
現在查找到頻繁出現的城市字首。先從3個字首字母開始,然後4個,5個,6個:
mysql> select count(*) as cnt,left(city,3) as pref from city_demo group by pref order by cnt desc limit 10;
+-----+------+
| cnt | pref |
+-----+------+
| 23 | San |
| 15 | Hal |
| 14 | Cha |
| 14 | al- |
| 12 | Bat |
| 12 | Kor |
| 11 | Don |
| 11 | Shi |
| 10 | La |
| 10 | El |
+-----+------+
10 rows in set (0.00 sec)
可以看到3位元組檢索到的結果與全文檢索相差很大,繼續增加到4個位元組
mysql> select count(*) as cnt,left(city,4) as pref from city_demo group by pref order by cnt desc limit 10;
+-----+------+
| cnt | pref |
+-----+------+
| 14 | San |
| 8 | Dong |
| 7 | Iwak |
| 7 | al-Q |
| 7 | Omdu |
| 6 | Kuwa |
| 6 | Tsuy |
| 6 | Brin |
| 6 | Etaw |
| 6 | Okar |
+-----+------+
10 rows in set (0.00 sec)
mysql> select count(*) as cnt,left(city,5) as pref from city_demo group by pref order by cnt desc limit 10;
+-----+-------+
| cnt | pref |
+-----+-------+
| 8 | Dongy |
| 7 | al-Qa |
| 7 | Omdur |
| 6 | Okara |
| 6 | Valle |
| 6 | Grand |
| 6 | Tsuya |
| 6 | Etawa |
| 6 | South |
| 6 | Kuwan |
+-----+-------+
10 rows in set (0.00 sec)
mysql> select count(*) as cnt,left(city,6) as pref from city_demo group by pref order by cnt desc limit 10;
+-----+--------+
| cnt | pref |
+-----+--------+
| 8 | Dongyi |
| 7 | Omdurm |
| 6 | Okara |
| 6 | Tsuyam |
| 6 | Valle |
| 6 | Grand |
| 6 | Etawah |
| 6 | Brindi |
| 6 | Kuwana |
| 5 | Haldia |
+-----+--------+
10 rows in set (0.01 sec)
通過上面改變不同字首長度發現,當字首長度為6時,這個字首的選擇性就接近完整列的選擇性了。
當然還有另外更友善的方法,那就是計算完整列的選擇性,并使其字首的選擇性接近于完整列的選擇性。下面顯示如何計算完整列的選擇性:
mysql> select count(distinct city)/count(*) from city_demo;
+-------------------------------+
| count(distinct city)/count(*) |
+-------------------------------+
| 0.4333 |
+-------------------------------+
1 row in set (0.00 sec)
可以在一個查詢中針對不同字首長度的選擇性進行計算,這對于大表非常有用,下面給出如何在同一個查詢中計算不同字首長度的選擇性:
mysql> select count(distinct left(city,3))/count(*) as sel3,count(distinct left(city,4))/count(*) as sel4,count(distinct left(city,5))/count(*) as sel5, count(distinct left(city,6))/count(*) as sel6 from city_demo;
+--------+--------+--------+--------+
| sel3 | sel4 | sel5 | sel6 |
+--------+--------+--------+--------+
| 0.3408 | 0.4100 | 0.4225 | 0.4300 |
+--------+--------+--------+--------+
1 row in set (0.00 sec)
可以看見當索引字首為6時的基數是0.4300,已經接近完整列選擇性0.4333。
下面根據找到的索引字首長度建立字首索引:
mysql> alter table city_demo add key (city(6));
Query OK, 0 rows affected (0.19 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> explain select * from city_demo where city like 'Jin%';
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: city_demo
partitions: NULL
type: range
possible_keys: city
key: city
key_len: 8
ref: NULL
rows: 4
filtered: 100.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
可以看見正确使用剛建立的索引。
優點:字首索引是一種能使索引更小,更快的有效辦法
缺點:mysql無法使用其字首索引做ORDER BY和GROUP BY,也無法使用字首索引做覆寫掃描。