天天看點

DISTINCT和GROUP BY的差別

有時候進行分組查詢的時候 可以用DISTINCT和GROUP BY 、是以難免需要比較一下、

搜了一下 感覺這個比較權威 記一下

http://yangtingkun.itpub.net/post/468/227628

-------------------------------------------------------------------------------------------

其實二者沒有什麼可比性,但是對于不包含聚集函數的GROUP BY操作來說,和DISTINCT操作是等價的。不過雖然二者的結果是一樣的,但是二者的執行計劃并不相同。

在Oracle9i中:

SQL> SELECT * FROM V$VERSION;

BANNER

----------------------------------------------------------------

Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production PL/SQL Release 9.2.0.4.0 - Production

CORE 9.2.0.3.0 Production

TNS for Linux: Version 9.2.0.4.0 - Production

NLSRTL Version 9.2.0.4.0 - Production

SQL> CREATE TABLE T AS SELECT ROWNUM ID, A.* FROM DBA_OBJECTS A;

表已建立。

SQL> CREATE INDEX IND_T_CREATED ON T (CREATED);

索引已建立。

SQL> ALTER TABLE T MODIFY CREATED NOT NULL;

表已更改。

SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';

會話已更改。

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T')

PL/SQL 過程已成功完成。

SQL> SET AUTOT ON EXP

SQL> SELECT COUNT(*) FROM (SELECT DISTINCT CREATED FROM T);

COUNT(*)

----------

4794

執行計劃

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=65 Card=1)

1 0 SORT (AGGREGATE)

2 1 VIEW (Cost=65 Card=4794)

3 2 SORT (UNIQUE) (Cost=65 Card=4794 Bytes=38352)

4 3 INDEX (FAST FULL SCAN) OF 'IND_T_CREATED' (NON-UNIQUE) (Cost=4 Card=41802 Bytes=334416)

SQL> SELECT COUNT(*) FROM (SELECT CREATED FROM T GROUP BY CREATED);

COUNT(*)

----------

4794

執行計劃

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=65 Card=1 Bytes=2)

1 0 SORT (AGGREGATE)

2 1 VIEW (Cost=65 Card=4794 Bytes=9588)

3 2 SORT (GROUP BY) (Cost=65 Card=4794 Bytes=38352)

4 3 INDEX (FAST FULL SCAN) OF 'IND_T_CREATED' (NON-UNIQUE) (Cost=4 Card=41802 Bytes=334416)

從執行計劃上看,DISTINCT的操作是SORT (UNIQUE),而GROUP BY是SORT (GROUP BY)。DISTINCT操作隻需要找出所有不同的值就可以了。而GROUP BY操作還要為其他聚集函數進行準備工作。從這一點上将,GROUP BY操作做的工作應該比DISTINCT所做的工作要多一些。

除了這一點,基本上看不到DISTINCT和GROUP BY(沒有聚集函數的情況)有什麼差別,而且從執行效率上也看不到明顯的差異。

不過從10g開始,二者的差異開始展現出來了。

SQL> CONN YANGTK/[email protected]已連接配接。

SQL> SET AUTOT OFF

SQL> SET TIMING OFF

SQL> CREATE TABLE T AS SELECT ROWNUM ID, A.* FROM DBA_OBJECTS A;

表已建立。

SQL> CREATE INDEX IND_T_CREATED ON T (CREATED);

索引已建立。

SQL> ALTER TABLE T MODIFY CREATED NOT NULL;

表已更改。

SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';

會話已更改。

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T')

PL/SQL 過程已成功完成。

SQL> SET AUTOT ON

SQL> SET TIMING ON

建立好測試環境後,看一看标準分頁函數中,兩個操作的差異:

SQL> SELECT *

2 FROM

3 (

4 SELECT ROWNUM RN, A.*

5 FROM

6 (

7 SELECT CREATED

8 FROM T

9 GROUP BY CREATED

10 ) A

11 WHERE ROWNUM < 20

12 )

13 WHERE RN >= 10;

RN CREATED

---------- -------------------

10 2005-12-19 17:07:57

11 2005-12-19 17:07:58

12 2005-12-19 17:08:24

13 2005-12-19 17:08:25

14 2005-12-19 17:08:26

15 2005-12-19 17:08:27

16 2005-12-19 17:08:28

17 2005-12-19 17:08:29

18 2005-12-19 17:08:33

19 2005-12-19 17:08:35

已選擇10行。

已用時間: 00: 00: 00.06

執行計劃

----------------------------------------------------------

Plan hash value: 3639065582

-------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|

-------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 19 | 418 | 1 (0)|

|* 1 | VIEW | | 19 | 418 | 1 (0)|

|* 2 | COUNT STOPKEY | | | | |

| 3 | VIEW | | 969 | 8721 | 1 (0)|

|* 4 | SORT GROUP BY STOPKEY| | 969 | 7752 | 1 (0)|

| 5 | INDEX FULL SCAN | IND_T_CREATED | 969 | 7752 | 1 (0)|

-------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("RN">=10)

2 - filter(ROWNUM<20)

4 - filter(ROWNUM<20)

統計資訊

----------------------------------------------------------

1 recursive calls

0 db block gets

67 consistent gets

0 physical reads

0 redo size

642 bytes sent via SQL*Net to client

385 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

10 rows processed

SQL> SELECT *

2 FROM

3 (

4 SELECT ROWNUM RN, A.*

5 FROM

6 (

7 SELECT DISTINCT CREATED

8 FROM T

9 ) A

10 WHERE ROWNUM < 20

11 )

12 WHERE RN >= 10;

RN CREATED

---------- -------------------

10 2005-12-19 17:07:57

11 2005-12-19 17:07:58

12 2005-12-19 17:08:24

13 2005-12-19 17:08:25

14 2005-12-19 17:08:26

15 2005-12-19 17:08:27

16 2005-12-19 17:08:28

17 2005-12-19 17:08:29

18 2005-12-19 17:08:33

19 2005-12-19 17:08:35

已選擇10行。

已用時間: 00: 00: 00.03

執行計劃

----------------------------------------------------------

Plan hash value: 1650124153

-------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|

-------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 19 | 418 | 14 (36)|

|* 1 | VIEW | | 19 | 418 | 14 (36)|

|* 2 | COUNT STOPKEY | | | | |

| 3 | VIEW | | 987 | 8883 | 14 (36)|

|* 4 | SORT GROUP BY STOPKEY| | 987 | 7896 | 14 (36)|

| 5 | INDEX FAST FULL SCAN| IND_T_CREATED | 50333 | 393K| 10 (10)|

-------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("RN">=10)

2 - filter(ROWNUM<20)

4 - filter(ROWNUM<20)

統計資訊

----------------------------------------------------------

1 recursive calls

0 db block gets

73 consistent gets

0 physical reads

0 redo size

642 bytes sent via SQL*Net to client

385 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

10 rows processed

出乎意料的是,GROUP BY操作的COST更低,而且邏輯讀也小,這似乎與二者的工作量成反比。仔細觀察執行計劃發現,問題的根源來自于GROUP BY使用INDEX FULL SCAN,而DISTINCT使用了INDEX FAST FULL SCAN。也許有人會感到奇怪,索引的快速全掃描不是要比索引全掃描效率更高嗎?對于讀取所有資料的情況下,确實是索引快速全掃效率更高。但是由于這裡采用了分頁,隻取前20條資料,而且Oracle的10g增加了GROUP BY STOPKEY這種新的執行路徑,是以在這裡GROUP BY操作的效率更高。

觀察執行計劃中的處理行數可以發現,索引全掃描由于是按照索引的順序掃描,是以利用了STOPKEY,僅僅處理了969條記錄就停了下來。而對于DISTINCT操作的快速索引全速而言,顯然沒有使用STOPKEY,讀取了所有的50333條記錄。這就是GROUP BY和DISTINCT的性能差異原因。

-----------------------------------------------------------------------

相對而言 我還是比較支援用GROUP BY