天天看點

[20170603]12c Top Frequency histogram.txt

---恢複内容開始---

[20170603]12c Top Frequency histogram.txt

--//個人對直方圖了解很少,以前2種直方圖類型對于目前的許多應用來講已經足夠,或者講遇到的問題很少.

--//抽一點點時間,簡單探究12c Top Frequency histogram.

--//以前的頻率直方圖Frequency histogram,受限bucket(桶的大小),如果有255個不同值,oracle分析後不會建立頻率直方圖,而是建立高

--//度直方圖.這樣的情況會導緻一些流行值的統計在顯示執行計劃時差距很大.而12c引入了Top Frequency histogram,注意這裡的top,

--//我的了解就是流行值(popular),也就是這樣建立的直方圖僅僅包括popular,其他non-popular不考慮,這樣在sql語句的查詢這些

--//popular時,顯示的統計資訊相對準确,進而有利于oracle選擇正确的執行計劃.

--//以下是我的學習筆記,也許會存在許多錯誤,僅僅做一個記錄.我也看了許多别人的blog.^_^.而且我目前的環境隻有12.0.0.1(版本太

--//低).

1.環境:

SCOTT@test01p> @ ver1

PORT_STRING                    VERSION        BANNER                                                                               CON_ID

------------------------------ -------------- -------------------------------------------------------------------------------- ----------

IBMPC/WIN_NT64-9.1.0           12.1.0.1.0     Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0

--//如果要建立Top Frequency histogram必須要滿足幾個條件:

--//連結 raajeshwaran.blogspot.co.id/2016/06/top-frequency-histogram-in-12c.html

The database creates a Top frequency histogram, when the following criteria are met.

NDV is greater than n, where n is the requested number of buckets (default 254)

The percentage of rows occupied by Top-frequent values is greater than or equal to the threshold p where p is (1-(1/n)*100).

The estimate_percent parameter in dbms_stats gathering procedure should be auto_sample_size (set to default)

--//翻譯過來NDV(也就是字段的不同值)大于N(指bucket的數量).

--//流行值(popular)在Top-frequent中合計數量/總計數量之比要大于(1-(1/n)*100).如果建立10個桶,這樣流行值的總計必須在90%以上

2.首先驗證(1-(1/n)*100)比值是否正确:

SCOTT@test01p> create table t as select * from dba_objects;

Table created.

select column_name,num_distinct,density,histogram,SAMPLE_SIZE

  from user_tab_col_statistics

  where table_name ='T'

  and column_name ='OWNER';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       SAMPLE_SIZE

-------------------- ------------ ---------- --------------- -----------

OWNER                          32     .03125 NONE                  91695

--//12c ctas 建立統計資訊,但是不會建立直方圖.density 1/32=.03125.

SCOTT@test01p> select count(*) from t;

  COUNT(*)

----------

     91695

--//随手寫的sql語句:

with a as (select distinct owner,count(*) over(partition by owner) n1 ,count(*) over () n2 from t order by 2 desc ),

b as (select owner,n1,n2,sum(n1) over (order by n1 desc) n3  from a order by n1 desc)

select rownum,owner,n1,n2,n3,round(n3/n2,5) x1,round(1-1/rownum,5) x2 from b;

ROWNUM OWNER                N1         N2         N3         X1         X2

------ ----------------- ----- ---------- ---------- ---------- ----------

     1 SYS               41942      91695      41942     .45741          0

     2 PUBLIC            37142      91695      79084     .86247         .5

     3 APEX_040200        3405      91695      82489      .8996     .66667

     4 ORDSYS             3157      91695      85646     .93403        .75

     5 MDSYS              1819      91695      87465     .95387         .8

     6 XDB                 985      91695      88450     .96461     .83333

     7 SYSTEM              641      91695      89091      .9716     .85714

     8 CTXSYS              405      91695      89496     .97602       .875

     9 WMSYS               387      91695      89883     .98024     .88889

    10 DVSYS               352      91695      90235     .98408         .9

    11 SH                  309      91695      90544     .98745     .90909

    12 ORDDATA             292      91695      90836     .99063     .91667

    13 LBACSYS             209      91695      91045     .99291     .92308

    14 OE                  142      91695      91187     .99446     .92857

    15 SCOTT                96      91695      91283     .99551     .93333

    16 GSMADMIN_INTERNAL    77      91695      91360     .99635      .9375

    17 IX                   58      91695      91418     .99698     .94118

    18 DBSNMP               55      91695      91473     .99758     .94444

    19 PM                   44      91695      91517     .99806     .94737

    20 HR                   35      91695      91552     .99844        .95

    21 OLAPSYS              25      91695      91577     .99871     .95238

    22 OJVMSYS              23      91695      91600     .99896     .95455

    23 DVF                  19      91695      91619     .99917     .95652

    24 FLOWS_FILES          13      91695      91632     .99931     .95833

    25 AUDSYS               12      91695      91644     .99944        .96

    26 ORDPLUGINS           10      91695      91664     .99966     .96154

    27 OUTLN                10      91695      91664     .99966     .96296

    28 BI                    8      91695      91688     .99992     .96429

    29 ORACLE_OCM            8      91695      91688     .99992     .96552

    30 SI_INFORMTN_SCHEM     8      91695      91688     .99992     .96667

    31 APPQOSSYS             5      91695      91693     .99998     .96774

    32 TEST                  2      91695      91695          1     .96875

--//如果加入條件where round(n3/n2,5) >round(1-1/rownum,5),全部輸出.也就是這樣如果桶小于32,大于1.建立的都是Top Frequency.

3.繼續測試:

D:\temp>cat a1.sql

cat a1.sql

exec  dbms_stats.gather_table_stats(ownname=>user,tabname=>'T',method_opt=>'for columns owner size &1');

select column_name,num_distinct,density,histogram,SAMPLE_SIZE from user_tab_col_statistics where table_name ='T' and column_name ='OWNER';

SCOTT@test01p> @ a1.sql 2

PL/SQL procedure successfully completed.

OWNER                          32     .03125 HYBRID                 5500

SCOTT@test01p> @ a1.sql 3

OWNER                          32 5.4529E-06 TOP-FREQUENCY         91695

SCOTT@test01p> @ a1.sql 4

SCOTT@test01p> @ a1.sql 31

SCOTT@test01p> @ a1.sql 32

OWNER                          32 5.4529E-06 FREQUENCY             91695

--//除了bucket=2,32建立的直方圖HYBRID,FREQUENCY外,建立的都是TOP-FREQUENCY.

--//以10個bucket為例.解方程式(90235-x)/(91695-x)=0.9 ,得到x=77095.也就是要減少77095.

--//delete t where owner='SYS' and rownum<=41000;

--//delete t where owner='PUBLIC' and rownum<=36095;

SCOTT@test01p> delete t where owner='SYS' and rownum<=41000;

41000 rows deleted.

SCOTT@test01p> delete t where owner='PUBLIC' and rownum<=36095;

36095 rows deleted.

SCOTT@test01p> commit ;

Commit complete.

select rownum,owner,n1,n2,n3,round(n3/n2,5) x1,round(1-1/rownum,5) x2 from b where rownum<=11;

ROWNUM OWNER         N1         N2         N3         X1         X2

------ ----------- ---- ---------- ---------- ---------- ----------

     1 APEX_040200 3405      14600       3405     .23322          0

     2 ORDSYS      3157      14600       6562     .44945         .5

     3 MDSYS       1819      14600       8381     .57404     .66667

     4 PUBLIC      1047      14600       9428     .64575        .75

     5 XDB          985      14600      10413     .71322         .8

     6 SYS          942      14600      11355     .77774     .83333

     7 SYSTEM       641      14600      11996     .82164     .85714

     8 CTXSYS       405      14600      12401     .84938       .875

     9 WMSYS        387      14600      12788     .87589     .88889

    10 DVSYS        352      14600      13140         .9         .9

    11 SH           309      14600      13449     .92116     .90909

11 rows selected.

--//backet=10,前面10個值占90%.

SCOTT@test01p> @ a1 10

OWNER                          32 .000034247 TOP-FREQUENCY         14600

--//再減少1條記錄.

SCOTT@test01p> delete t where owner='SYS' and rownum<=1;

1 row deleted.

     1 APEX_040200 3405      14599       3405     .23324          0

     2 ORDSYS      3157      14599       6562     .44948         .5

     3 MDSYS       1819      14599       8381     .57408     .66667

     4 PUBLIC      1047      14599       9428      .6458        .75

     5 XDB          985      14599      10413     .71327         .8

     6 SYS          941      14599      11354     .77772     .83333

     7 SYSTEM       641      14599      11995     .82163     .85714

     8 CTXSYS       405      14599      12400     .84937       .875

     9 WMSYS        387      14599      12787     .87588     .88889

    10 DVSYS        352      14599      13139     .89999         .9

    11 SH           309      14599      13448     .92116     .90909

--//現在前10占.89999.

OWNER                          32    .018378 HYBRID                14599

--//可以發現建立的直方圖不是TOP-FREQUENCY,而是HYBRID(混合型直方圖).

4.轉化成TOP-FREQUENCY.

SCOTT@test01p> insert into t  select * from dba_objects where owner='SYS' and rownum=1;

1 row created.

--//DENSITY=1/SAMPLE_SIZE/2, 1/14600/2=.00003424657534246575正好符合.

5.現在看看執行計劃:

SCOTT@test01p> select count(*) from t where owner='DVSYS';

       352

SCOTT@test01p> @ dpc '' ''

PLAN_TABLE_OUTPUT

-------------------------------------

SQL_ID  2at34f0zaqhzj, child number 0

select count(*) from t where owner='DVSYS'

Plan hash value: 2966233522

---------------------------------------------------------------------------------------------------------------------

| Id  | Operation          | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |

|   0 | SELECT STATEMENT   |      |      1 |        |       |   428 (100)|          |      1 |00:00:00.01 |    1544 |

|   1 |  SORT AGGREGATE    |      |      1 |      1 |     8 |            |          |      1 |00:00:00.01 |    1544 |

|*  2 |   TABLE ACCESS FULL| T    |      1 |    352 |  2816 |   428   (0)| 00:00:01 |    352 |00:00:00.01 |    1544 |

Query Block Name / Object Alias (identified by operation id):

-------------------------------------------------------------

   1 - SEL$1

   2 - SEL$1 / T@SEL$1

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - filter("OWNER"='DVSYS')

--//可以發現e_rows=A-Rows.可以發現非常準确.

--//檢視直方圖資訊.

SCOTT@test01p> select endpoint_number,endpoint_actual_value from user_tab_histograms where table_name ='T' and column_name ='OWNER' order by 1;

ENDPOINT_NUMBER ENDPOINT_ACTUAL_VALU

--------------- --------------------

           3405 APEX_040200

           3810 CTXSYS

           4162 DVSYS

           5981 MDSYS

           9138 ORDSYS

          10185 PUBLIC

          11127 SYS

          11768 SYSTEM

          12155 WMSYS

          13140 XDB

10 rows selected.

--//4162-3810=352,可以發現正好符合.也就是popular值統計很正确.看看非popular值.

SCOTT@test01p> select count(*) from t where owner='DVSYS1';

Plan hash value: 2966233522        

|*  2 |   TABLE ACCESS FULL| T    |      1 |     66 |   528 |   428   (0)| 00:00:01 |      0 |00:00:00.01 |    1544 |

SCOTT@test01p> select count(*) from t where owner='SCOTT';

        96

|*  2 |   TABLE ACCESS FULL| T    |      1 |     66 |   528 |   428   (0)| 00:00:01 |     96 |00:00:00.01 |    1544 |

--//可以發現估計E-Rows如何計算的呢?

6.做10053跟蹤:

D:\tools\sqllaji>cat 10053x.sql

execute dbms_sqldiag.dump_trace(p_sql_id=>'&1',p_child_number=>&2,p_component=>'Compiler',p_file_id=>'&&1');

SCOTT@test01p> @ 10053x f31kz63ksu1tc 0

***************************************

SINGLE TABLE ACCESS PATH

  Single Table Cardinality Estimation for T[T]

SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

  Column (#1):

    NewDensity:0.004545, OldDensity:0.000034 BktCnt:13140.000000, PopBktCnt:13140.000000, PopValCnt:10, NDV:32

  Column (#1): OWNER(VARCHAR2)

    AvgLen: 8 NDV: 32 Nulls: 0 Density: 0.000000

    Histogram: Top-Freq  #Bkts: 13140  UncompBkts: 13140  EndPtVals: 10  ActualVal: yes

  Table: T  Alias: T

    Card: Original: 14600.000000  Rounded: 66  Computed: 66.36  Non Adjusted: 66.36

  Access Path: TableScan

    Cost:  428.36  Resp: 428.36  Degree: 0

      Cost_io: 428.00  Cost_cpu: 14122025

      Resp_io: 428.00  Resp_cpu: 14122025

  Best:: AccessPath: TableScan

         Cost: 428.36  Degree: 1  Resp: 428.36  Card: 66.36  Bytes: 0

    check parallelism for statement[<unnamed>]

kkfdtParallel: parallel is possible (no statement type restrictions)

    kkfdPaForcePrm: dop:1 ()

     use dictionary DOP(1) on table

kkfdPaPrm:- The table : 106380

kkfdPaPrm:DOP = 1 (computed from hint/dictionary/autodop)

kkfdiPaPrm: dop:1 serial(?)

--//使用 NewDensity:0.004545.

 BktCnt:13140.000000, PopBktCnt:13140.000000 => 對應就是前10個流行值的總和.

--//非流行值的數量: 14600-13140=1460

--//非流行值的桶數量: 32-10=22

--//非流行值的數量/非流行值的桶數量 1460/22=66.36363636363636363636,四舍五入66,正好符合執行計劃的推斷.

--//NewDensity的計算 =1460/14600/22=.00454545454545454545,非常接近.

---恢複内容結束---