Original work; please credit the source when reposting.
My work log:
Yesterday morning the production database ran out of resources for about an hour. Analysis traced the problem to the following statement:
MERGE INTO T_DPD_AGENT_CHANGE A
USING (SELECT A.AGENT_ID,
A.ENTER_COMPANY_DATE ENTER_DATE,
A.LEAVE_COMPANY_DATE LEAVE_DATE,
A.AGENT_STATUS,
D1.DEPT_ID TEAM,
D1.PRINCIPAL_ID TEAM_LEADER,
D2.DEPT_ID DEPT,
D2.PRINCIPAL_ID DEPT_LEADER,
D3.DEPT_ID REGION,
D3.PRINCIPAL_ID REGION_LEADER,
A.YEAR_MONTH
FROM T_AGENT_BACKUP A,
T_DEPT_BACKUP D1,
T_DEPT_BACKUP D2,
T_DEPT_BACKUP D3
WHERE A.AGENT_CATE = 1
AND A.YEAR_MONTH = D1.YEAR_MONTH
AND A.YEAR_MONTH = D2.YEAR_MONTH
AND A.YEAR_MONTH = D3.YEAR_MONTH
AND A.DEPT_ID = D1.DEPT_ID
AND D1.PARENT_ID = D2.DEPT_ID
AND D2.PARENT_ID = D3.DEPT_ID
UNION
-- (this second branch was truncated in the original post; it is reconstructed below to mirror the first branch, reading the current, non-backup tables)
SELECT A.AGENT_ID,
A.ENTER_COMPANY_DATE ENTER_DATE,
A.LEAVE_COMPANY_DATE LEAVE_DATE,
A.AGENT_STATUS,
D1.DEPT_ID TEAM,
D1.PRINCIPAL_ID TEAM_LEADER,
D2.DEPT_ID DEPT,
D2.PRINCIPAL_ID DEPT_LEADER,
D3.DEPT_ID REGION,
D3.PRINCIPAL_ID REGION_LEADER,
NULL YEAR_MONTH -- current rows carry no YEAR_MONTH, hence the "T.YEAR_MONTH IS NULL" case in the ON clause
FROM T_AGENT A, T_DEPT D1, T_DEPT D2, T_DEPT D3
WHERE A.AGENT_CATE = 1
AND A.DEPT_ID = D1.DEPT_ID
AND D1.PARENT_ID = D2.DEPT_ID
AND D2.PARENT_ID = D3.DEPT_ID) T
ON (A.AGENT_ID = T.AGENT_ID
AND A.AGENT_STATUS = T.AGENT_STATUS
AND A.TEAM = T.TEAM
AND A.DEPT = T.DEPT
AND A.REGION = T.REGION
AND NVL(A.ENTER_DATE, DATE '1900-01-01') = NVL(T.ENTER_DATE, DATE '1900-01-01')
AND NVL(A.LEAVE_DATE, DATE '1900-01-01') = NVL(T.LEAVE_DATE, DATE '1900-01-01')
AND (A.YEAR_MONTH = T.YEAR_MONTH OR T.YEAR_MONTH IS NULL))
WHEN NOT MATCHED THEN
INSERT
(AGENT_ID,
ENTER_DATE,
LEAVE_DATE,
AGENT_STATUS,
TEAM,
TEAM_LEADER,
DEPT,
DEPT_LEADER,
REGION,
REGION_LEADER,
YEAR_MONTH)
VALUES
(T.AGENT_ID,
T.ENTER_DATE,
T.LEAVE_DATE,
T.AGENT_STATUS,
T.TEAM,
T.TEAM_LEADER,
T.DEPT,
T.DEPT_LEADER,
T.REGION,
T.REGION_LEADER,
T.YEAR_MONTH);
In the pre-production environment this statement was just as inefficient.
Its execution plan, with runtime statistics, was as follows:
--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
| 1 | MERGE | T_DPD_AGENT_CHANGE | 1 | | 1 |00:06:05.16 | 180M| 4968 |
| 2 | VIEW | | 1 | | 208K|00:41:23.42 | 180M| 4954 |
| 3 | NESTED LOOPS OUTER | | 1 | 55530 | 208K|00:41:22.38 | 180M| 4954 |
| 4 | VIEW | | 1 | 55530 | 128K|00:00:02.78 | 5403 | 3575 |
| 5 | SORT UNIQUE | | 1 | 55530 | 128K|00:00:02.27 | 5403 | 3575 |
| 6 | UNION-ALL | | 1 | | 128K|00:00:03.50 | 5403 | 3575 |
|* 7 | HASH JOIN | | 1 | 48752 | 121K|00:00:02.72 | 4677 | 3027 |
| 8 | TABLE ACCESS FULL | T_DEPT_BACKUP | 1 | 45639 | 48600 |00:00:00.10 | 825 | 820 |
|* 9 | HASH JOIN | | 1 | 74512 | 121K|00:00:01.83 | 3852 | 2207 |
| 10 | TABLE ACCESS FULL | T_DEPT_BACKUP | 1 | 45639 | 48600 |00:00:00.10 | 819 | 0 |
|* 11 | HASH JOIN | | 1 | 113K| 121K|00:00:01.04 | 3033 | 2207 |
| 12 | TABLE ACCESS FULL| T_DEPT_BACKUP | 1 | 45639 | 48600 |00:00:00.05 | 819 | 0 |
|* 13 | TABLE ACCESS FULL| T_AGENT_BACKUP | 1 | 114K| 121K|00:00:00.26 | 2214 | 2207 |
|* 14 | HASH JOIN | | 1 | 6778 | 6976 |00:00:00.13 | 726 | 548 |
|* 15 | HASH JOIN | | 1 | 2577 | 2336 |00:00:00.04 | 165 | 48 |
| 16 | TABLE ACCESS FULL | T_DEPT | 1 | 2652 | 2652 |00:00:00.03 | 55 | 48 |
|* 17 | HASH JOIN | | 1 | 2614 | 2614 |00:00:00.01 | 110 | 0 |
| 18 | TABLE ACCESS FULL| T_DEPT | 1 | 2652 | 2652 |00:00:00.01 | 55 | 0 |
| 19 | TABLE ACCESS FULL| T_DEPT | 1 | 2652 | 2652 |00:00:00.01 | 55 | 0 |
|* 20 | TABLE ACCESS FULL | T_AGENT | 1 | 6976 | 6976 |00:00:00.05 | 561 | 500 |
| 21 | VIEW | | 128K| 1 | 201K|00:39:49.69 | 180M| 1379 |
|* 22 | TABLE ACCESS FULL | T_DPD_AGENT_CHANGE | 128K| 1 | 201K|00:39:47.65 | 180M| 1379 |
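A plan with Starts / A-Rows / Buffers columns like the one above is normally captured by running the statement with the GATHER_PLAN_STATISTICS hint and then reading the cursor back through DBMS_XPLAN. The post does not show the collection step, so the following is only a sketch of that standard technique:

MERGE /*+ GATHER_PLAN_STATISTICS */ INTO T_DPD_AGENT_CHANGE A
-- ... rest of the statement exactly as above ...
-- then, in the same session, display the last cursor with actual row counts:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));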
Plan lines 21 and 22 are the bottleneck: the NESTED LOOPS join performed an enormous number of full scans of the inner table, all as logical reads in memory. A nested loops join scans the inner row source once for every qualifying row from the driving row source, so T_DPD_AGENT_CHANGE was full-scanned 128,000 times here (Starts = 128K). This looping is not caused by the application code; it is inherent to how a nested loops join executes. That one step took about 39 minutes, out of roughly 45 minutes for the whole statement.
So we must either change the access path to avoid the full table scan, or change the join method; a sketch of the join-method alternative follows.
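The join-method route was not taken here, but for illustration, the optimizer could be steered toward a hash join between the USING view and the target with a standard Oracle hint on the target alias. This variant is an untested sketch, changing only the opening line of the statement:

MERGE /*+ USE_HASH(A) */ INTO T_DPD_AGENT_CHANGE A
-- ... rest of the statement unchanged; USE_HASH(A) requests a hash join against target alias A instead of nested loops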
Here I chose the first route and built a composite index to change the access path:
SQL> create index test
2 on T_DPD_AGENT_CHANGE (AGENT_STATUS, TEAM, REGION, LEAVE_DATE, YEAR_MONTH);
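The statistics then need to be refreshed so the optimizer can cost the new index. One standard way to do this, a minimal sketch assuming the table lives in the current schema, is:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,
                                tabname => 'T_DPD_AGENT_CHANGE',
                                cascade => TRUE); -- cascade => TRUE also gathers statistics on the new index
END;
/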
After gathering statistics, I re-ran the statement; the execution plan became:
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
| 1 | MERGE | T_DPD_AGENT_CHANGE | 1 | | 1 |00:00:24.03 | 5624K|
| 2 | VIEW | | 1 | | 215K|00:00:25.91 | 5624K|
| 3 | NESTED LOOPS OUTER | | 1 | 55530 | 215K|00:00:25.26 | 5624K|
| 4 | VIEW | | 1 | 55530 | 128K|00:00:01.61 | 5391 |
| 5 | SORT UNIQUE | | 1 | 55530 | 128K|00:00:01.22 | 5391 |
| 6 | UNION-ALL | | 1 | | 128K|00:00:03.22 | 5391 |
|* 7 | HASH JOIN | | 1 | 48752 | 121K|00:00:02.44 | 4665 |
| 8 | TABLE ACCESS FULL | T_DEPT_BACKUP | 1 | 45639 | 48600 |00:00:00.10 | 820 |
|* 9 | HASH JOIN | | 1 | 74512 | 121K|00:00:01.78 | 3845 |
| 10 | TABLE ACCESS FULL | T_DEPT_BACKUP | 1 | 45639 | 48600 |00:00:00.05 | 819 |
|* 11 | HASH JOIN | | 1 | 113K| 121K|00:00:01.01 | 3026 |
| 12 | TABLE ACCESS FULL | T_DEPT_BACKUP | 1 | 45639 | 48600 |00:00:00.05 | 819 |
|* 13 | TABLE ACCESS FULL | T_AGENT_BACKUP | 1 | 114K| 121K|00:00:00.24 | 2207 |
|* 14 | HASH JOIN | | 1 | 6778 | 6976 |00:00:00.07 | 726 |
|* 15 | HASH JOIN | | 1 | 2577 | 2336 |00:00:00.01 | 165 |
| 16 | TABLE ACCESS FULL | T_DEPT | 1 | 2652 | 2652 |00:00:00.01 | 55 |
|* 17 | HASH JOIN | | 1 | 2614 | 2614 |00:00:00.01 | 110 |
| 18 | TABLE ACCESS FULL | T_DEPT | 1 | 2652 | 2652 |00:00:00.01 | 55 |
| 19 | TABLE ACCESS FULL | T_DEPT | 1 | 2652 | 2652 |00:00:00.01 | 55 |
|* 20 | TABLE ACCESS FULL | T_AGENT | 1 | 6976 | 6976 |00:00:00.02 | 561 |
| 21 | VIEW | | 128K| 1 | 215K|00:00:23.09 | 5619K|
|* 22 | TABLE ACCESS BY INDEX ROWID| T_DPD_AGENT_CHANGE | 128K| 1 | 215K|00:00:21.72 | 5619K|
|* 23 | INDEX RANGE SCAN | TEST | 128K| 1 | 6879K|00:00:17.91 | 420K|
Lines 22 and 23 of the new plan show the effect of the optimization: the step that used to take 39 minutes now finishes in well under a minute (about 22 seconds), so the composite index paid off dramatically. The test was done in the pre-production environment, whose data volume is close to production's; the hardware differs somewhat, but that does not affect the optimizer's choice of execution plan.
Of course, the heavy logical reads could also have been observed with a SQL trace (event 10046), but for pinpointing exactly where the bottleneck sits, the plan-statistics approach above is the better tool. I did examine the SQL trace as well: the bulk of the logical reads occurred in the execute phase of the statement, which is easy to understand, since the execution itself performs the reads while the fetch phase returns only a few rows.
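For completeness, a session-level 10046 trace can be switched on and off like this; level 12 adds bind and wait details, and the raw trace file is then formatted with tkprof:

ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
-- run the MERGE statement here, then turn the trace off:
ALTER SESSION SET EVENTS '10046 trace name context off';
-- format the raw trace file, e.g.: tkprof <trace_file>.trc report.txt sys=no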
So the fix for this problem was to create the composite index. Thanks for reading!