[20170120]ORACLE的Dead Connection Detection淺析.txt
1.環境:
SCOTT@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
$ echo /proc/sys/net/ipv4/tcp_keepalive* | xargs -n 1 strings -1 -f
/proc/sys/net/ipv4/tcp_keepalive_intvl: 75
/proc/sys/net/ipv4/tcp_keepalive_probes: 9
/proc/sys/net/ipv4/tcp_keepalive_time: 7200
--我的測試當然不能這麼長,我減少到300.
# sysctl -w net/ipv4/tcp_keepalive_time=300
net.ipv4.tcp_keepalive_time = 300
# echo /proc/sys/net/ipv4/tcp_keepalive* | xargs -n 1 strings -1 -f
/proc/sys/net/ipv4/tcp_keepalive_time: 300
/proc/sys/net/ipv4/tcp_keepalive_time 當keepalive起用的時候,TCP發送keepalive消息的頻度。預設是2小時。
/proc/sys/net/ipv4/tcp_keepalive_intvl 當探測沒有确認時,keepalive探測包的發送間隔。預設是75秒。
/proc/sys/net/ipv4/tcp_keepalive_probes 如果對方不予應答,keepalive探測包的發送次數。預設值是9。
2.建立相關表:
CREATE TABLE SESSION_WAIT_RECORD
AS
SELECT sid
,seconds_in_wait
,event
,SYSDATE AS curr_datetime
FROM v$session_wait
WHERE 1 = 0;
CREATE TABLE LOCK_OBJECT_RECORD
SELECT B.username
,B.sid
,B.serial#
,logon_time
,A.object_id
FROM v$locked_object A, v$session B
WHERE A.session_id = B.sid AND 1 = 0;
--建立測試表deptx,注意要通過網絡連接配接,并且看看sqlnet.ora檔案中不存在SQLNET.EXPIRE_TIME = NN.
--為了測試準确,我重新開機資料庫以及監聽.
SCOTT@book> create table deptx as select * from dept ;
--//session 1:(在windows下我的工作機器.)
SCOTT@book> @ &r/spid
SID SERIAL# SPID PID P_SERIAL# C50
---------- ---------- ------ ------- ---------- --------------------------------------------------
46 15 62463 28 8 alter system kill session '46,15' immediate;
SCOTT@book> update deptx set loc='aaaa' where deptno=10;
1 row updated.
--//session 2:
SCOTT@78> @ &r/spid
35 23 62453 27 8 alter system kill session '35,23' immediate;
SCOTT@78> update deptx set loc='aaaa' where deptno=10;
--//挂起.
4.打開另外session執行:
--//session 3:
$ cat ff.sql
DECLARE
v_index NUMBER := 1;
BEGIN
WHILE v_index != 0 LOOP
INSERT INTO SESSION_WAIT_RECORD
SELECT sid,
seconds_in_wait,
event ,
sysdate
FROM v$session_wait
WHERE sid = &&1;
INSERT INTO LOCK_OBJECT_RECORD
SELECT B.username,
B.sid,
B.serial#,
logon_time,
A.object_id ,
sysdate
FROM v$locked_object A,
v$session B
WHERE A.session_id = B.sid
AND A.session_id=&&1 AND B.serial#=&&2
ORDER BY B.logon_time;
commit;
dbms_lock.Sleep(1);
SELECT Count(*)
INTO v_index
END LOOP;
END;
/
SCOTT@book> @ ff 46 15
--//然後斷開session 1,我采用很粗暴的方法,拔掉網線.注意session 2,session 3在tmux下執行的,即使斷開網絡,也一直在主機上執行.
SCOTT@book> select max(SECONDS_IN_WAIT) from SESSION_WAIT_RECORD;
MAX(SECONDS_IN_WAIT)
--------------------
978
--// 300+75*9=975 ,很相近.
--//如果修改
# sysctl -w net/ipv4/tcp_keepalive_time=7500
net.ipv4.tcp_keepalive_time = 7500
--//按照前面執行相同的測試方法,測試結果如下.
SCOTT@book> select max(SECONDS_IN_WAIT) from SESSION_WAIT_RECORD;
8177
--// 7500+75*9=8175 ,基本相近.
<a href="http://www.cnblogs.com/kerrycode/p/6292557.html">http://www.cnblogs.com/kerrycode/p/6292557.html</a>
那麼在Oracle沒有啟用DCD時,系統和資料庫如何判斷一個連接配接是否異常,需要關閉呢?這個時間是這樣計算的,首先它等待了
tcp_keepalive_time=7200,然後每隔75秒發送探測包,一共發送了9次後(7200+ 75*9 = 7875 ),都沒有收到用戶端應答,那麼它就判斷
這個連接配接死掉了,可以關閉了。是以這個值是一個固定值, 具體為7875, 當然不同的作業系統可能有所不同,取決于上面三個
tcp_keepalive參數,過了7875秒後, 這個時候PMON程序就會回收與它相關的所有資源(例如復原事務,釋放lock、latch、memory)。這
個值與我測試的時間非常接近了(考慮我們是采集的等待時間,以及測試腳本裡面有休眠時間,這樣采集的資料有些許偏差)。