[20170120]ORACLE的Dead Connection Detection浅析.txt
1.环境:
SCOTT@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
$ echo /proc/sys/net/ipv4/tcp_keepalive* | xargs -n 1 strings -1 -f
/proc/sys/net/ipv4/tcp_keepalive_intvl: 75
/proc/sys/net/ipv4/tcp_keepalive_probes: 9
/proc/sys/net/ipv4/tcp_keepalive_time: 7200
--我的测试当然不能这么长,我减少到300.
# sysctl -w net/ipv4/tcp_keepalive_time=300
net.ipv4.tcp_keepalive_time = 300
# echo /proc/sys/net/ipv4/tcp_keepalive* | xargs -n 1 strings -1 -f
/proc/sys/net/ipv4/tcp_keepalive_time: 300
/proc/sys/net/ipv4/tcp_keepalive_time 当keepalive起用的时候,TCP发送keepalive消息的频度。默认是2小时。
/proc/sys/net/ipv4/tcp_keepalive_intvl 当探测没有确认时,keepalive探测包的发送间隔。缺省是75秒。
/proc/sys/net/ipv4/tcp_keepalive_probes 如果对方不予应答,keepalive探测包的发送次数。缺省值是9。
2.建立相关表:
CREATE TABLE SESSION_WAIT_RECORD
AS
SELECT sid
,seconds_in_wait
,event
,SYSDATE AS curr_datetime
FROM v$session_wait
WHERE 1 = 0;
CREATE TABLE LOCK_OBJECT_RECORD
SELECT B.username
,B.sid
,B.serial#
,logon_time
,A.object_id
FROM v$locked_object A, v$session B
WHERE A.session_id = B.sid AND 1 = 0;
--建立测试表deptx,注意要通过网络连接,并且看看sqlnet.ora文件中不存在SQLNET.EXPIRE_TIME = NN.
--为了测试准确,我重启数据库以及监听.
SCOTT@book> create table deptx as select * from dept ;
--//session 1:(在windows下我的工作机器.)
SCOTT@book> @ &r/spid
SID SERIAL# SPID PID P_SERIAL# C50
---------- ---------- ------ ------- ---------- --------------------------------------------------
46 15 62463 28 8 alter system kill session '46,15' immediate;
SCOTT@book> update deptx set loc='aaaa' where deptno=10;
1 row updated.
--//session 2:
SCOTT@78> @ &r/spid
35 23 62453 27 8 alter system kill session '35,23' immediate;
SCOTT@78> update deptx set loc='aaaa' where deptno=10;
--//挂起.
4.打开另外session执行:
--//session 3:
$ cat ff.sql
DECLARE
v_index NUMBER := 1;
BEGIN
WHILE v_index != 0 LOOP
INSERT INTO SESSION_WAIT_RECORD
SELECT sid,
seconds_in_wait,
event ,
sysdate
FROM v$session_wait
WHERE sid = &&1;
INSERT INTO LOCK_OBJECT_RECORD
SELECT B.username,
B.sid,
B.serial#,
logon_time,
A.object_id ,
sysdate
FROM v$locked_object A,
v$session B
WHERE A.session_id = B.sid
AND A.session_id=&&1 AND B.serial#=&&2
ORDER BY B.logon_time;
commit;
dbms_lock.Sleep(1);
SELECT Count(*)
INTO v_index
END LOOP;
END;
/
SCOTT@book> @ ff 46 15
--//然后断开session 1,我采用很粗暴的方法,拔掉网线.注意session 2,session 3在tmux下执行的,即使断开网络,也一直在主机上执行.
SCOTT@book> select max(SECONDS_IN_WAIT) from SESSION_WAIT_RECORD;
MAX(SECONDS_IN_WAIT)
--------------------
978
--// 300+75*9=975 ,很相近.
--//如果修改
# sysctl -w net/ipv4/tcp_keepalive_time=7500
net.ipv4.tcp_keepalive_time = 7500
--//按照前面执行相同的测试方法,测试结果如下.
SCOTT@book> select max(SECONDS_IN_WAIT) from SESSION_WAIT_RECORD;
8177
--// 7500+75*9=8175 ,基本相近.
<a href="http://www.cnblogs.com/kerrycode/p/6292557.html">http://www.cnblogs.com/kerrycode/p/6292557.html</a>
那么在Oracle没有启用DCD时,系统和数据库如何判断一个连接是否异常,需要关闭呢?这个时间是这样计算的,首先它等待了
tcp_keepalive_time=7200,然后每隔75秒发送探测包,一共发送了9次后(7200+ 75*9 = 7875 ),都没有收到客户端应答,那么它就判断
这个连接死掉了,可以关闭了。所以这个值是一个固定值, 具体为7875, 当然不同的操作系统可能有所不同,取决于上面三个
tcp_keepalive参数,过了7875秒后, 这个时候PMON进程就会回收与它相关的所有资源(例如回滚事务,释放lock、latch、memory)。这
个值与我测试的时间非常接近了(考虑我们是采集的等待时间,以及测试脚本里面有休眠时间,这样采集的数据有些许偏差)。