天天看点

Handling the "target machine actively refused" exception in a WCF service under high concurrency

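Our WCF monitoring service occasionally reported that a target service was throwing a "target machine actively refused it" error. At first I assumed the service had stopped, but when I logged on to the server, the target service was running fine. So I started digging into the cause.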

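Generally speaking, the "actively refused" exception (TCP error 10061) has two main possible causes:

1: The server is shut down or the service is stopped.

2: The client is calling the wrong port, or the server's firewall has not opened the corresponding port.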
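Cause 1 is easy to reproduce locally. The Python sketch below (Python is used only for illustration; `find_unused_port` and `try_connect` are hypothetical helper names) asks the OS for a free port, releases it so that nothing is listening there, and then tries to connect. With no listener, the peer's TCP stack answers the SYN with a reset and the connect fails with `ConnectionRefusedError`, carrying errno 10061 (WSAECONNREFUSED) on Windows and ECONNREFUSED on Linux/macOS. Whether a firewall-blocked port shows up as a refusal or as a timeout depends on whether the firewall rejects or silently drops the packets.

```python
import socket


def find_unused_port() -> int:
    """Let the OS pick a free port, then release it so nothing listens there.

    (Another process could grab it in between; fine for a local demo.)
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def try_connect(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError as exc:
        # The peer answered the SYN with a reset: nothing is listening.
        # errno is 10061 (WSAECONNREFUSED) on Windows, ECONNREFUSED on POSIX.
        return f"refused (errno {exc.errno})"
    except socket.timeout:
        # No answer at all, e.g. a firewall silently dropping the packets.
        return "timed out"


if __name__ == "__main__":
    print(try_connect("127.0.0.1", find_unused_port()))
```

Against a port that does have a listener, the same call returns `"connected"`, which is what the post's monitoring saw most of the time.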

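But our service could be called normally and only threw this error occasionally, which meant neither of these two was the cause. So I kept googling and found this explanation: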

If this happens always, it literally means that the machine exists but that it has no services listening on the specified port, or there is a firewall stopping you.

If it happens occasionally - you used the word "sometimes" - and retrying succeeds, it is likely because the server has a full 'backlog'.

When you are waiting to be accepted on a listening socket, you are placed in a backlog. This backlog is finite and quite short - values of 1, 2 or 3 are not unusual - and so the OS might be unable to queue your request for the 'accept' to consume.

The backlog is a parameter on the listen function - all languages and platforms have basically the same API in this regard, even the C# one. This parameter is often configurable if you control the server, and is likely read from some settings file or the registry. Investigate how to configure your server.

If you wrote the server, you might have heavy processing in the accept of your socket, and this can be better moved to a separate worker thread so your accept is always ready to receive connections. There are various architecture choices you can explore that mitigate queuing up clients and processing them sequentially.

Regardless of whether you can increase the server backlog, you do need retry logic in your client code to cope with this issue, as even with a long backlog the server might be receiving lots of other requests on that port at that time.

There is a rare possibility where a NAT router would give this error should its ports for mappings be exhausted. I think we can discard this possibility as too much of a long shot though, since the router can hold 64K simultaneous connections to the same destination address/port before its ports are exhausted.
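The answer above makes two server-side points: the backlog is simply the argument passed to listen(), and the accept loop should hand work off to other threads so it can get back to accept() quickly. Below is a minimal Python sketch of both, assuming a toy protocol in which the client sends some bytes, half-closes, and gets them back upper-cased. `BACKLOG`, `handle_client`, `accept_loop`, `start_server`, `stop_server` and `echo_upper` are hypothetical names, and the value 100 just mirrors the number used later in this post. WCF runs its own listener, so in a WCF service the equivalent knob is set through configuration rather than code like this.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

BACKLOG = 100  # passed straight to listen(); the OS may silently cap it


def handle_client(conn: socket.socket) -> None:
    """Per-connection work (stand-in for real processing); runs on a worker thread.

    Toy protocol: read until the client half-closes, reply with the bytes upper-cased.
    """
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        conn.sendall(b"".join(chunks).upper())


def accept_loop(server: socket.socket, pool: ThreadPoolExecutor, stop: threading.Event) -> None:
    """Only accepts and hands off, so the listening socket's backlog drains quickly."""
    while not stop.is_set():
        try:
            conn, _addr = server.accept()
        except socket.timeout:
            continue  # periodic wake-up to re-check the stop flag
        except OSError:
            break  # listening socket closed underneath us
        pool.submit(handle_client, conn)


def start_server(host: str = "127.0.0.1", port: int = 0, backlog: int = BACKLOG, workers: int = 8):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))
    server.listen(backlog)  # the backlog discussed above
    server.settimeout(0.2)  # lets accept_loop notice the stop flag
    pool = ThreadPoolExecutor(max_workers=workers)
    stop = threading.Event()
    thread = threading.Thread(target=accept_loop, args=(server, pool, stop), daemon=True)
    thread.start()
    return server, pool, stop, thread


def stop_server(server, pool, stop, thread) -> None:
    stop.set()
    thread.join()
    server.close()
    pool.shutdown(wait=True)


def echo_upper(port: int, payload: bytes) -> bytes:
    """Client side of the toy protocol: send, half-close, read the reply to EOF."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)
        reply = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                return reply
            reply += chunk


if __name__ == "__main__":
    srv, pool, stop, thread = start_server()
    print(echo_upper(srv.getsockname()[1], b"hello"))
    stop_server(srv, pool, stop, thread)
```

The accept thread polls a stop flag through a short accept() timeout instead of relying on close() to interrupt a blocked accept(), which does not work reliably on every platform. Sockets returned by accept() are switched back to blocking mode by Python here, so the workers' recv() calls are not affected by that timeout.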

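In short: if the error happens all the time, the cause is probably the server or a firewall; if it happens only "sometimes", the backlog is the likely culprit. The backlog is a TCP-level queue of pending connection requests: when a client opens a socket connection, the server side places it in this queue until the application accepts it. Under high concurrency, if the server cannot accept connections fast enough, the queue fills up and further connection attempts are rejected outright, which surfaces as the "target machine actively refused" (TCP 10061) exception.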
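Because a refusal caused by a full backlog is transient, the client-side remedy the quoted answer insists on is retrying. Below is a minimal Python sketch, assuming exponential backoff with jitter (a common choice, not something the original answer specifies); `retry_on_refused` and `connect_with_retry` are hypothetical helpers. It retries only `ConnectionRefusedError`: a refused connection means the TCP handshake never completed, so no request data reached the server and a retry cannot duplicate work. Any other error propagates immediately. In a WCF client the same condition typically surfaces as an `EndpointNotFoundException` wrapping the socket error, so equivalent C# code would catch that instead.

```python
import random
import socket
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_refused(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
) -> T:
    """Call `operation`, retrying only when the connection is actively refused.

    Waits base_delay * 2**n (capped at max_delay, scaled by random jitter in
    [0.5, 1.0]) between attempts so that many clients refused at the same
    moment do not all come back in lockstep. Any other exception propagates
    immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionRefusedError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last refusal
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


def connect_with_retry(host: str, port: int, timeout: float = 5.0, **retry_kwargs) -> socket.socket:
    """Open a TCP connection, retrying refusals such as those caused by a full backlog."""
    return retry_on_refused(
        lambda: socket.create_connection((host, port), timeout=timeout),
        **retry_kwargs,
    )
```

Capping the delay keeps a long run of refusals from stalling the caller indefinitely; the attempt limit ensures a service that is genuinely down (cause 1 above) still fails loudly rather than being retried forever.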

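With "backlog" as a lead, I searched for "WCF backlog" and found that the WCF binding configuration does have a listenBacklog setting, with a default value of 10. I raised the service's listenBacklog to 100, and the problem went away.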

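One caveat when adding the listenBacklog attribute: you must remove the default endpoint <endpoint address="mex" binding="mexTcpBinding" bindingConfiguration="" contract="IMetadataExchange" />. This endpoint lets Visual Studio and similar tools discover the service metadata. If you leave it in, starting the service fails with an error saying the port is already being listened on, most likely because the mex endpoint shares the same TCP port but keeps the default listenBacklog, and WCF cannot share one port between endpoints whose transport settings differ.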
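For reference, the two configuration changes described above can be sketched as a small Python script that edits a WCF app.config with `xml.etree.ElementTree`. The sample config is hypothetical (service name, contract, binding name and address are made up for illustration), and the script assumes the netTcpBinding settings live in a named `<binding>` that the service endpoint references through bindingConfiguration; an endpoint without bindingConfiguration uses the binding defaults and would not pick up the change. Editing the file by hand works just as well; the script mainly shows where listenBacklog sits in the config.

```python
import xml.etree.ElementTree as ET

# Hypothetical app.config fragment: service name, contract and address are made up.
SAMPLE_CONFIG = """<configuration>
  <system.serviceModel>
    <bindings>
      <netTcpBinding>
        <binding name="tcpBinding" />
      </netTcpBinding>
    </bindings>
    <services>
      <service name="Demo.MonitorService">
        <endpoint address="" binding="netTcpBinding"
                  bindingConfiguration="tcpBinding" contract="Demo.IMonitorService" />
        <endpoint address="mex" binding="mexTcpBinding"
                  bindingConfiguration="" contract="IMetadataExchange" />
        <host>
          <baseAddresses>
            <add baseAddress="net.tcp://localhost:8000/MonitorService" />
          </baseAddresses>
        </host>
      </service>
    </services>
  </system.serviceModel>
</configuration>"""


def apply_backlog_fix(config_xml: str, backlog: int = 100) -> str:
    """Set listenBacklog on every netTcpBinding binding and drop mexTcpBinding endpoints."""
    root = ET.fromstring(config_xml)

    # Step 1: raise listenBacklog on each named netTcpBinding configuration.
    for tcp_bindings in root.iter("netTcpBinding"):
        for binding in tcp_bindings.findall("binding"):
            binding.set("listenBacklog", str(backlog))

    # Step 2: remove the mexTcpBinding metadata endpoint, which the post says
    # must go once listenBacklog is set. findall() returns a list, so removing
    # elements while iterating over it is safe.
    for service in root.iter("service"):
        for endpoint in service.findall("endpoint"):
            if endpoint.get("binding") == "mexTcpBinding":
                service.remove(endpoint)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(apply_backlog_fix(SAMPLE_CONFIG))
```

ElementTree drops XML comments when parsing by default, so diff the output before overwriting a real config file.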