Java Forum / General / December 2007
After a while all outbound connections get stuck in SYN_SENT
jamesnichols3 - 10 Dec 2007 21:21 GMT
I have a Java application that makes a large number of outbound webservice calls over HTTP/TCP. The hosts contacted are a fixed set of about 2000 hosts, and a web service call is made to each of them approximately every 5 minutes by a pool of 200 Java threads. At any given time some percentage of these hosts are unreachable for one reason or another, so there is a persistent count of about 60-80 sockets in the SYN_SENT state. This is fine, as these failed connection attempts eventually time out.
However, after approximately 38 hours of operation, all outbound connection attempts get stuck in the SYN_SENT state. It happens instantaneously: we go from the baseline of about 60-80 sockets in SYN_SENT to a count of 200 (corresponding to the number of Java threads that make these calls). I've tried several things to clear this problem up, including:
1) Restarting the Java application
2) ip route flush cache
3) Start/stop networking
4) rmmod/insmod the kernel driver for the NIC
5) Tuning of /proc/sys/net/ipv4/tcp_syn_retries
6) Disabling /proc/sys/net/ipv4/tcp_syncookies
However, after each of these countermeasures, the outbound connections still get stuck in SYN_SENT. During this time, I am still able to SSH to the box and run wget www.google.com, etc, so the problem appears to be specific to the hosts that I'm accessing via the webservices. The only thing that makes this problem go away is to restart the entire Linux box. Once I do this and restart my application it works perfectly fine... for 38 hours until it occurs again.
I'm running kernel 2.6.18 on RedHat, but have had this problem occur on other kernel versions. I've also had this problem occur on different boxes, NICs, routers, co-location facilities, and several other variables. The only thing in common is my application and the fact that it is Linux, so I have to believe that my application is causing something weird in the kernel, since an application restart doesn't help.
Any ideas?
Owen Jacobson - 10 Dec 2007 21:50 GMT
> I have a Java application that makes a large number of outbound
> webservice calls over HTTP/TCP. The hosts contacted are a fixed set
[quoted text clipped - 36 lines]
>
> Any ideas?
SYN_SENT means the local host has transmitted a SYN requesting the creation of a connection but has not yet received either an RST response indicating that nothing's listening or a SYN-ACK response indicating that something *is* listening. Probable culprits would be, in roughly descending order,
- firewall problems,
- the remote host has gone down or is not responding to network traffic,
- firewall problems,
- misconfiguration somewhere in between your machine and the remote host, and
- firewall problems.
Dig up a copy of Wireshark and watch the actual network traffic between your machine and the host you're calling services on to see which of these is likely. If possible run it from both inside and outside your own firewall so you can see if your firewall is blocking the returning ACK+SYN or even the outgoing SYN or not.
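The three outcomes described above (SYN-ACK received, RST received, nothing received) can also be told apart from the application side. The following is a minimal Java sketch, not the original poster's code: the class name SynProbe, its methods, and the 5 second timeout are all made up for illustration. It attempts one connect with an explicit timeout and reports which case occurred. The message check on ConnectException is a heuristic and may differ between platforms.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper for illustration; not part of the application in the thread.
final class SynProbe {
    enum Outcome { CONNECTED, REFUSED, TIMED_OUT, OTHER_ERROR }

    // Map a failed connect attempt onto the cases discussed in the thread.
    static Outcome classify(IOException e) {
        if (e instanceof SocketTimeoutException) {
            // Neither SYN-ACK nor RST arrived before our own timeout expired:
            // the socket sat in SYN_SENT for the whole attempt.
            return Outcome.TIMED_OUT;
        }
        if (e instanceof ConnectException) {
            String msg = e.getMessage();
            // Assumption: an OS-level connect timeout is reported with a
            // message containing "timed out"; otherwise treat it as refused (RST).
            if (msg != null && msg.toLowerCase().contains("timed out")) {
                return Outcome.TIMED_OUT;
            }
            return Outcome.REFUSED;
        }
        return Outcome.OTHER_ERROR;
    }

    // Try one TCP connect and report what happened.
    static Outcome probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Outcome.CONNECTED; // SYN-ACK came back, handshake completed
        } catch (IOException e) {
            return classify(e);
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 80;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 5000));
    }
}
```

Running it against a handful of the affected hosts while the problem is active would show whether they all time out (consistent with lost SYN or SYN-ACK packets) or whether some are actively refusing.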
jamesnichols3 - 10 Dec 2007 22:04 GMT
>SYN_SENT means the local host has transmitted a SYN requesting the
>creation of a connection but has not yet received either an RST
[quoted text clipped - 15 lines]
>outside your own firewall so you can see if your firewall is blocking
>the returning ACK+SYN or even the outgoing SYN or not.
Hi,
I've had this problem over multiple types of firewall devices, versions, and configurations. It's not possible for me to capture packets outside of the firewall. Unfortunately, the data rate is such that it's nearly impossible to gain many insights from the internal packet capture that I can take. This problem is occurring when connecting to thousands of hosts spread out all over the internet, so it's highly unlikely that they are all going down at once or that there is some misconfiguration that recurs every 38 hours. It is indicative of something systematic happening in the OS, but I can't figure out what it is.
Owen Jacobson - 10 Dec 2007 22:37 GMT
> >SYN_SENT means the local host has transmitted a SYN requesting the
> >creation of a connection but has not yet received either an RST
[quoted text clipped - 27 lines]
> of something systematic happening in the OS, but I can't figure out what it
> is.
Maybe the NIC sucks.
jamesnichols3 - 11 Dec 2007 01:08 GMT
It's happened with a couple of different NICs too :(
>> >SYN_SENT means the local host has transmitted a SYN requesting the
>> >creation of a connection but has not yet received either an RST
[quoted text clipped - 3 lines]
>
>Maybe the NIC sucks.
Jim Garrison - 11 Dec 2007 01:21 GMT
> It's happened with a couple of different NICs too :(
You say you can't capture packets outside the firewall... how about between the firewall and your failing system? Can you insert a cheap Ethernet hub (NOT a switch) and attach a second Linux system running Wireshark to the hub? The first step in debugging is deciding where the response SYN-ACK packets are being lost: either outside the firewall or between the firewall and your box. If you see the responses on the monitor system, the problem is in your box. If you don't, the problem is upstream.
Note that the setup I described is not equivalent to capturing packets on your failing box. The failing box itself might be dropping the packets.
Nigel Wade - 11 Dec 2007 10:45 GMT
>>SYN_SENT means the local host has transmitted a SYN requesting the
>>creation of a connection but has not yet received either an RST
[quoted text clipped - 27 lines]
> of something systematic happening in the OS, but I can't figure out what it
> is.
Are you running iptables on the system in question? What happens if you disable it?
It's just possible that the state table is filling up so ESTABLISHED,RELATED packets are no longer being accepted. This would result in the SYN,ACK response from the remote end being dropped, and a socket hung in the SYN_SENT state.
You can look at the iptables state table using some esoteric magic incantation, which I can't remember offhand. I should have it in my firewall notes, I'll try to locate it (it's not something I have to do very often...)
 Signature Nigel Wade, System Administrator, Space Plasma Physics Group, University of Leicester, Leicester, LE1 7RH, UK E-mail : nmw@ion.le.ac.uk Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555
jamesnichols3 - 11 Dec 2007 18:03 GMT
>Are you running iptables on the system in question? What happens if you disable
>it?
[quoted text clipped - 6 lines]
>which I can't remember offhand. I should have it in my firewall notes, I'll try
>to locate it (it's not something I have to do very often...)
Yes, I am running iptables. My ip_conntrack_max is set to 65K, so I don't think I'm filling that up. I can't really disable it during actual application usage... what I have done is:
1) stop the application
2) run /etc/init.d/iptables stop
3) run /etc/init.d/iptables start
4) Restart the application
And all the outbound connections get stuck in SYN_SENT
Nigel Wade - 12 Dec 2007 11:33 GMT
>>Are you running iptables on the system in question? What happens if you disable
>>it?
[quoted text clipped - 9 lines]
> Yes, I am running iptables. My ip_conntrack_max is set to 65K, so I don't
> think I'm filling that up.
Unlikely yes, but why guess? You might be the target of a SYN flood DoS attack, or have an errant network application or appliance. Take a look at /proc/net/ip_conntrack
> I can't really disable it during actual
> application usage...
It would only need to be for a few seconds whilst you ran a packet capture. You want to be sure that all packets received on the network interface are visible to wireshark. Is the machine in question plugged into a managed switch? If so you might be able to set one port to monitoring mode and see all traffic on the switch, allowing you to see traffic to that machine externally.
> what I have done is:
>
[quoted text clipped - 4 lines]
>
> And all the outbound connections get stuck in SYN_SENT
When in this state what does your iptables state table look like? For each connection in the SYN_SENT state you should have an equivalent entry in the ip_conntrack state table. When you start a new connection does it go into the state table, and in what state? Does this affect other network applications? If you run wireshark and capture packets whilst a new connection is being attempted, what does that show?
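One way to answer the question of what the state table looks like is to tally the TCP entries in /proc/net/ip_conntrack by state. The Java sketch below is only an illustration: the class name ConntrackTally is hypothetical, and it assumes each TCP entry has the usual whitespace-separated layout with the state in the fourth field (for example "tcp 6 117 SYN_SENT src=... dst=... sport=... dport=... [UNREPLIED] ..."). Reading the file normally requires root.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
final class ConntrackTally {
    // Count TCP conntrack entries per state (SYN_SENT, ESTABLISHED, ...).
    // Assumes the state is the fourth whitespace-separated field of a tcp line.
    static Map<String, Integer> countTcpStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 4 && fields[0].equals("tcp")) {
                counts.merge(fields[3], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Default path matches the file named in the thread; pass another path to override.
        String path = args.length > 0 ? args[0] : "/proc/net/ip_conntrack";
        List<String> lines = Files.readAllLines(Paths.get(path));
        System.out.println("total entries: " + lines.size());
        countTcpStates(lines).forEach(
                (state, n) -> System.out.println(state + " " + n));
    }
}
```

Comparing the total against ip_conntrack_max shows whether the table is anywhere near full, and the per-state counts show whether the stuck connections are tracked as SYN_SENT.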
It really sounds like you have a problem with iptables, or an external networking appliance. Something is dropping the outbound SYN, or the SYN/ACK replies. I doubt very much that it's due to Java.
jamesnichols3 - 12 Dec 2007 16:25 GMT
>Unlikely yes, but why guess? You might be the target of a SYN flood DoS attack,
>or have an errant network application or appliance. Take a look
>at /proc/net/ip_conntrack
Yes, there are only a few thousand entries in ip_conntrack.
>It would only need to be for a few seconds whilst you ran a packet capture. You
>want to be sure that all packets received on the network interface are visible
[quoted text clipped - 14 lines]
>you run wireshark and capture packets whilst a new connection is being
>attempted, what does that show?
The connections do end up in ip_conntrack, in SYN_SENT state. This only affects the outbound webservices traffic. I can ssh into/out of the box and wget www.google.com, but can't contact the webservice hosts, even using wget/telnet/etc. It's something at the OS level, so I'm pretty sure that Java's usage of networking is doing something at the OS level over time.
>It really sounds like you have a problem with iptables, or an external
>networking appliance. Something is dropping the outbound SYN, or the SYN/ACK
>replies. I doubt very much that it's due to Java.
I agree, I think that it is the workload caused by Java that is triggering something in the OS. It really can't be a router or firewall, as I have completely rebuilt this part of the infrastructure several times over the past several years and the problem is still there. The only thing that makes the problem go away is rebooting the box.
Nigel Wade - 13 Dec 2007 10:50 GMT
>>Unlikely yes, but why guess? You might be the target of a SYN flood DoS attack,
>>or have an errant network application or appliance. Take a look
[quoted text clipped - 26 lines]
> wget/telnet/etc. It's something at the OS level, so I'm pretty sure that
> Java's usage of networking is doing something at the OS level over time.
It's unlikely to be at the OS level. The OS won't differentiate between Java opening a socket and ssh opening a socket. Also, it is almost certainly not at the application level: the SYN has been sent, so the request to open the socket has got to the transport layer.
What your diagnostics show is that a SYN has been sent by the transport layer of the network stack (this has been detected by iptables, within the kernel). Where this has gone to you haven't yet established. Without external diagnostics you are pretty much flying blind. You need to talk to your network support people and ask them to help you find out what is going on. Either the SYN is not being delivered to the remote server, or the response is not getting back to your system. Either way it's a networking problem, either at a very low level in your system or a routing/firewalling problem between your system and the remote machine. You need to establish where the SYN, or the SYN/ACK response, is disappearing.
>>It really sounds like you have a problem with iptables, or an external
>>networking appliance. Something is dropping the outbound SYN, or the SYN/ACK
[quoted text clipped - 5 lines]
> past several years and the problem is still there. The only thing that makes
> the problem go away is rebooting the box.
There's a possibility that you are falling foul of some resource limit in the networking. If previous sockets haven't been fully closed then the remote server (or its firewall etc.) may not be allowing you to establish a new one. Re-booting will probably result in a disconnect at the remote end. Have a look at your netstat and see what network connections there are existing between your machine and those webservice providers.
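The same per-state picture that netstat gives can be pulled from /proc/net/tcp, which on Linux lists one socket per row with the state as a hex code in the fourth column (01 is ESTABLISHED, 02 is SYN_SENT, 06 is TIME_WAIT, 08 is CLOSE_WAIT, and so on). The Java sketch below is an illustration under that assumption; the class name TcpStateTally and its methods are hypothetical. A large count of sockets that were never fully closed would support the resource-limit theory above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
final class TcpStateTally {
    // Linux TCP state codes as printed (in hex) in /proc/net/tcp; index 0 is unused.
    private static final String[] NAMES = {
        "?", "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
        "TIME_WAIT", "CLOSE", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING"
    };

    // Translate a hex state code such as "02" into a name such as "SYN_SENT".
    static String stateName(String hex) {
        try {
            int code = Integer.parseInt(hex, 16);
            if (code > 0 && code < NAMES.length) {
                return NAMES[code];
            }
        } catch (NumberFormatException e) {
            // fall through to the unknown case
        }
        return "UNKNOWN(" + hex + ")";
    }

    // Count sockets per state. Data rows start with "N:"; the header row does not.
    static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length >= 4 && f[0].endsWith(":")) {
                counts.merge(stateName(f[3]), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/proc/net/tcp";
        countStates(Files.readAllLines(Paths.get(path))).forEach(
                (state, n) -> System.out.println(state + " " + n));
    }
}
```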
John W. Kennedy - 14 Dec 2007 03:05 GMT
> It's unlikely to be at the OS level. The OS won't differentiate between Java
> opening a socket and ssh opening a socket.
Windows with the usual sort of anti-virus software will (or will seem to).
 Signature John W. Kennedy "But now is a new thing which is very old-- that the rich make themselves richer and not poorer, which is the true Gospel, for the poor's sake." -- Charles Williams. "Judgement at Chelmsford"
Nigel Wade - 18 Dec 2007 10:12 GMT
>> It's unlikely to be at the OS level. The OS won't differentiate between Java
>> opening a socket and ssh opening a socket.
>
> Windows with the usual sort of anti-virus software will (or will seem to).
But the OP is using iptables (so I presume Linux), which is a simple packet level filter. It knows nothing about what application generated the packet.
Martin Gregorie - 11 Dec 2007 12:22 GMT
> However, after approximately 38 hours of operation, all outbound
> connection attempts get stuck in the SYN_SENT state. It happens
[quoted text clipped - 4 lines]
>
> 1) Restarting the Java application
Are you saying that all sockets are immediately stuck, i.e., no successful connections at all, after you restart the application?
If so, the problem, as others have said, has to be outside your application.
OTOH, if it runs for a while and then hangs up again have you tried periodically closing and re-opening each socket in case something rots it after a large number of connect/disconnect cycles?
 Signature martin@ | Martin Gregorie gregorie. | Essex, UK org |
jamesnichols3 - 11 Dec 2007 18:05 GMT
>Are you saying that all sockets are immediately stuck, i.e., no
>successful connections at all, after you restart the application?
[quoted text clipped - 5 lines]
>periodically closing and re-opening each socket in case something rots
>it after a large number of connect/disconnect cycles?
Yes, when I restart the application all of the outbound connections immediately get stuck in SYN_SENT. One or two might make it out, but 99% get stuck in SYN_SENT until all of the threads responsible for outbound connections are stuck waiting on sockets in this state. The sockets are open and closed by each of the 200 threads at least every 5 minutes or so.
Martin Gregorie - 12 Dec 2007 21:34 GMT
>> Are you saying that all sockets are immediately stuck, i.e., no
>> successful connections at all, after you restart the application?
[quoted text clipped - 11 lines]
> connections are stuck waiting on sockets in this state. The sockets are open
> and closed by each of the 200 threads at least every 5 minutes or so.
In that case I agree with everybody else: the problem is most probably external to the Java app. However I do have an additional suggestion:
It would be useful to know WHERE the stoppage happens. 'traceroute' may help here. Running it with the -p option lets you trace the route to a specific port at the destination and the -T uses SYN to do the probing.
Try running "traceroute -T -p=port host" against one of your usual targets when nothing is stuck and before you start your application. After that host becomes stuck stop your application and try traceroute again with the same command line arguments and see how far the second traceroute gets before it blocks.
Martin Gregorie - 13 Dec 2007 11:37 GMT
Correction: that should be "traceroute -T -p port host"
jamesnichols3 - 14 Dec 2007 12:30 GMT
I figured out a countermeasure. When the 38 hour limit was hit and the connections started to get stuck in SYN_SENT, I disabled tcp_sack in the Linux kernel. Almost instantly, the SYN_SENT connections cleared up and connectivity was restored. I believe there is a bug in the tcp_sack implementation and that, given my application workload, a memory structure or something similar fills up after 38 hours and causes this behavior.
>I have a Java application that makes a large number of outbound
>webservice calls over HTTP/TCP. The hosts contacted are a fixed set
[quoted text clipped - 36 lines]
>
>Any ideas?