Man, I hope you're in Europe somewhere. The thought of getting up before
5:30 AM (central time USA) and actually having complex thought processes
working is a scary one for me. Unless you're still on the night before.
Hooking it up to a simpleton Linksys firewall router worked. I'm guessing
the incomplete DNS setup was causing some sort of security problem with
the SSL at some level. What really bothers me about this is that it was
locking up rather than failing cleanly. Unfortunately, since it's working, I
doubt I'll have the time to follow up on finding where the bug is. I
included some more info below for thread completeness for anyone
interested.
Thanks to everyone for the ideas.
> What happens if you remove the app from the startup sequence entirely, and only
> run it once you are /sure/ that DHCP, /dev/random, etc, are all fully set up ?
It's being started in rc.local, so it should be the last thing started, but
even if that wasn't the case I had tried shutting down and restarting the
app.
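
For anyone who wants to check the "fully set up" question on a similar box,
here is a rough sketch of a startup probe. It is not code from my app, and the
class and method names (StartupProbe, probe, reverseLookup, secureRandomBytes)
are made up for illustration. It assumes Java 8 or later. Each check runs on a
daemon thread with a timeout, so a stalled name lookup or a starved entropy
source gets reported instead of hanging the probe itself. Whether either of
these is what actually hung my app is only a guess.

```java
import java.net.InetAddress;
import java.security.SecureRandom;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the original application.
final class StartupProbe {

    /** Runs task on a daemon thread and reports ok, blocked, or failed as text. */
    static String probe(String name, Callable<String> task, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "probe-" + name);
            t.setDaemon(true); // a stuck probe must not keep the JVM alive
            return t;
        });
        Future<String> result = pool.submit(task);
        try {
            return name + ": ok (" + result.get(timeoutMillis, TimeUnit.MILLISECONDS) + ")";
        } catch (TimeoutException e) {
            return name + ": STILL BLOCKED after " + timeoutMillis + " ms";
        } catch (Exception e) {
            return name + ": failed (" + e + ")";
        } finally {
            pool.shutdownNow();
        }
    }

    /** Forward lookup of the local host name, then a reverse lookup of its address. */
    static String reverseLookup() throws Exception {
        InetAddress local = InetAddress.getLocalHost();
        return local.getHostAddress() + " -> " + local.getCanonicalHostName();
    }

    /** Forces the default SecureRandom to seed itself and produce output. */
    static String secureRandomBytes() {
        byte[] buf = new byte[16];
        new SecureRandom().nextBytes(buf);
        return buf.length + " random bytes";
    }

    public static void main(String[] args) {
        System.out.println(probe("dns", StartupProbe::reverseLookup, 5000));
        System.out.println(probe("entropy", StartupProbe::secureRandomBytes, 5000));
    }
}
```

Running this from rc.local just before the app starts, and logging the two
lines, would at least show whether name resolution or random number seeding
is slow at that point in the boot.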
> I should have asked before. When you say the application freezes, what do you
> mean ?
>
> More specifically: Is it hanging in a send, or in a receive, or (even)
> somewhere else ?
> What can you see when you run it under a debugger ?
Unfortunately my development environment is all Windows here. The
deployment is on fully automated, headless, small form factor Linux PCs.
This problem doesn't manifest itself in the development environment. I
have no debugging tools on the deployment computers. From the logging I
can tell it is getting past the Socket.connect() and the
ServerSocket.accept() but not past reading or writing anything from or
to the connection. In other words, it appears threads on both sides go to
read and write and never return. After the connection, the app's writer
thread is supposed to send stuff to the server which the server then
acknowledges. The stuff is never being sent. The confusing part
is there are several log messages that should be spit out after the
connect but prior to any actual writes on the socket. These are not
happening. The reader thread is doing a read on the socket immediately
after the connection is established. So the best guess is the read call on
the socket is locking not only that thread but the entire process.
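
Since there is no debugger on the deployment boxes, one way to see where each
thread is stuck would be to have the app log its own stack traces. The sketch
below is an assumption about how that could be done, not something I ran on
the failing machines. StackDumper and its methods are hypothetical names;
Thread.getAllStackTraces() is a standard call (Java 5 and later), and the
lambda needs Java 8. If the whole process really is frozen, the watchdog
thread will stop logging too, which would itself be informative.

```java
import java.util.Map;

// Hypothetical helper, not part of the original application.
final class StackDumper {

    /** Returns a text dump of the name, state, and stack of every live thread. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Starts a daemon thread that writes a dump to stderr every periodMillis. */
    static Thread startWatchdog(long periodMillis) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(periodMillis);
                    System.err.println(dumpAllThreads());
                }
            } catch (InterruptedException stop) {
                // Interrupted: end the watchdog quietly.
            }
        }, "stack-dumper");
        t.setDaemon(true); // do not keep the JVM alive just for the watchdog
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        startWatchdog(1000);
        Thread.sleep(2500); // stand-in for the real application's work
    }
}
```

On a Sun JVM under Linux, sending the process SIGQUIT (kill -QUIT with the
pid) should also print a thread dump to the JVM's standard output without any
code changes.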
> When you
> sniff the network, where did the last packet get sent, was it from the app or
> to it, was it to/from the server, or somewhere else ?
The last packet was from the app to the server. That was one of the things
I focused on in the packets. Maybe one or more of the protocol setup
packets was being directed to a wrong address but the MAC addresses were
correct in every packet. In successful cases the 17th packet was also from
the app to the server. So it would appear the server was waiting on
something from the app that was never being sent.
> Is the answer to that
> question consistent with the answer to the first one (it might be a deadlock
> between app and server caused by buffering, so that both ends "think" it's the
> other's turn to speak next) ?
I'm not sure what level of buffering you're referring to but I would think
it would have to be at the OS level since the entire process appears to be
locking at the first operation (a read) on the socket.
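
To get a clean failure instead of an indefinite block on that first read, a
read timeout on the socket is one option. This is a minimal sketch under my
own assumptions, not the app's code: ReadTimeoutDemo and readWithTimeout are
made-up names, and the "server" here is just a local socket that accepts and
then never writes. Socket.setSoTimeout() is the standard call; SSLSocket
extends Socket, so the same method is available there.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo, not part of the original application.
final class ReadTimeoutDemo {

    /** Reads from a silent local peer; returns "timeout", "data", or "eof". */
    static String readWithTimeout(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) {
            // Without this call (or with a value of 0) read() can block forever.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                int b = in.read(); // the peer never writes, so this should time out
                return b < 0 ? "eof" : "data";
            } catch (SocketTimeoutException e) {
                return "timeout"; // the socket is still usable after a timeout
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("first read: " + readWithTimeout(500));
    }
}
```

With a timeout in place the reader thread would at least come back and log
something, which narrows down whether the block is in the read itself or
somewhere earlier.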
> If that doesn't suggest anything, and your DNS investigations don't turn up a
> hint, then I'm afraid I've run out of ideas.
Chris Uppal - 14 Nov 2005 10:44 GMT
> Man, I hope you're in Europe somewhere. The thought of getting up before
> 5:30 AM (central time USA) and actually having complex thought processes
> working is a scary one for me. Unless you're still on the night before.
<chuckle/>
Not to worry, I'm in the UK.
> What really bothers me about this is that it was
> locking up rather than failing cleanly. Unfortunately, since it's working, I
> doubt I'll have the time to followup on finding where the bug is. I
> included some more info below for thread completeness for anyone
> interested.
A shame not to nail it, but such are the pressures of commercial life...
-- chris