Java Forum / CORBA / February 2008

My OrbixWeb (OW) server has intermittent startup failures from impl_is_ready

apm35@student.open.ac.uk - 01 Feb 2008 08:16 GMT
Hello,

I am struggling with some horrible intermittent OrbixWeb problems. The
version of OW is 3.0.1, an ancient, buggy, unsupported version. In
the fullness of time we will move off CORBA completely and use JMS but
for now there is a problem and changing the version of OW or even the
ORB is not an option.

Every now and then our server fails to start up. The error is an
exception from impl_is_ready, shown below. To make the problem
go away we just keep restarting things until it works. We don't usually
have to try many times before it works but, of course, we should not
have to do this at all.
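The manual workaround described above (restart until the server comes up) can at least be automated while the root cause is investigated. This is a minimal Java sketch under stated assumptions: `StartupRetry` and `withRetries` are hypothetical names, the startup step is passed in as a `Supplier`, and failures are assumed to surface as unchecked exceptions (CORBA system exceptions such as `COMM_FAILURE` are `RuntimeException` subclasses). Whether OrbixWeb 3.0.1 tolerates a second `impl_is_ready` call in the same JVM after a failure is not known, so in practice the step might have to be a full relaunch rather than a bare call.

```java
import java.util.function.Supplier;

// Hypothetical helper: retries a startup step a bounded number of times.
final class StartupRetry {
    private StartupRetry() {}

    // Runs step until it succeeds or maxAttempts is reached, doubling the
    // pause between attempts. Only unchecked exceptions are retried; CORBA
    // system exceptions such as COMM_FAILURE are RuntimeException subclasses.
    static <T> T withRetries(Supplier<T> step, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break;
                }
                System.err.println("startup attempt " + attempt + " failed: " + e);
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    break;
                }
                delay *= 2; // back off before the next attempt
            }
        }
        throw last; // rethrow the most recent failure
    }
}
```

The growing delay is a deliberate choice: if the failure is caused by a port the OS has not yet released, retrying immediately is likely to hit the same error.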

Two things have changed since the environment in which things worked. One is
that I have rewritten the server from ancient C++ to Java, so it is now
using OW rather than Orbix. We still have another C++ server, so we are
using the orbixd daemon instead of the Java one. We just rig the
startup script to fire up the JVM with the parameters that Orbix needs
for auto-launch to work (undocumented, sadly, deep sigh). The other
change is that we now have 3 servers where we used to have just one. One of
the servers retains its original NS name; the others have either '2'
or '3' appended to the NS name. This provides a crude form of load
balancing. When we start the 3 servers, most of the time there is no
problem, but every now and then one of them fails with the error. Last
night we were really unlucky and two of them failed with this error.
We have never had all 3 fail at the same time. Yet.
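The suffix scheme described above can be sketched on the client side as follows. This is only an illustration under assumptions: the base name `RouterService` is hypothetical, and the thread does not say how clients choose among the three names. Picking one at random, and falling back to the remaining names in shuffled order if one server failed to start, is one plausible policy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical client-side helper for the "name", "name2", "name3" scheme.
final class CrudeBalancer {
    private final List<String> names;
    private final Random random;

    CrudeBalancer(String baseName, Random random) {
        // The original server keeps the base name; the other two get a suffix.
        this.names = List.of(baseName, baseName + "2", baseName + "3");
        this.random = random;
    }

    List<String> names() {
        return names;
    }

    // One server name chosen uniformly at random.
    String pick() {
        return names.get(random.nextInt(names.size()));
    }

    // All names in random order, so a client can fall back to the next
    // name if the server registered under the first one failed to start.
    List<String> failoverOrder() {
        List<String> order = new ArrayList<>(names);
        Collections.shuffle(order, random);
        return order;
    }
}
```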

I wonder if anyone can shed any light on what this exception means. My
guess is that it might be OW-specific.

Here is the stack trace:

31-Jan-2008 20:38:50,025 ERROR [main]  RouterServiceServerOrbix:
org.omg.CORBA.COMM_FAILURE: Communication failure     select error
   Reason: (unknown)
31-Jan-2008 20:38:50,027 ERROR [main] STDERR: org.omg.CORBA.COMM_FAILURE: Communication failure    select error
Reason: (unknown)
31-Jan-2008 20:38:50,028 ERROR [main] STDERR:   at IE.Iona.OrbixWeb.CORBA.ExceptionHelper.new_COMM_FAILURE(ExceptionHelper.java:218)
31-Jan-2008 20:38:50,028 ERROR [main] STDERR:   at IE.Iona.OrbixWeb.CORBA.Listener.Bind(Listener.java:135)
31-Jan-2008 20:38:50,029 ERROR [main] STDERR:   at IE.Iona.OrbixWeb.CORBA.EventHandler.newListener(EventHandler.java:130)
31-Jan-2008 20:38:50,029 ERROR [main] STDERR:   at IE.Iona.OrbixWeb.CORBA.EventHandler.<init>(EventHandler.java:108)
31-Jan-2008 20:38:50,030 ERROR [main] STDERR:   at IE.Iona.OrbixWeb.CORBA.BOAImpl.initEventHandler(BOAImpl.java:351)
31-Jan-2008 20:38:50,032 ERROR [main] STDERR:   at IE.Iona.OrbixWeb.CORBA.BOAImpl.readyEventHandler(BOAImpl.java:376)
31-Jan-2008 20:38:50,032 ERROR [main] STDERR:   at IE.Iona.OrbixWeb.CORBA.BOAImpl.processEvents(BOAImpl.java:453)
31-Jan-2008 20:38:50,033 ERROR [main] STDERR:   at IE.Iona.OrbixWeb.CORBA.BOAImpl.impl_is_ready(BOAImpl.java:796)
31-Jan-2008 20:38:50,033 ERROR [main] STDERR:   at IE.Iona.OrbixWeb.CORBA.ORB.impl_is_ready(ORB.java:1034)

Regards,

Andrew Marlow
Yakov Gerlovin - 05 Feb 2008 20:34 GMT
Hi,

First of all, I hope you solved the CORBA.INV_OBJREF exception
problem.
Now, concerning your question, if you're on Solaris, maybe using
'truss' can help you find the problem. BTW, it is usually a good
idea to specify the platform you're working on (even if you're writing
in Java).

When you say 'When we start the 3 servers', do you mean you start them
after the reboot, or that you're starting them after those servers were
shut down (restarted)? Are your servers configured to always run on
the same port? Is it possible that other processes are using those ports?

Regards,
Yakov
apm35@student.open.ac.uk - 06 Feb 2008 08:25 GMT
> Hi,
>
>    First of all, I hope you solved the CORBA.INV_OBJREF exception
> problem.

First of all, many thanks for trying to help :-)

> Now, concerning your question, if you're on Solaris, maybe using
> 'truss' can help you find the problem.

Unfortunately not. I already have the stack trace. Besides, when
truss is used on event-driven programs, it typically just shows the
calls that drive the event loop round. For CORBA programs those
are the socket read and select calls.

> BTW, it is usually a good
> idea to specify the platform you're working on (even if you're writing
> in Java).

Solaris 8.

> When you say 'When we start the 3 servers', do you mean you start them
> after the reboot or you're starting them after those servers were
> shut down (restarted)?

There is no reboot. The machine stays up the whole time. I should have
been more specific. When I said "restarting things" that was a bit vague.
What I meant was running a script that finds the server pids (using
ps) and then issues a kill -9. Then I run the script that starts the
servers again.

> Are your servers configured to always run on
> the same port?

There is no port configuration as such. This is CORBA, where ports are
an implementation detail buried in the IOR. I have made no changes in
this area so OW will choose whatever ports it sees fit. There is a
particular port number used to communicate with the orbix daemon, it
is port 1570, er, I think. Anyhow this also is the default and if
there was a problem with that then the Orbix daemon would not start.

> Is it possible other processes are using those ports?

No.

> Regards,
> Yakov
Yakov Gerlovin - 06 Feb 2008 11:11 GMT
OK, I had a very similar problem with Solaris 8 a couple of years ago.
When your process does not close the socket gracefully, the port it
listens on remains in use for some time (this time is configurable).
Any attempt to bind to the same port will fail until the OS frees
it.

kill -9 (SIGKILL) cannot be caught, while a regular kill (SIGTERM) can be
caught by your server. Implement a signal handler (if that is possible in
Java) that just shuts down the ORB.

A cleaner solution is to add a 'shutdown' method to some root
object (or write a small servant with only this method). It would be
easier if the object with the shutdown method were persistent and
registered under a POA configured with direct persistence (check the
documentation to see whether OrbixWeb supports direct persistence). This way
the object reference will not change between restarts.
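The servant side of the suggested shutdown operation might look like the sketch below. `AdminServant` and the stop action are hypothetical; in a real server the method would implement an IDL operation (for example `oneway void shutdown();`), which would return nothing rather than the `boolean` used here only for testability. By default the stop runs on a separate thread, so the ORB is not torn down from inside the request it is still dispatching.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical servant logic behind a remote "shutdown" operation.
final class AdminServant {
    private final Runnable stopAction;       // ORB-specific teardown, supplied by the server
    private final Executor executor;         // where the teardown runs
    private final AtomicBoolean requested = new AtomicBoolean(false);

    // Default: run the teardown on its own thread so the remote call can return first.
    AdminServant(Runnable stopAction) {
        this(stopAction, task -> new Thread(task, "admin-shutdown").start());
    }

    AdminServant(Runnable stopAction, Executor executor) {
        this.stopAction = stopAction;
        this.executor = executor;
    }

    // Returns true only for the call that actually triggered the stop;
    // repeated requests (e.g. from a retrying admin client) are ignored.
    boolean shutdown() {
        if (!requested.compareAndSet(false, true)) {
            return false;
        }
        executor.execute(stopAction);
        return true;
    }
}
```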

> This is CORBA, where ports are an implementation detail buried in the IOR.
Indeed, port allocation is implementation-specific. However, many
implementations allow some level of control over it.
marlow.andrew@googlemail.com - 07 Feb 2008 20:50 GMT
> OK, I had a very similar problem with Solaris 8 a couple of years ago.
[snip]
> The more clean solution is to add 'shutdown' method to some root
> object

There is another approach: roll back all the Java work and revert to
the original C++ servers. This is what is going to be done. The
reasoning is that the introduction of Java on the server side appears
to coincide with a greater number of Orbix connectivity problems.
Removing Java from the equation and seeing whether these connectivity
incidents decrease is the only way to be sure whether or not the
incidents are connected with the presence of Java in the system.

