Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / First Aid / February 2005

Tip: Looking for answers? Try searching our database.

Seen java -server optimization vary 50%?

Thread view: 
Don Taylor - 26 Feb 2005 06:28 GMT
Hotspot 1.4.1-01 running as   java -server appname
running on a 2+ghz AMD part 512 meg of memory,
installed straight off th Knoppix 3.2/Debian CD,
no other users, no other activity, no networking,
just run this from the command line and do println().

Wrote some code that gets an expression from the user and then
beats itself to death doing bigInteger multiply, mod, gcd, etc.

I time and print this every hundred iterations.

The odd part, every hundred iterations of a run will take the
same time to better than 1/2% accuracy.  But if I kill it and
start it over the time can be substantially different, but
still each hundred is the same to better than 1/2%.  These
hundred iterations can take anywhere from ten to fifteen
minutes, depending on the run.  Not recompiling between the
runs, doing nothing but killing it and starting it over.
What I do now is run and kill a few times, until I get one
of the faster times, and then leave it running for days.

Ever seen anything like this?  The only thing I can imagine
is that the runtime optimizer is doing a better or worse job
for some unknown reason, at startup, and then not changing
it when the thousands of iterations follow.

Any idea how I might get it to choose the fast time every time?
Thanks
dar7yl - 27 Feb 2005 07:39 GMT
> Hotspot 1.4.1-01 running as   java -server appname
> running on a 2+ghz AMD part 512 meg of memory,
[quoted text clipped - 24 lines]
> Any idea how I might get it to choose the fast time every time?
> Thanks

This looks suspiciously like a garbage-collection problem.  The
garbage collector can run at any time, and sometimes can take
polynomial time to do it's dubious magic.

You can experiment with System.gc() to force garbage collection
at your convenience instead of letting it freewheel.  I have had
significant improvements in my app by forcing gc() at checkpoints,
especially when I am aware of considerable object creation and
deletion activity.

regards,
   Dar7yl
Don Taylor - 27 Feb 2005 18:09 GMT
...
>> I time and print this every hundred iterations.
>>
[quoted text clipped - 7 lines]
>> What I do now is run and kill a few times, until I get one
>> of the faster times, and then leave it running for days.

>This looks suspiciously like a garbage-collection problem.  The
>garbage collector can run at any time, and sometimes can take
>polynomial time to do it's dubious magic.

If it were garbage collection then I would tend to think that
the variation from one iteration of 100 cycles to the next 100
cycles would vary more.  I have tens of thousands of 100 cycle
iterations, with times varying from 596301 to 596335 milliseconds.
But if I kill this and restart it I can easily discover the next
tens of thousands of 100 cycle iterations will vary from
967208 to 967241 milliseconds.  Kill it again and find any other
number between 600 and 1000 seconds/100 iterations, repeatably.

That does not sound like the garbage collector to me.

>You can experiment with System.gc() to force garbage collection
>at your convenience instead of letting it freewheel.  I have had
>significant improvements in my app by forcing gc() at checkpoints,
>especially when I am aware of considerable object creation and
>deletion activity.

Do you still think this sounds like gc?  I'm skeptical.

thanks
Patricia Shanahan - 27 Feb 2005 12:33 GMT
> Hotspot 1.4.1-01 running as   java -server appname
> running on a 2+ghz AMD part 512 meg of memory, installed
[quoted text clipped - 27 lines]
> Any idea how I might get it to choose the fast time every
> time? Thanks

I have seen this sort of behavior in C and Fortran programs
whose most frequently executed loops do a lot of accesses to
data structures that were all allocated once, at the start
of the run.

Modern processors use a lot of indexes, accessed by virtual
or physical address, to control caching and address translation.

See
http://www.pcguide.com/ref/mbsys/cache/funcMapping-c.html
for an explanation of some important indexing approaches.
Unfortunately, fully associative indexing is VERY expensive,
so most indexing is direct mapped or set associative.

For both direct mapped and set associative with reasonably
small set size, combinations of addresses can lead to
replacement due to collisions, even when there is plenty of
space. Other combinations of addresses don't have those
collisions.

Patricia
Don Taylor - 27 Feb 2005 18:15 GMT
>> Hotspot 1.4.1-01 running as   java -server appname
>> running on a 2+ghz AMD part 512 meg of memory, installed
[quoted text clipped - 27 lines]
>> Any idea how I might get it to choose the fast time every
>> time? Thanks

>I have seen this sort of behavior in C and Fortran programs
>whose most frequently executed loops do a lot of accesses to
>data structures that were all allocated once, at the start
>of the run.

If the first 100 passes took substantially longer I would
not be at all surprised.  (This is a tiny little loop, a
couple of lines literally, that takes all the time but there
is setup before it gets to this)  But every 100 loops take
between 596301 and 596335 milliseconds, thousands and
thousands of times it takes this long.  But if I do nothing
but kill it and restart it I might find it takes between
967328 and 967365 milliseconds, thousands and thousands of
times it will take that long.

>Modern processors use a lot of indexes, accessed by virtual
>or physical address, to control caching and address translation.

>See
>http://www.pcguide.com/ref/mbsys/cache/funcMapping-c.html
>for an explanation of some important indexing approaches.
>Unfortunately, fully associative indexing is VERY expensive,
>so most indexing is direct mapped or set associative.

understood

>For both direct mapped and set associative with reasonably
>small set size, combinations of addresses can lead to
>replacement due to collisions, even when there is plenty of
>space. Other combinations of addresses don't have those
>collisions.

I'm still skeptical why tens of thousands of iterations can
all take the same time, and that time varies by more than 50%
if I happen to kill and restart the exact same binary with
the exact same (nearly zero) load on the machine.

Does it still sound like your explanation is plausible?
Why the huge variation from run to run and almost no variation
inside a run?

thanks
Patricia Shanahan - 27 Feb 2005 18:36 GMT
> Patricia Shanahan <patricia_shanahan@earthlink.net>
> writes:
[quoted text clipped - 77 lines]
>
> thanks

I know the explanation is plausible in general, because in
the case of the C and Fortran programs I had access to
special operating system hooks and a hardware logic analyzer
on the main bus. I could see exactly what was going on,
including the differences in mappings that caused the
differences in performance. They exhibited exactly the
behavior you describe. As the program loaded and allocated
its data structures the address mappings were established,
and that controlled the performance for the rest of that
run. Each new run rolled the memory mapping dice again, and
might get different inner loop performance.

To decide whether this explanation applies to your program,
you need to look at your most heavily executed loops. Are
most references to a fixed set of data structures, all
allocated in the first few iterations? If so, it is at least
a possible explanation. Conversely, if you are using data
structures that are created and garbage collected repeatedly
throughout the run, it is less likely to apply.

Paradoxically, this problem is more likely on a lightly
loaded machine, especially one that is dedicated to a single
application. The less contention there is for resources such
as physical memory, the more likely the program is to go on
running with physical addresses, as well as virtual
addresses, that were established early in the run.

Patricia
Don Taylor - 27 Feb 2005 21:05 GMT
>> ...
>> If the first 100 passes took substantially longer I would
[quoted text clipped - 18 lines]
>>
>> thanks

>To decide whether this explanation applies to your program,
>you need to look at your most heavily executed loops. Are
[quoted text clipped - 3 lines]
>structures that are created and garbage collected repeatedly
>throughout the run, it is less likely to apply.

This is the loop.  Everything, except 'd', is a bigInteger.
The only 'new' grabs the date/time, once every 10-15 minutes.

while(a.multiply(a).compareTo(n)<=0&&(T.compareTo(one)==0||T.compareTo(n)== 0)){
  b = a.modPow(w, n);
  T = n.gcd(b.modPow(n, n).subtract(b));
  if (a.mod(hundred).compareTo(zero) == 0) {
     d = new Date();
     finishtime = d.getTime();
     duration = finishtime-starttime;
     System.out.println("a = " + a.toString() + ", duration=" + duration);
     starttime = finishtime;
  }
  a = a.add(one);
}

>Paradoxically, this problem is more likely on a lightly
>loaded machine, especially one that is dedicated to a single
>application. The less contention there is for resources such
>as physical memory, the more likely the program is to go on
>running with physical addresses, as well as virtual
>addresses, that were established early in the run.

Is there any diagnostic information I could get to confirm
what the underlying behavior really is?  Or any guidelines
you might have to force the behavior to be towards the fast
execution, as opposed to the slow?

Thanks
Patricia Shanahan - 27 Feb 2005 21:34 GMT
> Patricia Shanahan <patricia_shanahan@earthlink.net>
> writes:
...
>> To decide whether this explanation applies to your
>> program, you need to look at your most heavily executed
[quoted text clipped - 21 lines]
>    a = a.add(one);
> }

I don't know, because the BigInteger arithmetic could
contain a lot of hidden object creation. For example, if
neither of the gcd operands is zero, gdc will create two new
MutableBigInteger instances, do a gcd using them, and return
a new BigInteger, at least three object creations.

>> Paradoxically, this problem is more likely on a lightly
>>  loaded machine, especially one that is dedicated to a
[quoted text clipped - 10 lines]
>
> Thanks

Diagnosis depends on the performance tools you have
available for your operating system. Do you have access to
e.g. cache miss counts for any caches? Look for consistent
differences between good and bad runs.

If you have hardware specs, look for caches and translation
tables with low associativity. Direct mapped is the most
suspect, but I've seen it happen with two way set associative.

My main approach was to talk to the operating system
developers about their memory allocation strategies.
However, there is one slightly insane user-implementable
technique that has sometimes helped. Before you try to run
your program, deliberately overload the memory. Run a bunch
of memory hog jobs whose total memory exceeds the size of
the machine and let them fight over the memory. That had the
effect of randomizing some mapping structures. Of course,
make sure you clean them all out before trying your program.

Otherwise, if it is memory mapping, your current technique
of seeing whether you have a good or bad mapping and
restarting until you get an empirically good mapping is
about the best that can be done.

Patricia


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.