Java Forum / General / May 2006

strange performance behavior of a mathematical method: why?

Dimitri Ognibene - 23 Apr 2006 21:17 GMT
I've found a difference of about 10% in the execution time of two versions of
this method:

public double[] evaluate(double[] input) {
    double a;  // temporary, used only by variant 1
    // System.arraycopy(input, 0, activation[0], 0, input.length);
    activation[0] = input;
    // for (int i = 0; i < input.length; i++)
    //     activation[0][i] = input[i];
    for (int i = 1; i < layers; i++) {
        double[] activation_col = activation[i - 1];
        double[] activation_col_res = activation[i];
        double[][] weight_matr = weight[i - 1];
        for (int j = 0; j < activation_col_res.length; j++) {
            double[] weight_col = weight_matr[j];
            double acc = 0;
            for (int k = 0; k < activation_col.length; k++) {
                // ... variant 1 or variant 2 goes here ...
            }
            activation_col_res[j] = g(acc);
        }
    }
    setChanged();
    notifyObservers();

    return activation[layers - 1];
}

variant 1:
    a = activation_col[k];
    a *= weight_col[k];
    acc += a;
variant 2:
    acc += activation_col[k] * weight_col[k];

Variant 1 is about 10% faster than variant 2.
I have a weight matrix of about 1200x400 elements.
Variant 1 averages 4.82 ms.
Variant 2 averages 5.24 ms.

Does this performance difference make any sense?

Does anyone have any tips for always writing the faster code?

thanks
Dimitri
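
For anyone who wants to try the method above outside the original class, here is a minimal stand-alone sketch. It rests on several assumptions that are not in the post: the class name ForwardPass and its constructor are made up, g is taken to be a logistic sigmoid (the post never shows g), and the Observable calls (setChanged, notifyObservers) are left out. The inner loop uses variant 2.

```java
import java.util.Arrays;

/** Stand-alone sketch of the forward pass; names and g() are assumptions. */
class ForwardPass {
    final int layers;
    final double[][] activation;   // activation[i][j]: output of unit j in layer i
    final double[][][] weight;     // weight[i][j][k]: from unit k of layer i to unit j of layer i+1

    /** Hypothetical constructor: one size per layer, all weights start at 0. */
    ForwardPass(int... sizes) {
        layers = sizes.length;
        activation = new double[layers][];
        weight = new double[layers - 1][][];
        for (int i = 0; i < layers; i++) {
            activation[i] = new double[sizes[i]];
            if (i > 0) {
                weight[i - 1] = new double[sizes[i]][sizes[i - 1]];
            }
        }
    }

    // Assumption: a logistic sigmoid. The original g is not shown in the post.
    static double g(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    double[] evaluate(double[] input) {
        activation[0] = input;
        for (int i = 1; i < layers; i++) {
            double[] in = activation[i - 1];
            double[] out = activation[i];
            double[][] w = weight[i - 1];
            for (int j = 0; j < out.length; j++) {
                double[] row = w[j];
                double acc = 0;
                for (int k = 0; k < in.length; k++) {
                    acc += in[k] * row[k];   // variant 2 from the post
                }
                out[j] = g(acc);
            }
        }
        return activation[layers - 1];
    }

    public static void main(String[] args) {
        ForwardPass net = new ForwardPass(2, 1);
        net.weight[0][0][0] = 0.5;
        net.weight[0][0][1] = -0.25;
        // Weighted sum is 1.0*0.5 + 2.0*(-0.25) = 0, so the output is g(0).
        System.out.println(Arrays.toString(net.evaluate(new double[] {1.0, 2.0})));
    }
}
```

Swapping the marked line for the three statements of variant 1 gives the other version to compare against.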
Diomidis Spinellis - 23 Apr 2006 22:26 GMT
> I've found a difference of about 10% in the execution time of two versions of
> this method:
[quoted text clipped - 38 lines]
>
> Does this performance difference make any sense?

If you are curious, look at the generated Java bytecode (use javap).
But the difference isn't important.  Another compiler, JIT, or processor
architecture could give you different results.
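
One way to follow the javap suggestion is to isolate the two inner loops in a small class and disassemble it. The class and method names below (DotVariants, variant1, variant2) are made up for illustration. After compiling with "javac DotVariants.java", running "javap -c DotVariants" prints the bytecode of both methods side by side.

```java
/** Hypothetical helper class holding the two inner-loop variants. */
class DotVariants {
    // Variant 1: load into a temporary, multiply in place, then accumulate.
    static double variant1(double[] x, double[] w) {
        double a;
        double acc = 0;
        for (int k = 0; k < x.length; k++) {
            a = x[k];
            a *= w[k];
            acc += a;
        }
        return acc;
    }

    // Variant 2: multiply and accumulate in a single expression.
    static double variant2(double[] x, double[] w) {
        double acc = 0;
        for (int k = 0; k < x.length; k++) {
            acc += x[k] * w[k];
        }
        return acc;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] w = {4, 5, 6};
        System.out.println(variant1(x, w) + " " + variant2(x, w));
    }
}
```

Keep in mind that the bytecode is only the JIT's input; what actually runs is the machine code the JIT produces from it, so similar or different bytecode does not settle the timing question by itself.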

> Does anyone have any tips for always writing the faster code?

- Measure before optimizing
- Focus on algorithmic improvements

Signature

Diomidis Spinellis
Code Quality: The Open Source Perspective (Addison-Wesley 2006)
http://www.spinellis.gr/codequality?cljp

Dimitri Ognibene - 23 Apr 2006 22:34 GMT
What do you think of this method implementation?
Any tips?
Must I move to C++ and asm to get better performance?
Note that the g method called inside the loop is final and the compiler
does inline it, or so it appears from the profiling data.
Diomidis Spinellis - 24 Apr 2006 08:51 GMT
> What do you think of this method implementation?
> Any tips?
> Must I move to C++ and asm to get better performance?
> Note that the g method called inside the loop is final and the compiler
> does inline it, or so it appears from the profiling data.

From what I can see this particular code manipulates floating point
numbers.  This is one of the (not common) cases where a good Java
compiler and JIT-based runtime system should be able to give you the
same performance as the one you would achieve with (say) C.

You are unlikely to gain anything from moving to assembly, unless you
can utilize instructions that compilers don't typically support.  For
example, I see that your code calculates some type of dot product: if
your data allows it, you could gain by using Intel's SSE SIMD extensions
or AMD's 3DNow.  Another possibility would be to move your code to be
executed by a 3D graphics card's hardware.

Dimitri Ognibene - 24 Apr 2006 09:04 GMT
Thanks Diomidis,
Do you have a link where I can find out more about such an approach?
Thanks
Diomidis Spinellis - 24 Apr 2006 09:21 GMT
> Thanks Diomidis,
> Do you have a link where I can find out more about such an approach?

http://www.ics.forth.gr/eHealth/publications/papers/2005/PCI2005.pdf

Dimitri Ognibene - 04 May 2006 17:21 GMT
This paper is very interesting, like the other resources on gpgpu.org,
but do you, or anyone else, know of an interface for Java, like Brook
for C++, to use the GPU as a vector coprocessor?
Thanks
Dimitri
Roedy Green - 24 Apr 2006 04:32 GMT
On 23 Apr 2006 13:17:08 -0700, "Dimitri Ognibene"
<dimitri.ognibene@gmail.com> wrote, quoted or indirectly quoted
someone who said :

>variant 1:
>                     a= activation_col[k];
[quoted text clipped - 4 lines]
>
>variant1 is 10% faster than variant2.

That is a puzzle. First have a look at the bytecode generated.
See http://mindprod.com/jgloss/disassembler.html
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Roedy Green - 24 Apr 2006 04:36 GMT
On 23 Apr 2006 13:17:08 -0700, "Dimitri Ognibene"
<dimitri.ognibene@gmail.com> wrote, quoted or indirectly quoted
someone who said :

>Does this performance difference make any sense?
>
>Does anyone have any tips for always writing the faster code?

You try variants and measure. Use an AOT compiler or java -server,
which optimises harder but takes longer to come up to speed.

See http://mindprod.com/jgloss/aot.html
http://mindprod.com/jgloss/javaexe.html
Chris Uppal - 24 Apr 2006 09:34 GMT
> I've found a difference of about 10% in the execution time of two versions of
> this method:
[...]
> variant 1:
>                      a= activation_col[k];
[quoted text clipped - 4 lines]
>
> variant1 is 10% faster than variant2.

It's very difficult to estimate exactly what will have marginal effects at this
level of optimisation.  The JIT is pretty clever, the CPU does lots of
optimisations of its own, cache effects and alignment effects combine to
confound estimation too.  I have no idea what the explanation might be here; it
could be an effect of the (JITed form of the) first expression being able to
make better use of multiple arithmetic units within the CPU, but that's nothing
more than a guess.

You don't mention whether you are using the -client or -server JVM, so I'm
guessing that you aren't aware of how that choice affects the kind of
optimisations that the JIT will do.  Rule of thumb: -server is a good deal
better at optimising arithmetic code.  FWIW, on the only micro-benchmark I've
run recently (and one micro-benchmark means almost nothing on its own), the
server JVM was doing essentially the same optimisations as the best C++
compiler I have access to (the one in MS VS.2003).

Similarly, you don't mention what CPU you are running on, so I'm guessing that
you are not aware of how details of chip architecture affect how fast a
specific expression of an algorithm can run.  Agner Fog has a fascinating (but
long) guide to optimisation for Pentium family processors on this page:
   http://www.agner.org/assem/
which also has some links.  If nothing else, it should put you off the idea of
re-implementing in assembly ;-)

   -- chris
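
To check which JVM variant a benchmark is actually running under, the standard system properties java.vm.name and java.vm.version can be printed from inside the program. This is only a sketch: the class name WhichVm is made up, and the exact strings vary by vendor and release, so no particular output is shown. On a HotSpot JVM the name typically contains "Client VM" or "Server VM".

```java
/** Hypothetical helper that reports which JVM is executing the code. */
class WhichVm {
    static String vmName() {
        // Standard system property; the value is vendor and build specific.
        return System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println("VM name:    " + vmName());
        System.out.println("VM version: " + System.getProperty("java.vm.version"));
    }
}
```

Running it once as "java WhichVm" and once as "java -server WhichVm" shows whether the flag changes anything on a given installation.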
Dimitri Ognibene - 24 Apr 2006 11:06 GMT
Thanks very much Chris,
However, I don't want to rewrite it in assembly but to build a JNI
interface to math kernel libraries...
But I'll look at the links that you and Diomidis suggested, and after
some trials I'll decide.
Thanks again
Dimitri
Dimitri Ognibene - 24 Apr 2006 13:40 GMT
OK, you were right: with the -server option the two versions are about
twice as fast and run at the same speed, 2.70 ms on average.
Thanks
Robert Klemme - 24 Apr 2006 12:33 GMT
> variant 1:
>                      a= activation_col[k];
[quoted text clipped - 9 lines]
>
> Does this performance difference make any sense?

How exactly did you measure performance?  Did you give the JVM some warm
up runs before doing the real measurement?

Cheers

    robert
Dimitri Ognibene - 24 Apr 2006 12:44 GMT
Yes, I do.
I've tried waiting a little, and these are the "steady" values.
However, I was not using the -server option. I'll post the results
later.
Thanks, Dimitri
Robert Klemme - 24 Apr 2006 15:37 GMT
> Yes, I do.
> I've tried waiting a little, and these are the "steady" values.

What exactly do you mean by "wait a little"?  In order for the JVM's
optimization to kick in you have to actually execute the code several
times before you start measuring.  Also you should execute every variant
several times to get better measurement accuracy.

    robert
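
A rough sketch of the kind of measurement described here: call the code a number of times without timing it, then time many calls and average. Everything below is an assumption made for illustration (the class name WarmupBench, the iteration counts, and the array length of 480,000, chosen to match roughly 1200x400 multiply-adds per call); it is not the original poster's benchmark.

```java
/** Hypothetical warm-up-then-measure harness for a dot product. */
class WarmupBench {
    // Written to a volatile field so the JIT cannot discard the computed results.
    static volatile double sink;

    static double dot(double[] x, double[] w) {
        double acc = 0;
        for (int k = 0; k < x.length; k++) {
            acc += x[k] * w[k];
        }
        return acc;
    }

    /** Average nanoseconds per call over 'runs' timed calls, after 'warmup' untimed calls. */
    static double averageNanos(double[] x, double[] w, int warmup, int runs) {
        double total = 0;
        for (int i = 0; i < warmup; i++) {
            total += dot(x, w);          // untimed: gives the JIT a chance to compile dot()
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            total += dot(x, w);          // timed
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return (double) elapsed / runs;
    }

    public static void main(String[] args) {
        int n = 1200 * 400;              // assumed workload size
        double[] x = new double[n];
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = i % 7;
            w[i] = 0.5;
        }
        double ns = averageNanos(x, w, 200, 200);
        System.out.println("average per call: " + (ns / 1e6) + " ms");
    }
}
```

Timing each variant this way several times, in alternating order, gives a better idea of whether a 10% gap is stable or within run-to-run noise.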
Dimitri Ognibene - 24 Apr 2006 15:51 GMT
Yes,
I was in steady state...
Rob@Bedford - 24 Apr 2006 17:44 GMT
When I was working on some code optimization I read somewhere that
copying global variables to local variables does increase performance.
Don't ask me how/why, but it's a behind-the-scenes thing.  I will try to
find the article and post a link.
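
Java has no true global variables, so "global" here presumably means instance or static fields. A small sketch of the pattern, with made-up names (LocalCopy, sumViaField, sumViaLocal): the second method reads the field once into a local before the loop, much as the method at the top of the thread does with activation_col and weight_col. Whether this is measurably faster depends on the JIT, which may or may not hoist the field load by itself, so it is worth measuring rather than assuming.

```java
/** Hypothetical example of copying a field into a local before a hot loop. */
class LocalCopy {
    private double[] data;

    LocalCopy(double[] data) {
        this.data = data;
    }

    // Reads the field on every iteration (at least at the source level).
    double sumViaField() {
        double acc = 0;
        for (int i = 0; i < data.length; i++) {
            acc += data[i];
        }
        return acc;
    }

    // Copies the field reference and the length into locals once.
    double sumViaLocal() {
        double[] local = data;
        int n = local.length;
        double acc = 0;
        for (int i = 0; i < n; i++) {
            acc += local[i];
        }
        return acc;
    }
}
```

Both methods return the same value; only the way the array reference is reached differs.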

