Java Forum / General / January 2007
Speed of interfaces vs inheritance
Chris - 30 Dec 2006 19:07 GMT In a particularly time-critical part of my app I need to call a method on an external class in a tight loop. The class can be different in different contexts. I've got two choices: create an interface and have the various different classes implement it, or create an abstract class and have the various different classes extend it.
I seem to recall, in years past, that the interface approach was slower. Is this still the case?
(In this particular situation, either approach works fine, though I personally like the interface approach better).
Mike Schilling - 30 Dec 2006 19:20 GMT > In a particularly time-critical part of my app I need to call a method on > an external class in a tight loop. The class can be different in different [quoted text clipped - 7 lines] > (In this particular situation, either approach works fine, though I > personally like the interface approach better). The only way to answer this question is to write a test.
Tom Hawtin - 30 Dec 2006 20:22 GMT >> In a particularly time-critical part of my app I need to call a method on >> an external class in a tight loop. The class can be different in different [quoted text clipped - 9 lines] > > The only way to answer this question is to write a test. Yeah, but write a good one...
FWIW, a couple years ago I tested dynamically loading a class through a class loader and having a loop repeatedly calling an (interface) method which threw an exception. Under the Sun server HotSpot, the loop was eliminated. Client performance wasn't too bad either.
I believe interface methods can get slow where the same runtime class is called through a number of different interfaces. A highly unlikely situations, particularly if you were considering an abstract base class.
Tom Hawtin
Arne Vajhøj - 30 Dec 2006 19:45 GMT > In a particularly time-critical part of my app I need to call a method > on an external class in a tight loop. The class can be different in [quoted text clipped - 4 lines] > I seem to recall, in years past, that the interface approach was slower. > Is this still the case? I guess that would be implementation specific.
I would expect them to be relative close in performance due the similarity of what has to be done.
And if the method is actually doing some work the call overhead should be minimal compared to what is happening inside the method.
A quick test with some randomly created code on what happen to be my default Java (SUN Java 1.5 Win32) interface is actually faster than abstract base class.
Arne
Daniel Pitts - 31 Dec 2006 01:05 GMT > In a particularly time-critical part of my app I need to call a method > on an external class in a tight loop. The class can be different in [quoted text clipped - 7 lines] > (In this particular situation, either approach works fine, though I > personally like the interface approach better). The overhead of method calls of any sort or considered trivial in most circumstances. I would suggest using the approach that gives you the best OO design, and then changing it around if its not fast enough. Determining if its not fast enough should be down by using a profiler, not a guess and check method.
Chris Uppal - 04 Jan 2007 19:13 GMT > The overhead of method calls of any sort or considered trivial in most > circumstances. Agreed, but not in /all/ circumstances. The OP has indicated that he suspects that he has a candidate for one of the cases where it is not.
> I would suggest using the approach that gives you the > best OO design, and then changing it around if its not fast enough. Agreed.
> Determining if its not fast enough should be down by using a profiler, > not a guess and check method. Not a profiler, unless you (or he) has access to a /far/ better profiler than I have ever encountered -- the overhead of a method call is too low for normal profilers even to try to measure it. You /can/ use a profiler to measure the execution time of both forms of the calling method, but that's just a awkward and tricky way to measure the overall execution time (of the method), and it'd be much easier to measure it directly without the (near certain) risk that the profiler is affecting the results.
Profilers are very useful for identifiying coarse-grained hot-spots, but are not (in my experience) any use at all in identifying fine-grained issues.
-- chris
John Ersatznom - 05 Jan 2007 21:43 GMT > Profilers are very useful for identifiying coarse-grained hot-spots, but are > not (in my experience) any use at all in identifying fine-grained issues. What is useful for that? (I'd have thought running suspected hotspots by themselves in a loop with a huge number of iterations, while profiling. :))
Chris Uppal - 06 Jan 2007 15:44 GMT [me:]
> > Profilers are very useful for identifiying coarse-grained hot-spots, > > but are not (in my experience) any use at all in identifying [quoted text clipped - 3 lines] > themselves in a loop with a huge number of iterations, while profiling. > :)) Assuming you don't want to go the analytic route (and who does ?), the only option I know of is to run huge loops (with no profiler messing up the execution pattern). Note that in any application where such issues matter, there is a natural source of such loops -- the application itself.
Micro-"benchmarks" like the one I posted are useful for developing a feeling for the numbers involved, and for identifying the kinds of situations where fine-grained issues arise (and hence developing a decent intuition about /when/ it's worth doing real benchmarking); but they are no substitute for measuring the target code itself.
-- chris
andrewmcdonagh - 01 Jan 2007 12:57 GMT > In a particularly time-critical part of my app I need to call a method > on an external class in a tight loop. The class can be different in [quoted text clipped - 7 lines] > (In this particular situation, either approach works fine, though I > personally like the interface approach better). The other advice about testing instead of guessing is the most important one to follow .
Aside from that, if your design choice of ABC or Interface causes such a performance problem, then maybe there's another design approach.
Tell Don't Ask....
Instead of your code doing the loop and worrying about performance, Tell the two other classes to do the loop for you.
so, instead of
class Client {
public void run() { HttpServer h = new HttpServer();
for (int i = 0; i < 65000; i++ { h.doSomething(); }
}
we have....
class Client {
public void run() { HttpServer h = new HttpServer();
h.doSomething(); // for loop logic moved into HttpServer } }
Andrew
John Ersatznom - 04 Jan 2007 16:40 GMT > so, instead of > [quoted text clipped - 19 lines] > } > } Shouldn't that be h.doSomething(65000)? Unless even the length of the loop should be up to the HttpServer class and not policy set by the caller... :)
Of course, it should really be h.doSomething(SOME_FUNKY_CONSTANT_NAME); ...
As for call resolution speed, given that any way you slice it it's a singly-indirected virtual function call under the hood it should be the same speed as any other polymorphic call, and almost the same speed as a C++ one with a decently-performing JIT-capable VM. If Java had multiply-dynamic binding OOTB, then polymorphic methods would get slower proportionately to the number of dynamically-bound-to arguments, but Java doesn't (although it actually does *almost* have closures, if you know where to look)...
Chris Uppal - 04 Jan 2007 19:13 GMT > I seem to recall, in years past, that the interface approach was slower. > Is this still the case? As others have said, it depends, and (if it matters that much to you) you'll have to measure it.
I got interested enough to try a few experiments (these were all with jdk 1.6, but 1.5 gives very similar results).
For a start, comparing a method like: public void method() { ++datum; } where "datum" is an instance field. Just to get a ballpark figure, calling that via an interface using the client JVM takes about 12 nanoseconds if called as an override of an abstract method, vs about 15 if the same code is the implementation of an interface. That is /not/ the bottom-line, there are many highly important effects to consider, but even from that approximate data we can see that the difference between the two kinds of method invocation is /very/ small. If you are going to invoke the "same" method(s) a billion or two times then using abstract methods will save you a handful of seconds overall. More importantly, the time is so small, that it's difficult (but not impossible) to imagine a case where the body of the method -- even a very trivial method, as above -- wouldn't dominate the execution time. At this very fine-grained level, the number of parameters will also effect the invocation overhead -- in this case the method has one parameter (this), in other cases it might have more or less than that.
However, if you are /that/ bothered by performance, you are not going to be using the client JVM in the first place. Switching to the server JVM (in this specific test) reduces both times, to about 9 vs 12 nanoseconds -- so the relative difference is about the same.
Another thing to note is the degree of actual polymorphism at the call site. In my test code (given in full at the end of this post), the naively expected polymorphism is 3 (there are three different -- albeit identical -- implementations of the abstract and interface methods). What's more, each of these possibilities occurs equally often. Considerations of polymorphism affect how the JITed code works. A typical implementation technique (and one used by Sun's VMs) is to attempt to compile a virtual or interface method call into a sequence of test-and-branch instructions (thus avoiding a slow vtable indirection). So the number of possible candidate classes which have to be tested for at each invocation will affect both the strategy for generating the code, and its actual execution profile. So, what happens if a given call-site is not, in fact, heavily polymorphic ? To try that out I commented out the last two lines of the time() function (below), so that the calls were actually monomorphic. With the client JVM that completely eliminated the difference between the two kinds of method invocation (about 12 nanos in either case). However, with the server JVM, once it had realised that there was no real polymorphism at the call site (in either case) it was able to inline the method calls -- but that further permitted it to optimise the loops, and in fact allowed it to remove such a large chunk of the code (not all) that the test was no longer useful.
But that kind of optimisation makes considering call sites difficult. In the case of my test code there are two call sites (in timeInterfaceCall() and timeAbstractCall()) which /may/ be polymorphic. But there is no guarantee that the optimisation will leave those alone. If the optimiser had attempted to "split" either of those methods (effectively pushing the test-and-branch outside the timing loop) then it would have been able to inline both cases completely, even in the polymorphic case (there would be three largely identical copies of each of the two timing methods). I don't know whether Sun's current generation of JVMs is capable of splitting, but the technique is moderately well-known, and is implemented in the Animorphic SELF VM on which a lot of the runtime optimisation in Hotspot is based.
Another point to consider (and it's a flaw in my test) is that conditional branches directly affect execution time on processors with pipelining and branch prediction. In my test the CPUs branch prediction would be been almost perfect, but that might not be the case in a real application.
Anyway, I think the bottom line is more like. The overhead is -- at worst -- very, very, small (but measurable); it is also subject to very active (and often successful) attempts by the JIT to optimise it out.
Sample code follows. Note that it runs in an infinite loop. It takes the JITer a while to complete its optimisations -- on my machine the JIT generated several versions of the timing loops as it put more and more work into optimising them, it seemed to have settled down after about 20 seconds.
-- chris
======== code ======== interface Interface { void interfaceCall(); }
abstract class Abstract { abstract public void abstractCall(); }
class Concrete1 extends Abstract implements Interface { public long datum; public void interfaceCall() { ++datum; } public void abstractCall() { ++datum; } }
class Concrete2 extends Abstract implements Interface { public long datum; public void interfaceCall() { ++datum; } public void abstractCall() { ++datum; } }
class Concrete3 extends Abstract implements Interface { public long datum; public void interfaceCall() { ++datum; } public void abstractCall() { ++datum; } }
public class Test { private static final int LOOPS_PER = 50 * 1000 * 1000;
// leave this running for a while (20 seconds minimum to allow the // JITer time to finish optimising public static void main(String[] args) { Concrete1 c1 = new Concrete1(); Concrete2 c2 = new Concrete2(); Concrete3 c3 = new Concrete3();
while (c1.datum + c2.datum + c3.datum >= 0) time(c1, c2, c3); }
private static void time(Concrete1 c1, Concrete2 c2, Concrete3 c3) { // comment out some of these lines to reduce/eliminate // polymorphism at the call sites in timeAbstractCall() // and timeInterfaceCall() timeAbstractCall(c1); timeInterfaceCall(c1); timeAbstractCall(c2); timeInterfaceCall(c2); timeAbstractCall(c3); timeInterfaceCall(c3); }
private static void timeAbstractCall(Abstract it) { long start = System.currentTimeMillis(); for (int i = 0; i < LOOPS_PER; i++) it.abstractCall(); long end = System.currentTimeMillis(); report(end-start, " abstract", LOOPS_PER); }
private static void timeInterfaceCall(Interface it) { long start = System.currentTimeMillis(); for (int i = 0; i < LOOPS_PER; i++) it.interfaceCall(); long end = System.currentTimeMillis(); report(end-start, "interface", LOOPS_PER); }
private static void report(long millis, String type, int calls) { double nanosPerCall = millis * 1.0e6 / calls; System.out.println(type + ": " + nanosPerCall + " nanosec/call"); } } =====================
Tom Hawtin - 04 Jan 2007 19:51 GMT > very, very, small (but measurable); it is also subject to very active (and > often successful) attempts by the JIT to optimise it out. With the Sun server VM, your code shows an awful lot of 'optimiser' activity. Running your test unaltered (on 1.6, Server VM, single core Opteron 128), my *first* result is: abstract: 0.96 nanosec/call, interface: 1.08 nanosec/call. It fluctuates wildly, but always slower, from there.
The performance appears to be stumbling because although the implementations are all called as often as one another, that is not true if you look at a sample of the order of, say, 10,000 iterations. I'm sure if you switched the obscure VM options to see the compile times, they would be high.
If you call all the implementations at once: o The results are consistent. o The server VM times practically disappear. o The client VM times are the same for interface and abstract class.
Tom Hawtin
John Ersatznom - 06 Jan 2007 05:14 GMT > The performance appears to be stumbling because although the > implementations are all called as often as one another, that is not true [quoted text clipped - 6 lines] > o The server VM times practically disappear. > o The client VM times are the same for interface and abstract class. It sounds from this like the JIT compiler can actually DEgrade performance for pathological patterns of polymorphic calls. In particular, it looks like threaded code with different threads doing the same thing (via the same class) with different implementations of something else (via polymorphic object parameters to that class) might choke. Seems you get rock-solid performance if all implementations that occur very often occur at similar frequencies and the rest are rare, e.g. all equally common or one overwhelmingly dominant one, but if there's a broader frequency spectrum you might be SOL.
Tom Hawtin - 06 Jan 2007 13:19 GMT > It sounds from this like the JIT compiler can actually DEgrade > performance for pathological patterns of polymorphic calls. In That's a general property of optimisations. Whenever you optimise something, expect performance in certain conditions to be pessimised.
> particular, it looks like threaded code with different threads doing the > same thing (via the same class) with different implementations of [quoted text clipped - 3 lines] > e.g. all equally common or one overwhelmingly dominant one, but if > there's a broader frequency spectrum you might be SOL. It's okay different implementations being invoked with different frequencies, just so long as you don't get batches into the tens of thousands in one and then another.
Tom Hawtin
Chris Uppal - 06 Jan 2007 15:46 GMT > It sounds from this like the JIT compiler can actually DEgrade > performance for pathological patterns of polymorphic calls. That's always possible, since the JIT itself is not free. If it decides to do some work, then it is gambling that the (unknown) future execution patterns will be such that the extra work now turns out to be of nett value in the long run -- but there's no guarantee that the gamble will pay off.
-- chris
Chris Uppal - 06 Jan 2007 15:46 GMT [me:]
> > very, very, small (but measurable); it is also subject to very active > > (and often successful) attempts by the JIT to optimise it out. [quoted text clipped - 4 lines] > interface: 1.08 nanosec/call. It fluctuates wildly, but always slower, > from there. That's very odd. I'm running on a 1.5 GHz Pentium "M" (which I would have thought was similar enough that the JITing would work very similarly if not identically), and see nothing like that pattern. At the start the times vary wildly, but after 20 seconds or so they settle down to a neat (but not perfect) pattern showing the numbers I mentioned. (The monomorphic case with -server is different, as I mentioned before).
Do you know a better way to tell what the optimiser is doing than using the JVM TI interfaces ? That is what I was using to monitor the JIT (in separate runs from the ones where I quoted times, of course), and it showed up to four versions of the timing loops being generated over the first up to ten seconds with no further reported activity thereafter (over several minutes).
> The performance appears to be stumbling because although the > implementations are all called as often as one another, that is not true > if you look at a sample of the order of, say, 10,000 iterations. Continually repeated re-JITing doesn't sound at all like what I was seeing.
> If you call all the implementations at once: > o The results are consistent. > o The server VM times practically disappear. > o The client VM times are the same for interface and abstract class. I know of no good way to "call all the implementations at once" and still be able to measure anything using the relatively crude tools available to me. I suppose I could reduce the number of loops to, say, 1000, and then collect and average /lots/ of such micro-runs, is that what you tried ?
-- chris
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|