Java Forum / General / December 2007
Symbolic benchmark
Jon Harrop - 10 Dec 2007 09:38 GMT Following recent discussions here about benchmarking and when Java's performance is worst, I proposed the idea of symbolic computation because this entails very high allocation rates for short-lived objects when written in an idiomatic functional style and Java should be uniquely bad at this. While you might be able to work around the functional style using imperative constructs, it will be extremely tedious and error-prone for such applications.
So, perhaps someone would like to translate this simple symbolic simplifier into Java:
http://www.lambdassociates.org/studies/study10.htm
Here is the OCaml implementation:
let rec ( +: ) f g = match f, g with | `Int n, `Int m -> `Int (n +/ m) | `Int (Int 0), e | e, `Int (Int 0) -> e | f, `Add(g, h) -> f +: g +: h | f, g -> `Add(f, g)
let rec ( *: ) f g = match f, g with | `Int n, `Int m -> `Int (n */ m) | `Int (Int 0), e | e, `Int (Int 0) -> `Int (Int 0) | `Int (Int 1), e | e, `Int (Int 1) -> e | f, `Mul(g, h) -> f *: g *: h | f, g -> `Mul(f, g)
let rec simplify = function | `Int _ | `Var _ as f -> f | `Add (f, g) -> simplify f +: simplify g | `Mul (f, g) -> simplify f *: simplify g
OCaml is not terribly efficient at this benchmark but I believe it will be many times faster than any Java solution.
The program simply traverses a symbolic expression, reconstructing it whilst applying some simple rewrite rules to simplify it. For example, "(a+0)*1*b" becomes "a*b". The applications are obvious: parsers, compilers, interpreters, scientific computing and computer algebra.
Note that the symbolic expression is immutable: you must not destroy your input! Also, numbers are all arbitrary precision integers.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Lew - 10 Dec 2007 15:27 GMT > Following recent discussions here about benchmarking and when Java's > performance is worst, I proposed the idea of symbolic computation because > this entails very high allocation rates for short-lived objects when > written in an idiomatic functional style and Java should be uniquely bad at > this. Java is very good at high allocation rates for short-lived objects; it's optimized for it. Don't believe Jon's rants, people.
 Signature Lew
Jon Harrop - 10 Dec 2007 22:32 GMT >> Following recent discussions here about benchmarking and when Java's >> performance is worst, I proposed the idea of symbolic computation because [quoted text clipped - 4 lines] > Java is very good at high allocation rates for short-lived objects; it's > optimized for it. Don't believe Jon's rants, people. Prove it by implementing that benchmark in Java.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Daniel Pitts - 10 Dec 2007 23:48 GMT >>> Following recent discussions here about benchmarking and when Java's >>> performance is worst, I proposed the idea of symbolic computation because [quoted text clipped - 5 lines] > > Prove it by implementing that benchmark in Java. Actually, why don't you prove it (or disprove it). I'm sure most of the Java advocates here have better things to do than chase your wild goose.
 Signature Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Jon Harrop - 11 Dec 2007 00:43 GMT >>>> Following recent discussions here about benchmarking and when Java's >>>> performance is worst, I proposed the idea of symbolic computation [quoted text clipped - 8 lines] > Actually, why don't you prove it (or disprove it). I'm sure most of the > Java advocates here have better things to do than chase your wild goose. Because you would only blame Java's poor performance on me rather than Java.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Lew - 11 Dec 2007 02:28 GMT >>>>> Following recent discussions here about benchmarking and when Java's >>>>> performance is worst, I proposed the idea of symbolic computation [quoted text clipped - 8 lines] > > Because you would only blame Java's poor performance on me rather than Java. Oh, so when it's time to sh.t or get off the pot, you just attack the person and (likely inaccurately) predict bad behavior on their part to excuse your lack of intellectual honesty, Mr. Cambridge Don.
 Signature Lew
Jon Harrop - 11 Dec 2007 10:03 GMT > Oh, so when it's time to sh.t or get off the pot, you just attack the > person and (likely inaccurately) predict bad behavior on their part to > excuse your lack of intellectual honesty, Mr. Cambridge Don. Sounds like you can't write an efficient Java implementation of this benchmark either.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Lew - 11 Dec 2007 14:32 GMT >> Oh, so when it's time to sh.t or get off the pot, you just attack the >> person and (likely inaccurately) predict bad behavior on their part to >> excuse your lack of intellectual honesty, Mr. Cambridge Don. > > Sounds like you can't write an efficient Java implementation of this > benchmark either. Yada, yada, Mr. Cambridge Almighty Don.
 Signature Lew
Daniel Pitts - 11 Dec 2007 18:37 GMT >> Oh, so when it's time to sh.t or get off the pot, you just attack the >> person and (likely inaccurately) predict bad behavior on their part to >> excuse your lack of intellectual honesty, Mr. Cambridge Don. > > Sounds like you can't write an efficient Java implementation of this > benchmark either. What's with the personal attacks here? No one accused you of being unable.
I personally said I was unwilling to at this time.
 Signature Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Steve Wampler - 11 Dec 2007 00:18 GMT >>> Following recent discussions here about benchmarking and when Java's >>> performance is worst, I proposed the idea of symbolic computation because [quoted text clipped - 5 lines] > > Prove it by implementing that benchmark in Java. Since we've been talking about C/C++ and Java, got a version in C or C++? :)
 Signature Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
Jon Harrop - 11 Dec 2007 00:46 GMT >> Prove it by implementing that benchmark in Java. > > Since we've been talking about C/C++ and Java, got a version in C or C++? > :) I posted something similar here (for derivatives rather than simplification):
http://www.codecodex.com/wiki/index.php?title=Derivative
but this problem really requires a GC to do it properly because you must be able to handle shared subexpressions.
#include <iostream>
using namespace std;
class Var;
class Base { public: virtual ~Base() {}; virtual const Base *clone() = 0; virtual const Base *d(const string &v) const = 0; virtual ostream &print(ostream &o) const = 0; };
ostream &operator<<(ostream &o, const Base &e) { e.print(o); return o; }
class Int : public Base { const int n; public: Int(int m) : n(m) {} ~Int() {} const Base *clone() { return new Int(n); } const Base *d(const string &v) const { return new Int(0); } ostream &print(ostream &o) const { return o << n; } };
class Var : public Base { const string var; public: Var(string v) : var(v) {} ~Var() {} const Base *clone() { return new Var(var); } const Base *d(const string &v) const { return new Int(var == v ? 1 : 0); } ostream &print(ostream &o) const { return o << var; } };
class Plus : public Base { const Base *e1, *e2; public: Plus(const Base *a, const Base *b) : e1(a), e2(b) {} ~Plus() { delete e1; delete e2; } const Base *clone() { return new Plus(e1, e2); } const Base *d(const string &v) const { return new Plus(e1->d(v), e2->d(v)); } ostream &print(ostream &o) const { return o << "(" << *e1 << " + " << *e2 << ")"; } };
class Times : public Base { const Base *e1, *e2; public: Times(const Base *a, const Base *b) : e1(a), e2(b) {} ~Times() { delete e1; delete e2; } const Base *clone() { return new Times(e1, e2); } const Base *d(const string &v) const { return new Plus(new Times(e1, e2->d(v)), new Times(e1->d(v), e2)); } ostream &print(ostream &o) const { return o << "(" << *e1 << " * " << *e2 << ")"; } };
class Expr { public: Base *e; Expr(Base *a) : e(a) {} };
const Expr operator+(const Expr e1, const Expr e2) { return Expr(new Plus(e1.e->clone(), e2.e->clone())); } const Expr operator*(const Expr e1, const Expr e2) { return Expr(new Times(e1.e->clone(), e2.e->clone())); }
ostream &operator<<(ostream &o, const Expr e) { return o << e.e; }
int main() { Var vx("x"), va("a"), vb("b"), vc("c"); Expr x(&vx), a(&va), b(&vb), c(&vc); Expr e = a*x*x + b*x + c; cout << "d(" << *(e.e) << ", x) = " << *(e.e->d("x")) << endl; return 0; }
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Roger Lindsjö - 11 Dec 2007 12:36 GMT > Following recent discussions here about benchmarking and when Java's > performance is worst, I proposed the idea of symbolic computation because [quoted text clipped - 6 lines] > So, perhaps someone would like to translate this simple symbolic simplifier > into Java: I'll post the code tonight. I have not tried to reduce the line count.
My program first prints the expression, then run 10 loops with 10000000 iterations and prints the duration for each loop. At the end it prints the final reduction. The machine is a 2.4GHz Core 2 Quad.
[* x [+ [+ [* 12 0] [+ 23 8]] y]] 1355 1281 1275 1273 1277 1284 1279 1279 1272 1274 [* x [+ 31 y]]
I'm used to Ocaml, but it looked like Haskell, so I think my simplify is correct.
> The program simply traverses a symbolic expression, reconstructing it whilst > applying some simple rewrite rules to simplify it. For example, "(a+0)*1*b" [quoted text clipped - 3 lines] > Note that the symbolic expression is immutable: you must not destroy your > input! Also, numbers are all arbitrary precision integers. The last rule, that it should handle arbitrary precision integers made me use BigIntegers, with int the program ran about twice as fast.
//Roger Lindsjö
Roger Lindsjö - 11 Dec 2007 19:52 GMT > I'm used to Ocaml, but it looked like Haskell, so I think my simplify is > correct. Of course I meant to say "I'm not used to Ocaml..."
//Roger Lindsjö
Jon Harrop - 11 Dec 2007 20:18 GMT >> I'm used to Ocaml, but it looked like Haskell, so I think my simplify is >> correct. > > Of course I meant to say "I'm not used to Ocaml..." Can we have the Java code? :-)
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Roger Lindsjö - 11 Dec 2007 23:08 GMT Before being too enthusiastic, I did not have the associative rules implemented, now my times are about 1.6 seconds for 10000000 iterations.
Also, while testing I notice that with the rules (at least the way I implemented them) [+ [+ x 3] [+ 1 y]] would get simplified to [+ [+ [+ x 3] 1] y]. Jon, it that true for your implementation also? (No need posting code if it does not fully work).
Jon, feel free to email me, just use my first name instead of news.nospam.
//Roger Lindsjö
Jon Harrop - 12 Dec 2007 13:16 GMT > Before being too enthusiastic, I did not have the associative rules > implemented, now my times are about 1.6 seconds for 10000000 iterations. [quoted text clipped - 3 lines] > [+ [+ [+ x 3] 1] y]. Jon, it that true for your implementation also? (No > need posting code if it does not fully work). That is the same, yes.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Roger Lindsjö - 12 Dec 2007 20:49 GMT I have been talking to Don off line and I used Chris Dollin's idea of using cached values for 0 and 1. (Chris and my code were actually quite similar from start, and on par for speed, but after using Cris ideo of cached 0 and 1 my end up 10% faster or so). I have changed some names to look more like Chris code to make them easier to compare. Initially I did not use instanceof but instead had a type for each element which I could use to check the type. This was actually slower than instanceof (surprised me).
Using BigInteger or BigDecimal made no difference in my test, so I used BigDecimal.
The machine I used was a 2.4GHz Core 2 Quad with 2GB memory, and while running any of the programs they seemed to use 100% of one core.
java Simplify 10000000 3 [* x [+ [+ [* 12 0] [+ 23 8]] y]] Bench 0 Took 1.16 seconds Bench 1 Took 1.11 seconds Bench 2 Took 1.104 seconds [* x [+ 31 y]]
However, after installing Ocaml I compiled Dan's program and ran. The output was (I modified it to run several times): time ./simplify Took 1.514769s Took 1.515770s Took 1.518769s
So now I get confused, according to Don this code should be about twice as fast as the Java code, but in my tests Java is slightly faster. Why is that? Perhaps I have the wrong compiler, mine says 3.09.3. There seemed to be a 3.10 out, but is wasn't part of Fedora 7 regular repository. Could upgrading to 3.10 actually make the program at least twice as fast?
Can anyone explain? Could it be that on my architecture the HotSpot is better at optimizing than the Ocaml compiler?
I then modified my program to use long instead of BigDecimal and got the following: java SimplifyLong 10000000 3 [* x [+ [+ [* 12 0] [+ 23 8]] y]] Bench 0 Took 0.61 seconds Bench 1 Took 0.549 seconds Bench 2 Took 0.558 seconds [* x [+ 31 y]]
Simplify.java import java.math.BigDecimal;
public class Simplify { public static void main(String[] args) { int lim = Integer.parseInt(args[0]); int nbench = Integer.parseInt(args[1]);
Expr expr = mul(var("x"), add( add(mul(rat(12),rat(0)), add(rat(23),rat(8))), var("y") ) );
System.out.println(expr); Expr reduced = null; for (int batch = 0; batch < nbench; batch++) { long start = System.currentTimeMillis(); System.err.println("Bench " + batch); for (int i = 0; i < lim; i++) { reduced = expr.simplify(); } System.err.println("Took " + (System.currentTimeMillis() - start) / 1000.0 + " seconds"); } System.out.println(reduced); }
private static Expr mul(Expr a, Expr b) { return new Mul(a,b); } private static Expr add(Expr a, Expr b) { return new Add(a,b); } private static Expr rat(long a) { return Val.create(BigDecimal.valueOf(a)); } private static Expr var(String s) { return new Sym(s); }
public static abstract class Expr { public abstract Expr simplify(); }
public static final class Add extends Expr { private final Expr l, r;
public Add(Expr l, Expr r) { this.l = l; this.r = r; }
public Expr simplify() { return add(l.simplify(), r.simplify()); }
private Expr add(Expr l, Expr r) { if (Val.ZERO == l) { return r; } else if (Val.ZERO == r) { return l; } else if (l instanceof Val && r instanceof Val) { return add((Val) l, (Val) r) ; } else if (r instanceof Add) { return add(l, (Add) r); } else { return new Add(l, r); } }
private Expr add(Val l, Val r) { return Val.create(l.getValue().add(r.getValue())); }
private Expr add(Expr l, Add r) { return new Add(new Add(l, r.l), r.r).simplify(); }
public String toString() { return "[+ " + l + " " + r +"]"; } }
public static class Mul extends Expr { private final Expr l, r; public Mul(Expr l, Expr r) { this.l = l; this.r = r; }
public Expr simplify() { return mul(l.simplify(), r.simplify()); }
private Expr mul(Expr l, Expr r) { if (Val.ZERO == l || Val.ZERO == r) { return Val.ZERO; } else if (Val.ONE == l) { return r; } else if (Val.ONE == r) { return l; } else if (l instanceof Val && r instanceof Val) { return mul((Val) l, (Val) r) ; } else if (r instanceof Mul) { return mul(l, (Mul) r); } else { return new Mul(l, r); } }
private Expr mul(Val l, Val r) { return Val.create(l.getValue().multiply(r.getValue())); }
private Expr mul(Expr l, Mul r) { return new Mul(new Mul(l, r.l), r.r).simplify(); }
public String toString() { return "[* " + l + " " + r +"]"; } }
public static class Sym extends Expr { private final String symbol;
public Sym(String symbol) { this.symbol = symbol; }
public Expr simplify() { return this; }
public String toString() { return symbol; } }
public static class Val extends Expr { public static final Val ZERO = new Val(BigDecimal.ZERO); public static final Val ONE = new Val(BigDecimal.ONE);
private final BigDecimal value;
private Val(BigDecimal value) { this.value = value; }
public Expr simplify() { return this; }
public static Val create(BigDecimal value) { if (BigDecimal.ZERO == value || BigDecimal.ZERO.equals(value)) { return ZERO; } else if (BigDecimal.ONE == value && BigDecimal.ONE.equals(value)) { return ONE; } else { return new Val(value); } }
public BigDecimal getValue() { return value; }
public String toString() { return String.valueOf(value); } } }
//Roger Lindsjö
Roger Lindsjö - 13 Dec 2007 15:25 GMT I have done some more test, and used the same Ocaml compiler (3.09.3) and the same Java (1.6 update 3). The times posted are averages over a few iterations. The processes were locked to one cpu to not allow the gc running on a separate cpu, but this didn't make any noticeable difference. Both machines runs late (but not the same) GNU/Linux kernels (Fedora Core and Fedora systems).
Machine 1 is a 3.2GHz Pentium with 2 cores (not hyperthreading). A 64 bit machine (were they called Pentium D?). 2GB memory. Java: 5.123 s Ocaml: 1.142 s
Machine 2 is a 2.4GHz Core 2 Quad also with 2GB memory. Java: 1.074 s Ocaml: 1.515 s
The compilers must be very different in how they utilize the cpu architecture. Sitting on machine 1 (not my primary computer) I'd say Ocaml is faster, but sitting on machine 2 (my primary machine ;-) ) I'd say Java is faster.
Perhaps Java is better at utilizing the large cache on machine 2? I don't know.
//Roger Lindsjö
Steve Wampler - 13 Dec 2007 15:55 GMT > The compilers must be very different in how they utilize the cpu > architecture. Sitting on machine 1 (not my primary computer) I'd say > Ocaml is faster, but sitting on machine 2 (my primary machine ;-) ) I'd > say Java is faster. Thanks for testing this - I've been trying to figure out why Jon and I have been getting such different results. This may well explain it.
 Signature Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
Lew - 13 Dec 2007 17:03 GMT >> The compilers must be very different in how they utilize the cpu >> architecture. Sitting on machine 1 (not my primary computer) I'd say [quoted text clipped - 3 lines] > Thanks for testing this - I've been trying to figure out why Jon and I > have been getting such different results. This may well explain it. This supports the points of those who say, "Watch out for benchmarks," and, "Java can be as fast as C++", and. "Java is slower that C++", and, "It depends".
 Signature Lew
bugbear - 14 Dec 2007 13:58 GMT >>> The compilers must be very different in how they utilize the cpu >>> architecture. Sitting on machine 1 (not my primary computer) I'd say [quoted text clipped - 7 lines] > and, "Java can be as fast as C++", and. "Java is slower that C++", and, > "It depends". May I point out the existence of comp.benchmarks ?
BugBear
Roedy Green - 13 Dec 2007 21:09 GMT On Thu, 13 Dec 2007 16:25:29 +0100, Roger Lindsjö <news.nospam@tilialacus.net> wrote, quoted or indirectly quoted someone who said :
>Machine 2 is a 2.4GHz Core 2 Quad also with 2GB memory. >Java: 1.074 s >Ocaml: 1.515 s If you pass me the Java code, I can run it both with java.exe and with Jet. That ratio would give you another datapoint, compiled Java.
I have a dual core Athlon. I think there are no threads in your implementation (are there in the Ocaml?), you would expect dual core to buy you nothing.
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Steve Wampler - 13 Dec 2007 21:46 GMT > I have a dual core Athlon. I think there are no threads in your > implementation (are there in the Ocaml?), you would expect dual core > to buy you nothing. I don't know how Jet handles the execution environment, but Sun's JDK runs lots of administrative threads while a program is running. I would expect some benefit from dual cores with that, but apparently (from Roger's note) it's not much.
 Signature Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
Roger Lindsjö - 13 Dec 2007 22:16 GMT > On Thu, 13 Dec 2007 16:25:29 +0100, Roger Lindsjö > <news.nospam@tilialacus.net> wrote, quoted or indirectly quoted [quoted text clipped - 10 lines] > implementation (are there in the Ocaml?), you would expect dual core > to buy you nothing. No threading other than what the JVM does internally for housekeeping such as GC. I even ran restricted to one core (using taskset) since I was told that Ocaml was single threaded.
The code is 3 posts up, the one I was relying to. http://groups.google.com/group/comp.lang.java.programmer/msg/b5185f597ba62103
//Roger Lindsjö
Lew - 14 Dec 2007 01:32 GMT > No threading other than what the JVM does internally for housekeeping > such as GC. I even ran restricted to one core (using taskset) since I > was told that Ocaml was single threaded. If it's true that Ocaml is single-threaded, then that kills it for a wide range of applications for which Java is extremely suitable. Performance would be moot, even if the numbers weren't so close.
 Signature Lew
Daniel Dyer - 14 Dec 2007 02:10 GMT >> No threading other than what the JVM does internally for housekeeping >> such as GC. I even ran restricted to one core (using taskset) since I [quoted text clipped - 3 lines] > wide range of applications for which Java is extremely suitable. > Performance would be moot, even if the numbers weren't so close. http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73 338fb15.en.html
I briefly looked at OCaml because I'd heard a lot of good things about. The threading thing did put me off. It seems there are issues with its garbage collector implementation in this respect.
Dan.
 Signature Daniel Dyer http://www.uncommons.org
Jon Harrop - 14 Dec 2007 09:51 GMT >> No threading other than what the JVM does internally for housekeeping >> such as GC. I even ran restricted to one core (using taskset) since I [quoted text clipped - 3 lines] > range of applications for which Java is extremely suitable. Performance > would be moot, even if the numbers weren't so close. OCaml simply uses processes for concurrency rather than threads. In terms of capabilities, there is no difference.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Lew - 14 Dec 2007 13:46 GMT >>> No threading other than what the JVM does internally for housekeeping >>> such as GC. I even ran restricted to one core (using taskset) since I [quoted text clipped - 5 lines] > OCaml simply uses processes for concurrency rather than threads. In terms of > capabilities, there is no difference. I'm not familiar with OCaml. Does it have a library of concurrency constructs, such as blocking / non-blocking queues, monitors or other locks, memory-model coordinating constructs cognate to Java's "volatile" and the like to coordinate shared data?
 Signature Lew
Jon Harrop - 14 Dec 2007 14:50 GMT > I'm not familiar with OCaml. Does it have a library of concurrency > constructs, such as blocking / non-blocking queues, monitors or other > locks, memory-model coordinating constructs cognate to Java's "volatile" > and the like to coordinate shared data? No, OCaml has data parallel and message passing libraries instead. These are higher level constructs for parallelism and concurrency that abstract away the need to use the low-level constructs like monitors and locks directly. This is easier to use and arguably safer (although I have no evidence to back up the latter in general, outside of Erlang's overwhelming victory for massively-concurrent applications).
There are complicated trade-offs involved here but, essentially, OCaml sacrifices high-performance inter-thread communication (that you can achieve with) for better scaling because its GC requires no garbage collection.
There is also a new research language called JoCaml that integrates support for concurrency directly into the OCaml language using the join calculus.
You may also be interested in related efforts such as the recent support for hinted parallelism in Haskell and F#'s recently introduction asynchronous workflows.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Chris Dollin - 14 Dec 2007 15:33 GMT > There are complicated trade-offs involved here but, essentially, OCaml > sacrifices high-performance inter-thread communication (that you can > achieve with) for better scaling because its GC requires no garbage > collection. Did something go wrong with that paragraph, possibly just over-ambitions packing? I can't make consistent sense of it.
 Signature Chris "perhaps it's the Friday effect" Dollin
Hewlett-Packard Limited registered no: registered office: Cain Road, Bracknell, Berks RG12 1HN 690597 England
Jon Harrop - 14 Dec 2007 18:57 GMT >> There are complicated trade-offs involved here but, essentially, OCaml >> sacrifices high-performance inter-thread communication (that you can [quoted text clipped - 3 lines] > Did something go wrong with that paragraph, possibly just over-ambitions > packing? I can't make consistent sense of it. That was indeed a buggy paragraph. :-)
Multithreaded GC (like the JVM and .NET) provides cheap interthread communication. OCaml's approach (independent GCs per process) can make interthread communication much more expensive but it scales better.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Mark Thornton - 14 Dec 2007 20:02 GMT >>> There are complicated trade-offs involved here but, essentially, OCaml >>> sacrifices high-performance inter-thread communication (that you can [quoted text clipped - 8 lines] > communication. OCaml's approach (independent GCs per process) can make > interthread communication much more expensive but it scales better. With multicore (shared memory) processors now the norm on the desk top, the advantage of the Java approach is significant for some desktop applications. The work I'm doing at the moment has threads sharing data structures which run to hundreds of megabytes. Combined with the Doug Lea's Fork/Join library (jsr166y) this is giving me good speed up over sequential computation. Separate processes not only impose higher communication overhead, but on some architectures, the overhead of process creation is substantial. Much as I would like to wean customers off Windows, my code has to work efficiently on that platform.
Mark Thornton
Jon Harrop - 14 Dec 2007 21:53 GMT > With multicore (shared memory) processors now the norm on the desk top, > the advantage of the Java approach is significant for some desktop > applications. I'm interested in seeing evidence either way. From the benchmarks in this thread, for example, I attribute the (up to) 5x slower allocation rates in Java to the fact that its allocator and GC must cope with interthread issues.
> The work I'm doing at the moment has threads sharing data > structures which run to hundreds of megabytes. Combined with the Doug > Lea's Fork/Join library (jsr166y) this is giving me good speed up over > sequential computation. Sure. The work I'm doing at the moment has many processes with large shared data structures.
> Separate processes not only impose higher communication overhead, but on > some architectures, the overhead of process creation is substantial. > Much as I would like to wean customers off Windows, my code has to work > efficiently on that platform. Yes. Process creation in OCaml takes about 10ms here whereas thread creation in F# takes 1ms.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Mark Thornton - 14 Dec 2007 23:29 GMT >> With multicore (shared memory) processors now the norm on the desk top, >> the advantage of the Java approach is significant for some desktop [quoted text clipped - 4 lines] > Java to the fact that its allocator and GC must cope with interthread > issues. The allocator is very trivial, almost all the time is in collection. The comment seen elsewhere that dead objects are free to collect is not quite true. If you have very high rates of garbage creation, the nursery area will fill frequently and cause garbage collection to occur often. There seems to be at least some fixed cost in gc, so if the live object count/size is very small, the time for each gc cycle does not tend to zero. Many cases of high rates of allocating short lived objects have a recognisable block structure. The current JVM can detect this (escape analysis) but so far only uses this to eliminate some types of locks. Perhaps unfortunately a lot of production Java code has been hand 'optimised' to reduce such allocations, which means that when the compiler people test escape analysis, the returns aren't as good as hoped (programmers have already done it by hand). This technique is apparently used in the (not free) JET jvm, and we hope it will finally appear in Java 7.
As for multithread issues, the default garbage collector (note Sun's jvm comes with a number of collectors which you can select using runtime options), is a stop the world variety. As all threads are stopped (apart from the collector itself) interthread issues are non existent. In threaded applications this means that neither allocation or collection have threading overheads (no synchronization in either case). OK, it must take some time to suspend/resume the working threads. Compared to the costs involved with thread safe heaps in C/C++ this can be a significant gain. Note that the JVM can use simplified code when running on a uniprocessor, i.e. multiprocessor costs only occur if you actually run the application on such a machine.
> Sure. The work I'm doing at the moment has many processes with large > shared data structures. It is awkward to include pointers in the shared structures unless you can ensure that the structure is mapped to the same address in each process. Easy to arrange in a unix fork environment, more difficult to ensure on Windows, especially if the processes are not effectively clones.
Incidentally your ray trace code makes unnecessary use of inner classes as opposed to nested classes. Sprinkling a few 'static' keywords around chops about 10% off the time of the simple version. Another 10% can be saved by specifying a larger heap than default (on my machine the default initial heap is about 5MB). These savings apply to both client and server JVM's.
Mark Thornton
Jon Harrop - 14 Dec 2007 23:49 GMT > As for multithread issues, the default garbage collector (note Sun's jvm > comes with a number of collectors which you can select using runtime > options), is a stop the world variety. As all threads are stopped (apart > from the collector itself) interthread issues are non existent... Stop-the-world is exactly the kind of interthread issue that I was referring to.
If you take this symbolic simplifier, for example, and make it run two simplifications at a time on my dual core then its performance will not improve by a factor of two if the GC stops all threads while it runs.
In contrast, the OCaml is not only more than twice as fast to start with but it will scale linearly with the number of cores in this case because the GCs do not interfere.
> > Sure. The work I'm doing at the moment has many processes with large > > shared data structures. > > It is awkward to include pointers in the shared structures unless you > can ensure that the structure is mapped to the same address in each > process. There are no pointers in these languages though.
> Incidentally your ray trace code makes unnecessary use of inner classes > as opposed to nested classes. Sprinkling a few 'static' keywords around > chops about 10% off the time of the simple version. Another 10% can be > saved by specifying a larger heap than default (on my machine the > default initial heap is about 5MB). These savings apply to both client > and server JVM's. Can you send me your code and I'll update it.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Mark Thornton - 15 Dec 2007 09:07 GMT >> As for multithread issues, the default garbage collector (note Sun's jvm >> comes with a number of collectors which you can select using runtime [quoted text clipped - 11 lines] > it will scale linearly with the number of cores in this case because the > GCs do not interfere. Note that Sun's Java 6 also comes with a parallel garbage collector. I didn't test that because my home machine is an old single core processor. I notice from other comments that the performance of OCaml relative to Java seems to depend quite a bit on the specific environment. Some people are reporting Java as faster.
> Can you send me your code and I'll update it. Done.
Mark Thornton
Lew - 15 Dec 2007 01:09 GMT >> I'm not familiar with OCaml. Does it have a library of concurrency >> constructs, such as blocking / non-blocking queues, monitors or other [quoted text clipped - 7 lines] > back up the latter in general, outside of Erlang's overwhelming victory for > massively-concurrent applications). Message passing as a primitive coordination and concurrency mechanism is extremely powerful, especially in an O.S. like QNX that reduces context-switch times to ridiculously small values. I first encountered it on a minicomputer in 1981. It makes for very straightforward coding, too.
 Signature Lew
Jon Harrop - 15 Dec 2007 18:45 GMT >> No, OCaml has data parallel and message passing libraries instead. These >> are higher level constructs for parallelism and concurrency that abstract [quoted text clipped - 9 lines] > minicomputer > in 1981. It makes for very straightforward coding, too. Exactly, yes. However, OCaml does not do such optimizations (AFAIK). To be fair, I haven't tried it though.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
ldv@mail.com - 14 Dec 2007 10:27 GMT > > On Thu, 13 Dec 2007 16:25:29 +0100, Roger Lindsjö > > <news.nos...@tilialacus.net> wrote, quoted or indirectly quoted [quoted text clipped - 18 lines] > > //Roger Lindsjö On my 2GHz Celeron the executable produced by the latest Excelsior JET 6.0 is much faster than HotSpot Client VM 1.6.0_3, but slower than the Server VM, which is almost twice as fast as the Client VM. Here go the best results selected from 10 consecutive runs:
HSS : 2.594s JET : 3.25s HSC : 5.172s
On a colleague's Core 2 Duo, the situation is more or less the same:
HSS : 1.206s JET : 1.687s HSC : 2.118s
Are you guys all using the Server VM in your comparisons?
LDV
Roger Lindsjö - 14 Dec 2007 10:46 GMT >>> On Thu, 13 Dec 2007 16:25:29 +0100, Roger Lindsjö >>> <news.nos...@tilialacus.net> wrote, quoted or indirectly quoted [quoted text clipped - 31 lines] > > Are you guys all using the Server VM in your comparisons? I'm using Server VM by default (or actually, the JVM does), the output from just java contains: The default VM is server, because you are running on a server-class machine.
//Roger Lindsjö
Jon Harrop - 14 Dec 2007 10:46 GMT > I'm using Server VM by default (or actually, the JVM does), the output > from just java contains: > The default VM is server, > because you are running on a server-class machine. Me too.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Roedy Green - 14 Dec 2007 22:11 GMT On Thu, 13 Dec 2007 23:16:57 +0100, Roger Lindsjö <news.nospam@tilialacus.net> wrote, quoted or indirectly quoted someone who said :
>>> Machine 2 is a 2.4GHz Core 2 Quad also with 2GB memory. >>> Java: 1.074 s >>> Ocaml: 1.515 s To my surprise Java -server skunked Jet which just edges out java -client. I gather the run-time knowledge -server can gather is paying off big time here.
Athlon 64 X2 3800+ dual core, 2 gig ram.
java -client Simplify 10000000 3 [* x [+ [+ [* 12 0] [+ 23 8]] y]] Bench 0 Took 4.422 seconds Bench 1 Took 4.728 seconds Bench 2 Took 4.408 seconds [* x [+ 31 y]]
using jet Simplify.exe 10000000 3 [E:\exper]Simplify 10000000 3 [* x [+ [+ [* 12 0] [+ 23 8]] y]] Bench 0 Took 4.178 seconds Bench 1 Took 4.193 seconds Bench 2 Took 4.303 seconds [* x [+ 31 y]]
java -server Simplify 10000000 3 [* x [+ [+ [* 12 0] [+ 23 8]] y]] Bench 0 Took 2.91 seconds Bench 1 Took 2.84 seconds Bench 2 Took 2.84 seconds [* x [+ 31 y]]
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Jon Harrop - 15 Dec 2007 00:02 GMT > On Thu, 13 Dec 2007 23:16:57 +0100, Roger Lindsjö > <news.nospam@tilialacus.net> wrote, quoted or indirectly quoted [quoted text clipped - 19 lines] > Took 2.84 seconds > [* x [+ 31 y]] Athlon 64 X2 4400+ dual core, 2Gb RAM
OCaml: 0.695s
Even accounting for CPU speed differences, OCaml is still running 3.5x faster, primarily because it can allocate much more quickly than Java.
This is also neglecting the facts that Java is using 100x as much RAM and doubles the load average (i.e. maxes out both of my cores whereas the OCaml only uses one).
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Chris Dollin - 12 Dec 2007 13:58 GMT > Following recent discussions here about benchmarking and when Java's > performance is worst, I proposed the idea of symbolic computation because [quoted text clipped - 8 lines] > > http://www.lambdassociates.org/studies/study10.htm [[CAVEAT: hacked together over a lunchtime or so, in no way any official statement of anything from anyone, etc.]]
OK, my broken-handed implementation running on a 3.06GHz Xeon with 2GB memory (not that that matters much, since the JVM is only using the default which is more like 64M) using FC5 with
Java(TM) SE Runtime Environment (build 1.6.0-b105) Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)
and running (of course) with -server takes 2.498s, 2.94s, 3.002s, for the 10000000-times loop (ie neglecting start-up times, which according to `time` are about 0.2s). I don't know why there's so much variability in the reported times.
Drat. OK, those times are using BigDecimal, not BigInteger, which gets me 3.258s, 3.281s, 3.207s as the loop times. (Using plain `int` gets those times down to 1.986s, 1.62s, 1.987s, as a measure of the work done by the big-integer arithmetics.)
-server about halves the runtime. My initial -server-free code took about 9s [with BigDecimal], but some straightforward optimisation work (none of which comes close to the obvious caching cheat; just reordering the tests and ensuring that the important values 0 and 1 were testable with ==) reduced that.
I tried the obvious tactic of going for double-dispatches to avoid multiple consecutive instanceof tests, but that made it /slower/, perhaps because hotspot couldn't collapse code across method boundaries? I don't really mind because in this case the multiple dispatches made the code rather more long-winded than it already was.
Count the line-size yourselves and subtract lines you deem not relevant to the benchmark.
;;; -- code starteth here ----------------------------------
package temp;
import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertSame;
import java.math.BigDecimal; import java.util.StringTokenizer;
import org.junit.Test;
/* let rec ( +: ) f g = match f, g with | `Int n, `Int m -> `Int (n +/ m) | `Int (Int 0), e | e, `Int (Int 0) -> e | f, `Add(g, h) -> f +: g +: h | f, g -> `Add(f, g)
let rec ( *: ) f g = match f, g with | `Int n, `Int m -> `Int (n * / m) | `Int (Int 0), e | e, `Int (Int 0) -> `Int (Int 0) | `Int (Int 1), e | e, `Int (Int 1) -> e | f, `Mul(g, h) -> f *: g *: h | f, g -> `Mul(f, g)
let rec simplify = function | `Int _ | `Var _ as f -> f | `Add (f, g) -> simplify f +: simplify g | `Mul (f, g) -> simplify f * : simplify g
*/ public class SimplifyViaInstanceOf { @Test public void ensureParseCompleteConstants() { assertSame( zero, parse( "0" ) ); assertSame( one, parse( "1" ) ); assertEquals( new Val( new BigDecimal( "17" ) ), parse( "17" ) ); assertEquals( new Val( new BigDecimal( "1234567890" ) ), parse( "1234567890" ) ); } @Test public void ensureParseCompleteNames() { assertEquals( new Sym( "x" ), parse( "x" ) ); assertEquals( new Sym( "y" ), parse( "y" ) ); assertEquals( new Sym( "calico" ), parse( "calico" ) ); assertEquals( new Sym( "whoot" ), parse( "whoot" ) ); } @Test public void ensureParseSimpleExpression() { assertEquals( new Add( new Sym( "x" ), new Sym( "y" ) ), parse( "(x+y)" ) ); assertEquals( new Add( one, new Sym( "y" ) ), parse( "(1+y)" ) ); assertEquals( new Add( zero, new Sym( "y" ) ), parse( "(0+y)" ) ); assertEquals( new Add( new Sym( "x" ), zero ), parse( "(x+0)" ) ); assertEquals( new Mul( new Sym( "x" ), new Sym( "y" ) ), parse( "(x*y)" ) ); assertEquals( new Mul( new Sym( "z" ), zero ), parse( "(z*0)" ) ); assertEquals( new Mul( zero, new Sym( "x" ) ), parse( "(0*x)" ) ); } @Test public void ensureParseCompoundExpression() { assertEquals( new Mul( parse( "(x+y)" ), one ), parse( "((x+y)*1)" ) ); assertEquals( new Mul( parse( "(x+y)" ), zero ), parse( "((x+y)*0)" ) ); assertEquals( new Mul( parse( "(x+y)" ), new Sym( "x" ) ), parse( "((x+y)*x)" ) ); assertEquals( new Mul( parse( "(x+y)"), parse( "(0*1)" ) ), parse( "((x+y)*(0*1))" ) ); } private Exp parse( String string ) { StringTokenizer st = new StringTokenizer( string, "+*()", true ); return parse( st, st.nextToken() ); }
private Exp parse( StringTokenizer st, String current ) { char ch = current.charAt(0); if (ch == '(') { Exp L = parse( st, st.nextToken() ); ch = st.nextToken().charAt( 0 ); Exp R = parse( st, st.nextToken() ); String ket = st.nextToken(); if (!ket.equals( ")" )) throw new RuntimeException( "ket needed: " + ket ); return ch == '*' ? new Mul( L, R ) : new Add( L, R ); } else if (Character.isDigit( ch )) return Val.create( new BigDecimal( current ) ); else if (current.equals( "*" ) || current.equals( "+" ) || current.equals( " )" )) throw new RuntimeException( "misplaced punctuation: " + current ); else return new Sym( current ); }
@Test public void ensureSimplifiesAddZero() { assertEquals( parse( "x" ), parse( "(x+0)" ).simplify() ); assertEquals( parse( "x" ), parse( "(0+x)" ).simplify() ); } @Test public void ensureAddsLiterals() { assertEquals( parse( "3" ), parse( "(1+2)" ).simplify() ); assertSame( zero, parse( "0+0" ).simplify() ); } @Test public void ensureMultipliesLiterals() { assertEquals( parse( "15" ), parse( "(5*3)" ).simplify() ); assertSame( zero, parse( "(0*0)" ).simplify() ); assertSame( zero, parse( "(x*0)" ).simplify() ); assertSame( zero, parse( "(0*y)" ).simplify() ); } @Test public void ensureSimplifiesMulZero() { assertSame( zero, parse( "(x*0)" ).simplify() ); assertSame( zero, parse( "(0*x)" ).simplify() ); } @Test public void ensureSimplifiesMulOne() { assertEquals( parse( "x" ), parse( "(x*1)" ).simplify() ); assertEquals( parse( "x" ), parse( "(1*x)" ).simplify() ); } @Test public void ensureLeftAssociatesAdds() { assertEquals( parse( "((x+y)+z)" ), parse( "(x+(y+z))" ).simplify() ); } @Test public void ensureLeftAssociatesMuls() { assertEquals( parse( "((x*y)*z)" ), parse( "(x*(y*z))" ).simplify() ); } static abstract class Exp { abstract Exp simplify(); } static final Val zero = new Val( BigDecimal.ZERO ); static final Val one = new Val( BigDecimal.ONE ); static class Val extends Exp { final BigDecimal val; Val( BigDecimal val ) { this.val = val; } Exp simplify() { return this; } public String toString() { return val.toString(); } public boolean equals( Object other ) { return other instanceof Val && val.equals( ((Val) other).val ); } public static Exp create( BigDecimal d ) { return d.equals( BigDecimal.ZERO ) ? zero : d.equals( BigDecimal.ONE ) ? one : new Val( d ); } } static class Sym extends Exp { String name; Sym( String name ) { this.name = name; } Exp simplify() { return this; } public boolean equals( Object other ) { return other instanceof Sym && name.equals( ((Sym) other ).name ); } public String toString() { return name; } } static abstract class Inf extends Exp { final Exp L, R; Inf( Exp L, Exp R ) { this.L = L; this.R = R; } } static class Add extends Inf { Add( Exp L, Exp R ) { super( L, R ); } Exp simplify() { return sum( L.simplify(), R.simplify() ); } private Exp sum( Exp sL, Exp R ) { if (sL ==zero) return R; if (R == zero) return sL; if (sL instanceof Val && R instanceof Val) return Val.create( ((Val) sL).val.add( ((Val) R).val ) ); if (R instanceof Add) return sum( sum( sL, ((Add) R).L ), ((Add) R).R ); return new Add( sL, R ); } public String toString() { return "(" + L + " + " + R + ")"; } public boolean equals( Object other ) { return other instanceof Add && same( (Add) other ); } private boolean same( Add other ) { return L.equals( other.L ) && R.equals( other.R ); } } static class Mul extends Inf { Mul( Exp L, Exp R ) { super( L, R ); } Exp simplify(){ return prod( L.simplify(), R.simplify() ); } private Exp prod( Exp L, Exp R ) { if (L == zero || R ==zero) return zero; if (L == one) return R; if (R == one) return L; if (L instanceof Val && R instanceof Val) return Val.create( ((Val) L).val.multiply( ((Val) R).val ) ); if (R instanceof Mul) return prod( prod( L, ((Mul) R).L ), ((Mul) R).R ); return new Mul( L, R ); } public String toString() { return "(" + L + " * " + R + ")"; } public boolean equals( Object other ) { return other instanceof Mul && same( (Mul) other ); } private boolean same( Mul other ) { return L.equals( other.L ) && R.equals( other.R ); } } public static void main( String [] args ) { // [* x [+ [+ [* 12 0] [+ 23 8]] y]] Exp a28_8 = new Add( Val.create( new BigDecimal( 23 ) ), Val.create( new BigDecimal( 8 ) ) ); Exp m12_0 = new Mul( Val.create( new BigDecimal( 12 ) ), Val.create( new BigDecimal( 0 ) ) ); Exp aAB = new Add( m12_0, a28_8 ); Exp aABY = new Add( aAB, new Sym( "y" ) ); Exp it = new Mul( new Sym( "x" ), aABY ); Exp itx = new SimplifyViaInstanceOf().parse( "(x*(((12*0)+(23+8))+y))" ); System.err.println( ">> original expression: " + it ); System.err.println( ">> original expression: " + itx ); System.err.println( ">> simplified expression: " + itx.simplify() ); long base = System.currentTimeMillis(); for (int i = 0; i < 10000000; i += 1) it.simplify(); long after = System.currentTimeMillis(); System.err.println( ">> took " + (after - base)/1000.0 + "s" ); } }
 Signature Hewlett-Packard Limited registered office: Cain Road, Bracknell, registered no: 690597 England Berks RG12 1HN
Jon Harrop - 12 Dec 2007 15:44 GMT > OK, my broken-handed implementation running on a 3.06GHz Xeon with 2GB > memory (not that that matters much, since the JVM is only using the [quoted text clipped - 7 lines] > according to `time` are about 0.2s). I don't know why there's so > much variability in the reported times. Great.
> Drat. OK, those times are using BigDecimal, not BigInteger, which > gets me 3.258s, 3.281s, 3.207s as the loop times. (Using plain > `int` gets those times down to 1.986s, 1.62s, 1.987s, as a measure > of the work done by the big-integer arithmetics.) Let's forget about the arbitrary-precision stuff and focus on the rest for now. On my machine, your original code gives 2.853. Changing to machine-precision ints that drops to 1.508s. The equivalent OCaml takes only 0.711s.
> -server about halves the runtime. My initial -server-free code took > about 9s [with BigDecimal], but some straightforward optimisation work > (none of which comes close to the obvious caching cheat; just reordering > the tests and ensuring that the important values 0 and 1 were testable > with ==) reduced that. Yes, the OCaml compiler will automatically factor common subexpressions like the constant integer expressions.
> I tried the obvious tactic of going for double-dispatches to avoid > multiple consecutive instanceof tests, but that made it /slower/, > perhaps because hotspot couldn't collapse code across method > boundaries? I don't really mind because in this case the multiple > dispatches made the code rather more long-winded than it already > was. Interesting.
> Count the line-size yourselves and subtract lines you deem not > relevant to the benchmark. I get 175 LOC for Roger's Java code, 117 LOC for your Java and 40 LOC for my OCaml.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Steve Wampler - 12 Dec 2007 22:49 GMT > Following recent discussions here about benchmarking and when Java's > performance is worst, I proposed the idea of symbolic computation because [quoted text clipped - 8 lines] > > http://www.lambdassociates.org/studies/study10.htm Here's another Java one. I didn't bother with parsing an input string because it looks like people were timing just the time to walk the parse tree and simplify it.
I've been hesitant to post it because it feels like cheating (and the code isn't very OO - I actually tend to think in a different language for these puzzles and then cast the result into Java).
Here are times on my dual-Opteron@2GHz, using BigInteger, each run does 10,000,000 simplifications and then outputs the original string and the simplified version (taken from the last run):
------------------------------------------------------ time java -server Simplify 10000000 10 10000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0090 10000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 10000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 10000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 10000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 10000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 10000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 10000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 10000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0010 10000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 java -server Simplify 10000000 10 0.11s user 0.03s system 103% cpu 0.135 total ------------------------------------------------------
Why so fast? Because the simplified string is built up as the parse is constructed. So the simplification loop just outputs the result! I'll grant that this isn't in the spirit of things, but I couldn't see where it was prohibited :) After all, if you're allowed to simplify while building the parse tree, this isn't much more of a stretch...
(Naturally, this would work when parsing of input strings is done instead of hand-building the parse tree, as well.)
----------------------------------------------------- import java.math.BigInteger; import java.util.Date;
public class Simplify {
private static BigInteger zero = new BigInteger("0"); private static BigInteger one = new BigInteger("1");
private class Expr { public String type; public Expr left; public Expr right; public BigInteger iVal; public String sVal;
public Expr(int nVal) { type = "int"; iVal = new BigInteger(""+nVal); sVal = iVal.toString(); }
public Expr(String var) { type = "var"; sVal = var; }
public Expr(String nType, Expr nLeft, Expr nRight) { if (nType == "+") { if (nLeft.eq(zero)) { mkCopy(nRight); } else if (nRight.eq(zero)) { mkCopy(nLeft); } else if ((nLeft.type == "int") && (nRight.type == "int")) { type = "int"; iVal = nLeft.iVal.add(nRight.iVal); sVal = iVal.toString(); } else lassoc(nType, nLeft, nRight); } if (nType == "*") { if (nLeft.eq(one)) { mkCopy(nRight); } else if (nRight.eq(one)) { mkCopy(nLeft); } else if ((nLeft.type == "int") && (nRight.type =="int")) { type = "int"; iVal = nLeft.iVal.multiply(nRight.iVal); sVal = iVal.toString(); } else lassoc(nType, nLeft, nRight); } }
private void lassoc(String nType, Expr nLeft, Expr nRight) { if (nRight.type == nType) { type = nType; left = new Expr(nType, nLeft, nRight.left); right = nRight.right; sVal = "["+nType+" "+left+" "+right+"]"; } else mkCopy(nType, nLeft, nRight, zero, "["+nType+" "+nLeft+" "+nRight+"]"); }
private void mkCopy(Expr e) { mkCopy(e.type, e.left, e.right, e.iVal, e.sVal); }
private void mkCopy(String nType, Expr nLeft, Expr nRight, BigInteger nVal, String nsVal) { type = nType; left = nLeft; right = nRight; iVal = nVal; sVal = nsVal; }
public boolean eq(BigInteger i) { return (type == "int") && (iVal.equals(i)); }
public String toString() { return sVal; }
public String simplify() { return sVal; } }
public static String simplify(Expr expr) { return expr.simplify(); }
public void runTest(int limit) { String s = "[* x [+ [+ [* 12 0][+ 23 8]] y ]]"; System.out.print(""+limit+" '"+s+"' -> "); Expr e = new Expr("*", new Expr("x"), new Expr("+", new Expr("+", new Expr("*", new Expr(12), new Expr(0) ), new Expr("+", new Expr(23), new Expr(8) ) ) , new Expr("y") ) ); Date start = new Date(); String newS = null; for (int i = 0; i < limit; ++i) { newS = simplify(e); } Date stop = new Date(); System.out.println("'"+newS+"':: "+ (stop.getTime() - start.getTime())/1000.0); }
public static void main(String[] args) { int limit = 10000000; if (args.length > 0) limit = Integer.parseInt(args[0]); int nRuns = 10; if (args.length > 1) nRuns = Integer.parseInt(args[1]);
Simplify simplify = new Simplify(); for (int i = 0; i < nRuns; ++i) { simplify.runTest(limit); } System.exit(0); } } -----------------------------------------------------
 Signature Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
Chris Dollin - 13 Dec 2007 08:36 GMT one. I didn't bother with parsing an
> input string because it looks like people were timing > just the time to walk the parse tree and simplify it. [quoted text clipped - 3 lines] > a different language for these puzzles and then cast the > result into Java).
> Why so fast? Because the simplified string is built up as the parse > is constructed. So the simplification loop just outputs the result! > I'll grant that this isn't in the spirit of things, but I couldn't see > where it was prohibited :) After all, if you're allowed to simplify > while building the parse tree, this isn't much more of a stretch... I thought of that and decided it wasn't in the spirit of the test; it does show that without the context the problem originally came from, one doesn't know if it's a realistic optimisation or not, and it shows how algorithmic changes can beat mere [micro] optimisation. The other trick I thought of, almost but not quite a similar complete cheat, is to give each node a slot to hold its simplified form and store it the first time it gets computed. In this case, that would mean that the first pass through the loop would do all the computation and the remaining passes would be trivial ...
[I know Jon wanted immutable nodes, but these /would/ be immutable nodes; there's no external visible change to the behaviour except performance. Also, given that Java doesn't even pretend to be functional, disallowing assignments could be seen as bias ... I do think it's sensible to make the expression nodes /functionally/ immutable.]
 Signature Chris "puns is us" Dollin
Hewlett-Packard Limited Cain Road, Bracknell, registered no: registered office: Berks RG12 1HN 690597 England
Steve Wampler - 13 Dec 2007 12:23 GMT > The other > trick I thought of, almost but not quite a similar complete cheat, is > to give each node a slot to hold its simplified form and store it the > first time it gets computed. In this case, that would mean that the > first pass through the loop would do all the computation and the remaining > passes would be trivial ... Yes, that's essentially what my original (unpublished) solution (which includes the parsing of the string inside every iteration of the inner timing loop) does. I wrote it with a string transformation approach instead of thinking of it as a parse tree approach and used caching to avoid reparsing any previously parsed substring. Since every string (including the input!) is a substring of itself, this means that it is slow on the first iteration and reduces to a cache lookup on iterations 2-10,000,000 for this 'benchmark'.
So, the times, where the input string is passed to the simplifier on each iteration are roughly 0.12 ms per 10,000,000 iterations. The parsing code is really a quick and (very) dirty hack, so I decided not to put it out.
I'm still debating with myself about whether or not saving the simplified string is a cheat or not - it's just a representation of the parse tree *at that point*, so if you're allowed to simplify while building the parse tree, why not save its representation as well? After all, without it, the challenge really reduces to how fast you can recurse, concatenate strings, and convert big integers to strings - not really do symbolic math manipulations.
Incidently, I noticed I had written
public Expr(int i) {...
where it should be:
public Expr(BigInteger i)
instead. (And the hand-built parse tree would have to use "new BigInteger(...)" instead of int literals). Doesn't affect the timings enough to notice.
Also, the code could be cleaned up...and made more OO. Then it would be pretty much the same as your solution, of course.
 Signature Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
Chris Dollin - 13 Dec 2007 13:34 GMT (stuff about caching, and at the end of Steve's response he wrote:)
> Also, the code could be cleaned up...and made more OO. Then it would be > pretty much the same as your solution, of course. Could easily be better. I don't like using `instanceof` -- it doesn't seem to me to be in the spirit of an OO language. Type decisions should be made by polymorphism. But, as I said, my attempt at using polymorphism via multiple-dispatch was slower than my instanceof code. Perhaps I gave up too early ...
 Signature Chris "giving in until he gets his own way" Dollin
Hewlett-Packard Limited registered office: Cain Road, Bracknell, registered no: 690597 England Berks RG12 1HN
Jon Harrop - 13 Dec 2007 20:42 GMT >> Also, the code could be cleaned up...and made more OO. Then it would be >> pretty much the same as your solution, of course. [quoted text clipped - 4 lines] > via multiple-dispatch was slower than my instanceof code. Perhaps I gave > up too early ... Yes, this is exactly where OOP ceases to be useful. I believe the idiomatic OOP solution is to implement lots of different member functions reflecting pieces of pattern match. For example, you would have an add_int member function to add an int and specialize the add_int member of the Int class to create a new Int.
I might have a go but I've no idea what would be efficient in Java.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Jon Harrop - 13 Dec 2007 20:36 GMT > [I know Jon wanted immutable nodes... I don't mind mutable nodes but I don't want simplification to destroy its input expression.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Jon Harrop - 13 Dec 2007 20:39 GMT > Why so fast? Because the simplified string is built up as the parse > is constructed. So the simplification loop just outputs the result! > I'll grant that this isn't in the spirit of things, but I couldn't see > where it was prohibited :) After all, if you're allowed to simplify > while building the parse tree, this isn't much more of a stretch... To be honest, I'm not really sure how this relates to the original challenge or to my OCaml. Specifically, I haven't used any strings or done any parsing.
The previous Java implementations weren't benchmarking parsing, were they?
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Steve Wampler - 13 Dec 2007 21:43 GMT >> Why so fast? Because the simplified string is built up as the parse >> is constructed. So the simplification loop just outputs the result! [quoted text clipped - 7 lines] > > The previous Java implementations weren't benchmarking parsing, were they? That's correct. I had originally assumed that the program should work for different expressions and not have expression trees hard-coded. So I took that to mean that some form of string input was needed so that expressions could be parsed into the expression tree. That led me to a string transformation solution as a first try. When I saw the C++ solution to the derivative solution you offered as an example of something similar, I realized that there was no need to read a string and parse it. But my earlier misinterpretation keeps me coming back to referring to the hard-coded internal representation as a parse tree (that, and my previous life writing compilers).
Since the simplification can be done when that hard-coded parse tree is built, the timing loop to simplify the constructed tree actually doesn't have to do anything to get the simplified expression at all - it's already there. (My statement about the loop just outputting the result isn't true, that's not needed either, after all!) Colin's solution, for example, just walks the tree and builds a copy of it. But I'm not convinced that's necessary - you could just return the tree without a copy. (I went ahead and returned that tree in a string representation, because that made it easier for me to test things, and I could do that step quickly.)
[The version I posted had a error in the string representation of the result of converting right associativity to left (teaches me to code while battling a mild case of insomnia...). I found and fixed that while cleaning up the code.]
It would probably be more interesting to include the code that constructs the expression tree in the timing loop. You may very well have had that in mind, but that's not what I'd seen tested.
 Signature Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
Steve Wampler - 13 Dec 2007 22:10 GMT > Since the simplification can be done when that hard-coded parse tree > is built, the timing loop to simplify the constructed tree actually > doesn't have to do anything to get the simplified expression at all - > it's already there. Incidently, I think Java figured that out also. Here are the times I get for 10 runs of 2,000,000,000:
->time java -server Simplify 2000000000 10 2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.01 2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0 2000000000 '[* x [+ [+ [* 12 0][+ 23 8]] y ]]' -> '[* x [+ 31 y]]':: 0.0010 java -server Simplify 2000000000 10 0.11s user 0.02s system 102% cpu 0.127 total ->
I suspect that means it's figured out I have a loop invariant. Apparently it has more trouble figuring that out when you actually do something. Probably because new objects are built while 'cloning' the tree.
 Signature Steve Wampler -- swampler@noao.edu The gods that smiled on your birth are now laughing out loud.
Roedy Green - 13 Dec 2007 21:04 GMT > high allocation rates for short-lived objects when >written in an idiomatic functional style and Java should be uniquely bad at >this. Not at all. Java can allocate objects very quickly;, it just peels them off a giant contiguous block. What takes the time is when you have a large number of long lived objects since they have to be enumerated each GC pass. Short lived objects are recovered in zero time.
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Jon Harrop - 13 Dec 2007 23:27 GMT >> high allocation rates for short-lived objects when >> written in an idiomatic functional style and Java should be uniquely bad [quoted text clipped - 5 lines] > enumerated each GC pass. Short lived objects are recovered in zero > time. Then you'll be able to improve the Java implementations of these benchmarks so that they
|
|