Java Forum / General / January 2007
[Algorithm] Sum of Primes < 1000000
Luc The Perverse - 05 Jan 2007 10:50 GMT I was messing around on a site where you answer questions for points. (No prizes or anything)
Anyway, they say our programs *should* execute in less than 1 minute if well formed.
Mine didn't - it was less than 10 minutes though and I got the right answer.
I was supposed to sum all primes less than 1 million.
Is there an algorithm out there, which I will be able to understand, which works significantly faster than this one?
//import java.lang.*; import java.util.*; import java.math.*;
public class prob10{ public static void main(String[] x){ if(x.length<1){ System.out.println("enter a number"); return; }
int n = Integer.parseInt(x[0]); int c = 0; int[] pc = new int[n]; int[] p2 = new int[n]; if(n<=1){ System.out.println(2); return; } c++; pc[0] = 2; p2[0] = 2; int cn = 1; long g = 2; while(c<n){ cn+=2; boolean Prime = true; for(int i=1;i<c;i++){ p2[i] -= 2; if(p2[i] == 0) Prime = false; if(p2[i] < 1) p2[i] += pc[i]; } if(Prime){ pc[c] = cn; p2[c++] = cn; if((c%2048) == 0) System.out.println(cn); if(cn<1000000) g+=cn; else{ System.out.println("Yay 1 million reached!"); break; }
} } System.out.println("Grand Total " + g); System.out.println(pc[n-1]); }
}
I specifically tried to avoid ever dividing or modding (except in my status reporter, but that only checks when a prime has been found)
I "count down" until the next multiple a prime is detected.
The algorithm works . . but it is n^2 complexity, and kinda sucks.
 Signature LTP
:) Simon - 05 Jan 2007 11:58 GMT Luc The Perverse schrieb:
> I was messing around on a site where you answer questions for points. (No > prizes or anything) [quoted text clipped - 8 lines] > Is there an algorithm out there, which I will be able to understand, which > works significantly faster than this one? You may want to have a look at the Sieve of Eratosthenes (and its successors) (see http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes). It finds all primes up to n in time n log n. On my computer my implementation takes 150 milliseconds for n=1000000.
> I specifically tried to avoid ever dividing or modding (except in my status > reporter, but that only checks when a prime has been found) I doubt that such micro-optimisations are of much use here.
Cheers, Simon
Patricia Shanahan - 05 Jan 2007 20:23 GMT > Luc The Perverse schrieb: >> I was messing around on a site where you answer questions for points. (No [quoted text clipped - 14 lines] > to n in time n log n. On my computer my implementation takes 150 milliseconds > for n=1000000. As a matter of curiosity, did you use a boolean[] or BitSet?
I started with a boolean[], and mine was slower than yours, about 200 milliseconds. It dropped to under 100 milliseconds when I switched to BitSet. I believe the improvement is due to more efficient caching.
Patricia
Daniel Pitts - 05 Jan 2007 21:18 GMT > > Luc The Perverse schrieb: > >> I was messing around on a site where you answer questions for points. (No [quoted text clipped - 22 lines] > > Patricia I personally got 66ms from:
public class SumPrimes { public static void main(String...args) { final int bounds = Integer.parseInt(args[0]) + 1; final java.util.BitSet composits = new java.util.BitSet(bounds); long sum = 1; for (int prime = 2; prime < bounds; prime = composits.nextClearBit(prime +1)) { sum += prime; for (int composit = prime + prime; composit < bounds; composit += prime) { composits.set(composit); } } System.out.println(sum); } }
MikeNereson - 06 Jan 2007 15:04 GMT Your original alogorithm worked for me in 141ms. I made a couple seemingly minor changes. Can you find them? Why are they so important?
http://pastebin.com/852736
OUTPUT: Yay 1 million reached! Grand Total 797162 0 only took 141 ms
MikeNereson - 06 Jan 2007 15:52 GMT Whoops, I didn't validate that output. My bad. The bitset solution is great.
> Your original alogorithm worked for me in 141ms. I made a couple > seemingly minor changes. Can you find them? Why are they so important? [quoted text clipped - 6 lines] > 0 > only took 141 ms Chris Uppal - 06 Jan 2007 16:12 GMT > Yay 1 million reached! > Grand Total 797162 Seems unlikely to be the correct answer since 999983 is just one one of the primes in that range...
-- chris
Simon - 08 Jan 2007 09:08 GMT Patricia Shanahan schrieb:
> As a matter of curiosity, did you use a boolean[] or BitSet? > > I started with a boolean[], and mine was slower than yours, about 200 > milliseconds. It dropped to under 100 milliseconds when I switched to > BitSet. I believe the improvement is due to more efficient caching. Well, that is strange. I have been using a boolean[], and using BitSet actually slows it down on my machine. I am using a Laptop, so CPU speed may vary. Today it is even faster than yesterday. The output was (code see below):
Time : 42ms Sum : 37551402022
Apart from that, from your other post I saw that you have used a different argument to obtain the n log n upper bound.
- Your argument: There are only log n primes below n and each iteration takes time at most n. - My argument: Consider the iteration in which we erase prime i. We erase n/i composite numbers. In total, we need time
sum(i=1..n) n/i = n * H(n) = O(n * log n)
(Here, H(n) is the n-th harmonic number.) Can we combine both arguments to obtain an upper bound below n * log n? In my sum, only log n terms are actually necessary, as has been argued by Patricia. Unfortunately, the terms that are actually present are the larger ones (those for small i). Still, some Googling reveals that it is possible to get the time down to n * log log n.
Cheers, Simon
This is my code:
public class Primes {
private boolean[] numbers; private long sum = 0;
public Primes(int size) { numbers = new boolean[size]; }
public void eraseComposite() { int prime = 2; while (prime < numbers.length) { sum += prime; // mark multiples of prime for (int i = 2 * prime; i < numbers.length; i += prime) { // (we could start with prime*prime here, but I didn't bother // since it would be larger than Integer.MAX_INTEGER) numbers[i] = true; } // move forward to next prime prime++; while (prime < numbers.length-1 && (numbers[prime])) { prime++; } } }
public static void main(String[] argv) { long t = System.currentTimeMillis(); Primes p = new Primes(1000000); p.eraseComposite(); System.out.println("Time : " + (System.currentTimeMillis() - t) + "ms"); System.out.println("Sum : " + p.sum); } }
Patricia Shanahan - 08 Jan 2007 13:34 GMT > Patricia Shanahan schrieb: >> As a matter of curiosity, did you use a boolean[] or BitSet? [quoted text clipped - 6 lines] > slows it down on my machine. I am using a Laptop, so CPU speed may vary. Today > it is even faster than yesterday. The output was (code see below): The most likely cause of the speed up I saw is more efficient caching, because BitSet stores the data more densely. That is a very hardware dependent effect, though less a matter of CPU speed than of cache configuration.
> public void eraseComposite() { > int prime = 2; [quoted text clipped - 13 lines] > } > } There is one optimization you missed. When prime is greater than the square root of numbers.length, there is no need to do the composite scan.
If a number less than 1,000,000 is composite, at least one prime factor is less than 1000.
Patricia
Simon - 08 Jan 2007 14:05 GMT Patricia Shanahan schrieb:
>> Patricia Shanahan schrieb: >>> As a matter of curiosity, did you use a boolean[] or BitSet? [quoted text clipped - 38 lines] > If a number less than 1,000,000 is composite, at least one prime factor > is less than 1000. This is what I tried to express in the above comment. It was more laziness than the reason stated above though that prevented me from implementing it :-)
Cheers, Simon
Patricia Shanahan - 08 Jan 2007 14:25 GMT > Patricia Shanahan schrieb: >>> Patricia Shanahan schrieb: [quoted text clipped - 43 lines] > Cheers, > Simon Ah, I see. You were thinking of checking by asking whether the square of the latest prime is less than numbers.length.
The way I did it, there is no possibility of overflow - I stored Math.sqrt(numbers.length) as an integer. Inside the loop, I just compared to the square root.
Patricia
John Ersatznom - 08 Jan 2007 21:34 GMT So to make a long story short, the algorithm is:
* Make array of booleans of length max_num - 1, whose zeroth element corresponds to 2 and whose max_num - 2nd element corresponds to maxnum. Clear it (all false). (Java clears it for you if you use Java.) * Take square root of maxnum and store it somewhere. * for i = 2 to that square root * if (array[i - 2]) continue; * for j = i*i to max_num step i * set array[j - 2] * end * end
Note the optimization of starting the inner loop at i*i. All the composites smaller than i*i have a prime factor smaller than i and so are already marked at that point. Note also skipping the inner loop for composite values of i.
In fact, using the square root of max_num is a less significant optimization after the i*i optimization, as it just stops the outer loop testing every remaining value for compositeness and then entering the inner loop for the primes but without the inner loop actually doing anything (as i*i >= max_num the loop terminates immediately).
The algorithm does sqrt(max_num) bit tests (using the sqrt(max_num) opt) and sum{primes p up to sqrt(max_num}(max_num/p). The number of primes is O(log sqrt(max_num)) from what I've seen of primes, which is O(log max_num) since the log of the square root is simply half the log and the half disappears into the constant factor. The average prime from this set will be O(1) (integral of log x is log x - x, approximates sum of primes; divide by log x for 1 - 1/log(x) is O(1)). So the number of bit sets is O(max_num log max_num) -- O(max_num) per prime as it's max_num/p with average p O(1), and O(log max_num) primes.
So the whole thing has complexity O(n log n) -- as good as quicksort, and without the quadratic worst-case behavior. :)
Removing the squaring/square rooting optimizations doesn't increase the complexity but does increase the running time by a factor of 2 or so. Note that square roots are expensive, but the only square root occurs outside any loops...i*i is just a multiplication, and multiplication of integers is cheap on any modern architecture. The inner loop does one imul, three iadds (one's actually the subtract in the loop test and one computes the array index), one icmp (the rest of the loop test), and one boolean assignment. I expect the compiler to optimize "set array[j - 2]" to "*(foo + j) = true" or something equivalent -- i.e. to use a point two before the actual start of the array as the base for the pointer adds in this loop. Of course, using a BitSet instead of a boolean[] may change this a lot. A sensible implementation plus jit compiler will probably reduce it to a similar add and a bit mask operation, but I expect an idiv will sneak in there to get the array index into the underlying array, which may make BitSet worse-performing, and the subtraction of 2 definitely has to be done separately. Likely *(foo + (j - 2)/64) |= 2^(j - 2) under the hood with 2^(j - 2) sucked out of a lookup table, for a grand total of two extra iadds (j - 2 and the lookup into the 2^n table), one extra idiv, and two extra fetches (the lookup into the 2^n table and retrieving the stored word to |= with something) versus storing in a boolean[] which is probably the one add and a flat store, without bothering to retrieve whatever was in the array cell originally.
I'd be *very* surprised if BitSet actually comes out faster after you run it a few hundred times to get it all JITted and then a few hundred more to measure and average.
Richard F.L.R.Snashall - 08 Jan 2007 21:54 GMT > So to make a long story short, the algorithm is: > [quoted text clipped - 13 lines] > are already marked at that point. Note also skipping the inner loop for > composite values of i. What if you used your array to find "open" values at i and above and considered only those multiples of i?
Patricia Shanahan - 08 Jan 2007 23:13 GMT ...
> I'd be *very* surprised if BitSet actually comes out faster after you > run it a few hundred times to get it all JITted and then a few hundred > more to measure and average. The potential gain from BitSet is more a matter of hardware than anything JIT can affect.
Especially in the early stages, the sieve has a very cache-unfavorable memory access pattern, relatively small stride access to a range of a million elements.
If you use a boolean[], those million elements occupy a million bytes of memory, because, at least in the implementations I've examined, each boolean in an array gets its own byte.
BitSet uses a long[], and packs 64 bits into each element, so a million elements only take 125,000 bytes of memory.
That may make a significant difference if at least one cache in the system has a size in the range of tens to hundreds of megabytes.
Patricia
Chris Uppal - 09 Jan 2007 19:09 GMT > If you use a boolean[], those million elements occupy a million bytes of > memory, because, at least in the implementations I've examined, each [quoted text clipped - 5 lines] > That may make a significant difference if at least one cache in the > system has a size in the range of tens to hundreds of megabytes. I think you must have meant that last word to be kilobytes ?
-- chris
Patricia Shanahan - 15 Jan 2007 15:52 GMT >> If you use a boolean[], those million elements occupy a million bytes of >> memory, because, at least in the implementations I've examined, each [quoted text clipped - 7 lines] > > I think you must have meant that last word to be kilobytes ? Thanks for the correction.
Patricia
Simon - 09 Jan 2007 08:14 GMT > The algorithm does sqrt(max_num) bit tests (using the sqrt(max_num) opt) > and sum{primes p up to sqrt(max_num}(max_num/p). The number of primes is [quoted text clipped - 8 lines] > So the whole thing has complexity O(n log n) -- as good as quicksort, > and without the quadratic worst-case behavior. :) I'm not sure what you are saying. We have already seen two arguments for an upper bound of O(n*log n). Is that a third argument? In particular, I don't understand what you mean by "The average prime from this set will be O(1)". (Every set of (distinct) positive integers (prime or not) with at least k elements has an average of at least k/4 since at least k/2 of its elements are
>= k/2.). Furthermore, the integral of log x is not lox x - x, but rather x * log x - x. Then, you seem to have this number p in the denominator (in max_num/p) and use p=O(1) to obtain an upper bound on the term. This, however, is impossible. You would actually need it to be p=Omega(1) (which is obviously true in contrast to the O(1)). Also, you cannot just take the average and then multiply with that as if every element would equal the average.
Or are you trying to give a lower bound? This should also be impossible: http://wwwhomes.uni-bielefeld.de/achim/prime_sieve.html claims that the truth is O(n * log log n) which I tend to believe as I already posted (There are only log n terms in the harmonic series).
Cheers, Simon
John Ersatznom - 15 Jan 2007 14:52 GMT > I'm not sure what you are saying. Then you didn't read it and understand it. Please try again, or give up, but there's really no need to attack.
Lew - 15 Jan 2007 15:15 GMT Simon wrote:
>> I'm not sure what you are saying.
> Then you didn't read it and understand it. Please try again, or give up, > but there's really no need to attack. I saw no attack in Simon's post. Did I miss something?
Simon raised some interesting mathematically-motivated points. I am in no position yet to judge the relative merit of the arguments, but I was awaiting with some interest your response to Simon, on the assumption that you were going to continue the mathematical reasoning as you clarified the points about which Simon asked.
I continue to hope for such information.
- Lew
John Ersatznom - 16 Jan 2007 16:02 GMT >> Then you didn't read it and understand it. Please try again, or give >> up, but there's really no need to attack. > > I saw no attack in Simon's post. Did I miss something? Yes; the fact that he was accusing me of being bad at math.
> Simon raised some interesting mathematically-motivated points. Well, to start with, he basically accused Wikipedia of having the integral of ln(x) wrong. He also correctly stated that an average of a set exceeds 1/4 the size of the set, but thereby was trying to imply that the optimization sucked because something was therefore O(N) in the total number range instead of O(log N). The set was of primes, and the size of that is O(log N), so 1/4 of that is O(log N). He actually ends up proving the weak result that the size of the average prime less than N is *at least* O(log N) (not ruling out O(N) or whatever); I had gotten the stronger result that the size of the average prime *was* only O(log N).
The reasoning in a nutshell:
Algorithm to build sieve up to N as I stated it has three parts. Loop until sqrt(N) (O(sqrt(N)). For each p reached that's not gotten marked composite earlier, loop from p^2 until N step p. The number of these loops is O(P), P the number of primes up to sqrt(N). The length of a given one is O(N/p). Turns out P is O(log(sqrt(N)) = O(1/2 log(N)) = O(log(N)), so this loop is O(log(N)*X) where the average length of a loop is O(X). What is the average length of a loop?
It's O(N/p) of course, for the average choice of p. Primes are logarithmically distributed (P is O(log(N)) above) so the average p is going to be small fairly independently of the range. It's hard to be sure it's O(1) though. The average is the sum divided by the total number. The sum of the primes, the way they are logarithmically distributed, could be approximated by the integral of the logarithm. (It might actually be better now I think of it to integrate N/p -- N*(1/2 + 1/3 + 1/5...) looks like N times an exponential decay function, whose integral is itself and which is clearly O(1))
I concluded that the average length of the loop is bounded by O(N), and the whole algorithm by O(N log N). If it's actually N log log N or can be made such, so much the better.
Simon - 16 Jan 2007 17:24 GMT >>> Then you didn't read it and understand it. Please try again, or give >>> up, but there's really no need to attack. >> >> I saw no attack in Simon's post. Did I miss something? > > Yes; the fact that he was accusing me of being bad at math. I wasn't accusing you, I was analysing your arguments.
>> Simon raised some interesting mathematically-motivated points. > > Well, to start with, he basically accused Wikipedia of having the > integral of ln(x) wrong. - You said: "integral of log x is log x - x" - Wikipedia correctly says (http://en.wikipedia.org/wiki/List_of_integrals_of_logarithmic_functions) "int ln(cx) dx = x * ln (cx) - x" For our case c=1, this is the same as I said ("Furthermore, the integral of log x is not lox x - x, but rather x * log x - x. )
> He also correctly stated that an average of a set exceeds 1/4 the size > of the set, but thereby was trying to imply that the optimization sucked I didn't say that anything sucked. What optimisation? Starting with the square? We all know that this does not suck. It is standard.
> because something was therefore O(N) in the total number range instead > of O(log N). I was saying "something was O(N)"? What is something? Do you mean Omega(N)?
> The set was of primes, and the size of that is O(log N), so > 1/4 of that is O(log N). The average over any set of positive integers of cardinality k is at least Omega(k). You said: "The average prime from this set will be O(1)". This is obviously wrong unless the set contains only O(1) elements.
> He actually ends up proving the weak result > that the size of the average prime less than N is *at least* O(log N) > (not ruling out O(N) or whatever); That wasn't a weaker result, it was a counter argument to your bogus claim. Furthermore, something cannot be *at least* O(log N). It can only be *at least* Omega(log N). You should really learn the terms.
> I had gotten the stronger result that > the size of the average prime *was* only O(log N). You are saying the average prime below N is O(log N)? There are O(log N) primes below N. At least one of these primes is at least N/2. Even if we ignore the contribution of all other elements in this set, this single prime alone implies an average of at least Omega(n/log n).
I have the impression that you are trying to look at the set {1,...,N} and take the average only over the primes in this set, whatever that means. If that is what you are trying to do, you really should write that down more precisely. Still, it would be unclear what this would contribute to your reasoning.
> The reasoning in a nutshell: > [quoted text clipped - 7 lines] > > It's O(N/p) of course, for the average choice of p. Note: You cannot just argue over averages and then multiply with the number of iterations. In general, such arguments fail.
> Primes are > logarithmically distributed (P is O(log(N)) above) so the average p is [quoted text clipped - 5 lines] > 1/3 + 1/5...) looks like N times an exponential decay function, whose > integral is itself and which is clearly O(1)) Clearly?
1/2 + 1/3 + 1/5 + 1/7 + 1/11 ...
is O(1)? You should read: http://mathworld.wolfram.com/HarmonicSeriesofPrimes.html
It approaches ln ln N.
> I concluded that the average length of the loop is bounded by O(N), You need this long argument to conclude that the average length of a loop that starts above 1, has a step size of more than 1 and ends at N is bounded by O(N)? Wow.
> and > the whole algorithm by O(N log N). If it's actually N log log N or can > be made such, so much the better. It's a bit hard to understand why you need so much reasoning for this simple bound for which already two independent arguments (each of which fit into two lines) have been given.
BTW: Now that we know that the harmonic series of primes grows as O(ln ln N), this proves that the total time is O(N * log log N).
Cheers, Simon
Simon - 17 Jan 2007 07:32 GMT >> I had gotten the stronger result that >> the size of the average prime *was* only O(log N). [quoted text clipped - 3 lines] > Even if we ignore the contribution of all other elements in this set, this > single prime alone implies an average of at least Omega(n/log n). Forget this argument, it is nonsense :-) This one is better:
There are Omega(N/log N) primes between N/2 and N each of which has size at least N/2. Furthermore, there are O(N/log N) primes below N/2, so the average is at least
c1 * N/2 * N / log N ------------------- = N * c1/(2*c2) c2 * N/log N
for suitable constants c1 and c2.
Cheers, Simon
John Ersatznom - 18 Jan 2007 22:26 GMT [snip most]
> That wasn't a weaker result, it was a counter argument to your bogus claim. I'm not going to argue with you if all you're going to do is sit here and insult me.
[Remainder of condescending BS snipped]
Good day.
Simon - 15 Jan 2007 16:51 GMT John Ersatznom schrieb:
>> I'm not sure what you are saying. > > Then you didn't read it Yes, I did.
> and understand it. No, I didn't. That's what I'm trying to say.
> Please try again, or give up, > but there's really no need to attack. I apologise for anything which you might have interpreted as an attack, which it wasn't intended to be. I was trying to be precise as to which arguments of yours I think are questionable. If you think I got you wrong, please explain. If you think that my arguments are wrong, please try to be precise also, so I can clarify.
Cheers, Simon
John Ersatznom - 16 Jan 2007 16:05 GMT >>and understand it. > > No, I didn't. That's what I'm trying to say. Then you either try again or give up -- what more is there for me to say? You might also examine my other response to this thread today. In any event, there is no call for going around publicly suggesting that anyone else's arguments are "questionable" or similarly just because you didn't happen to understand them.
Curiously, my earlier post to this thread (that you'd responded to) is gone. My newsserver claims a 15 day retention and it's only a week old, which means somebody forged a cancel. It wasn't you, I hope. (If it was, it's ironic that you quoted much of it in your reply.)
Simon - 16 Jan 2007 17:27 GMT > Curiously, my earlier post to this thread (that you'd responded to) is > gone. My newsserver claims a 15 day retention and it's only a week old, > which means somebody forged a cancel. It wasn't you, I hope. (If it was, > it's ironic that you quoted much of it in your reply.) You are suspect me of cancelling your article? Now it's getting weird. Why should I do that? It is still available on my newsserver, as it is on Google groups.
Cheers, Simon
Lew - 16 Jan 2007 22:40 GMT >>> and understand it. Simon wrote:
>> No, I didn't. That's what I'm trying to say.
> Then you either try again or give up -- what more is there for me to > say? You might also examine my other response to this thread today. In > any event, there is no call for going around publicly suggesting that > anyone else's arguments are "questionable" or similarly just because you > didn't happen to understand them. Isn't it fair to say that academic discourse is only possible when at least one party calls another's reasoning into question? Otherwise it's merely lecturing. If you aren't ready to have your ideas questioned, it is not yet time to reveal them.
As to understanding, Simon's posts indicate at least a firm general understanding of your comments, so lack of expertise seems not to be an issue. His specific points may be wrong, and he did have specific questions, but his arguments are not so easily dismissed by a blanket condemnation of his maths chops. You should examine his claims on a peer level; the ad hominem approach will not be effective.
I am here to say that both of your arguments are questionable. But I'm only saying that because so far I don't happen to understand them completely.
- Lew
John Ersatznom - 18 Jan 2007 22:27 GMT > As to understanding, Simon's posts indicate at least a firm general > understanding of your comments, so lack of expertise seems not to be an > issue. His specific points may be wrong, and he did have specific > questions, but his arguments are not so easily dismissed by a blanket > condemnation of his maths chops... The simple fact that he's disputing what I said instead of nodding and moving on is evidence enough that he didn't properly understand what I wrote.
Lew - 18 Jan 2007 23:18 GMT >> As to understanding, Simon's posts indicate at least a firm general >> understanding of your comments, so lack of expertise seems not to be >> an issue. His specific points may be wrong, and he did have specific >> questions, but his arguments are not so easily dismissed by a blanket >> condemnation of his maths chops...
> The simple fact that he's disputing what I said instead of nodding and > moving on is evidence enough that he didn't properly understand what I > wrote. Of course it is.
- Lew
Jean-Francois Briere - 08 Jan 2007 23:14 GMT The sum should be 37550402023 and not 37551402022 (Your're adding 999999 by error). Your code should read:
// move forward to next prime prime++; while (prime < numbers.length && (numbers[prime])) { prime++; }
Regards
Simon - 09 Jan 2007 08:16 GMT > The sum should be 37550402023 and not 37551402022 > (Your're adding 999999 by error). [quoted text clipped - 7 lines] > > Regards Oops. Thanks for the correction. I think the "-1" was a relict from a version where I tested numbes[prime+1] or something. Actually this loop is pretty weird. It should be a do-loop instead.
Cheers, Simon
Chris Uppal - 16 Jan 2007 21:28 GMT [Since this thread seems to have come alive again...]
> Apart from that, from your other post I saw that you have used a different > argument to obtain the n log n upper bound. > > - Your argument: There are only log n primes below n and each iteration > takes time at most n. I can't find the post where that argument is made, but I think it's wrong; there are O(n/log(n)) primes up to n which would give an estimate of O(n^2 / log(n))
> - My argument: Consider the iteration in which we erase prime i. We erase > n/i composite numbers. In total, we need time [quoted text clipped - 5 lines] > (Here, H(n) is the n-th harmonic number.) Can we combine both arguments to > obtain an upper bound below n * log n? You can weight the terms in the sum by the probability that each i is prime. That (as I understand it) is approximated by 1/log(i).
Also you can reduce the upper bound on i to sqrt(n), which only affects the constant factors in your formula, but might not "vanish" in the same way with the weighting applied.
> Still, some Googling reveals that it is possible to get the time down to > n * log log n. Sadly, I have no idea whether the weighted sum agument yeilds that formula...
-- chris
Simon - 17 Jan 2007 07:13 GMT Chris Uppal schrieb:
> [Since this thread seems to have come alive again...] > [quoted text clipped - 7 lines] > there are O(n/log(n)) primes up to n which would give an estimate of O(n^2 / > log(n)) You are right, I misread Patricias post. She wrote: "I was not allowing for only doing the inner loop for primes." so I asked myself how this can help to get the bound down from O(n^2) to O(n log n). That would work if the number of primes was O(log n), so I prematurely thought that might be Patricias argument. But it is, of course, nonsense. Sorry.
Cheers, Simon
Patricia Shanahan - 09 Jan 2007 14:39 GMT > Luc The Perverse schrieb: >> I was messing around on a site where you answer questions for points. (No [quoted text clipped - 4 lines] >> >> Mine didn't - it was less than 10 minutes though and I got the right answer. ...
>> I specifically tried to avoid ever dividing or modding (except in my status >> reporter, but that only checks when a prime has been found) > > I doubt that such micro-optimisations are of much use here. This, I think, is the big message from this thread. Simply implementing a very well-known algorithm changed the performance from somewhere between 1 minute and 10 minutes to a fraction of a second. Tweaking the code got a further factor of about 3.
It is very, very unlikely that any amount of micro-optimization applied to the original algorithm could have got the performance of a simple, direct sieve implementation.
Patricia
Lew - 10 Jan 2007 00:15 GMT > This, I think, is the big message from this thread. Simply implementing > a very well-known algorithm changed the performance from somewhere [quoted text clipped - 4 lines] > to the original algorithm could have got the performance of a simple, > direct sieve implementation. Some years back I tried to make this point to my project manager on a contract I was working. The optimization there was of a database structure, where the difference for one report was between O(n^2) and O(n). I was forbidden to make the algorithmic optimization, then got in trouble because the report was too slow once they loaded all the data. It took months and lawyers to straighten that one out.
I don't know why so many project managers seem to ignore this kind of reasoning. Most are not as egregious, but the problem remains. Not only do we as engineers need to know about the big O, but we have a fiduciary responsibility to communicate such reasoning to our customers (including employers).
- Lew
Luc The Perverse - 10 Jan 2007 04:25 GMT >> This, I think, is the big message from this thread. Simply implementing >> a very well-known algorithm changed the performance from somewhere [quoted text clipped - 11 lines] > trouble because the report was too slow once they loaded all the data. It > took months and lawyers to straighten that one out. ROFL - that is hillarious!
There is a sublime sense of irony when stupid ideas blow up in leaders faces. It is usually clouded over by the problem itself though and/or them yelling at you.
> I don't know why so many project managers seem to ignore this kind of > reasoning. Most are not as egregious, but the problem remains. Not only do > we as engineers need to know about the big O, but we have a fiduciary > responsibility to communicate such reasoning to our customers (including > employers). I had a unique method for dealing with this - I would generate a test case which would cause unreasonably bad results and "demonstrate" it. Of course, I would have the entire test case suite available, and a proposed fix with a sample demonstration and presentation. Of course, this may not be as practical for optimizations which take a great deal of time . . or other environments but it worked for me.
It was my reasoning that if I could not come up with a test case which caused undesirable results then I didn't need a workaround. -- LTP
:) Philipp - 10 Jan 2007 07:54 GMT >> This, I think, is the big message from this thread. Simply implementing >> a very well-known algorithm changed the performance from somewhere [quoted text clipped - 17 lines] > responsibility to communicate such reasoning to our customers (including > employers). Big O is just how the system scales, not the absolute running time. If you have an amount of entries which will not vary by a factor of 10 in a close future, it would be wise to check if the O(n^2) is not much faster than the O(n) algorithm on these entries...
Patricia Shanahan - 15 Jan 2007 16:04 GMT >>> This, I think, is the big message from this thread. Simply implementing >>> a very well-known algorithm changed the performance from somewhere [quoted text clipped - 22 lines] > close future, it would be wise to check if the O(n^2) is not much faster > than the O(n) algorithm on these entries... I did that once, and created a lot of trouble. It was a system for managing and applying patch cards when booting an operating system. I was told that there would never be more than order tens of cards at a time, before the OS would be recompiled with source code changes.
I did a very simple implementation that I knew was O(n^2), but that I had tested for a couple of hundred cards, several times the number required by the specification, and at that size it ran about as fast as the card reader.
Over the years, the number of cards increased to about a thousand, and it became a real bottleneck. I tried to explain the issue, and get permission to rework the code to use e.g. a tree structure or a hash table instead of a linear search...
A lot depends on your confidence that reimplementation will be permitted if the problem outgrows the O(n^n) solution.
Patricia
Lew - 15 Jan 2007 16:23 GMT > I did that once, and created a lot of trouble. It was a system for > managing and applying patch cards when booting an operating system. I [quoted text clipped - 13 lines] > A lot depends on your confidence that reimplementation will be permitted > if the problem outgrows the O(n^n) solution. Over the last thirty or so years, I have seen many projects where we were told "we can always refactor it later", but later were told, "we are not going to refactor given the investment in the existing codebase, besides it works, doesn't it?"
As a practical matter it is almost impossible to get management buy-in to throw away existing code no matter how terrible. One worst case scenario was a nine-month project that had cost my employer nearly a million dollars to a cosultant, and still didn't work - it gave wrong results and crashed the host. (The consultant was known in our trenches to be a scammer, but we never could convince management of that.) I stepped in, took two days and about 150 lines of code (including blank lines and comments) and created a complete, bulletproof solution. (Not actually a difficult project, you see - the consultant was a scammer.) I nearly got in trouble, since I had completely disregarded the nine months / million dollars worth of existing code that didn't work. As it was, I was the focus of a lot of managerial concern. (They always looked so sad when I told them, "Port Steve's code? I never even looked at Steve's code; I just rewrote the application from the specification.") The investors pulled out all their money from the company two weeks later (possibly for unrelated reasons). Welcome to the future.
There has to be an awful lot of pain before there is buy-in for refactoring, and often not even then.
- Lew
John Ersatznom - 16 Jan 2007 16:11 GMT >> A lot depends on your confidence that reimplementation will be permitted >> if the problem outgrows the O(n^n) solution. [quoted text clipped - 13 lines] > There has to be an awful lot of pain before there is buy-in for > refactoring, and often not even then. Wash your hands of it. If management says "we can always refactor it later" get that in writing or on tape. If later it won't scale, point out the earlier discussion (for which you have documentation). Also note that they did invest money in the existing code, and they then got however-many years of use out of it before their needs outgrew that code. Replacing it isn't throwing away the investment, it's making a new investment after an earlier one has paid off about all it's going to.
If they still don't see the light, wash your hands of it. Say "OK" and do whatever. If the company goes kaput, gets scammed and goes kaput, or whatever, well, you did your duty of warning them and they did their duty of actually making the call. They got it wrong but not for lack of information; it's not your fault. If you have the skills, you can find another job and maybe one where your expertise will be better appreciated. As for the company that goes under, well, that's Darwinism at work. If the management can't find new work after it does, well, that's probably also Darwinism at work. ;)
Chris Uppal - 10 Jan 2007 14:34 GMT > The optimization there was of a database > structure, where the difference for one report was between O(n^2) and > O(n). I was forbidden to make the algorithmic optimization, then got in > trouble because the report was too slow once they loaded all the data. It > took months and lawyers to straighten that one out. I think part of the problem with this sort of situation is that we don't have a commonly understood way of adding contingent sub-plans to projects. It seems quite plausible to me that your manager might have correctly decided that the risk and time involved in the optimisation was not acceptable at that time. But ideally you and s/he should have been able to add a contingent extension to the project plan, somewhat like (briefly)
Risk: The XYZ report may run too slowly with production-scale datasets. Reason: O(N^2) algorithm used (unecessarily). Responsibility for monitoring and reviewing risk: <some person, group, or organisation> at <some time>. Suggested fix: <details> Probable time: <N weeks> + testing. NB, retesting required /at system integration scope/.
The "Responsibility for monitoring and reviewing risk:" bit would be, to my mind, the single most important bit of this in that the system couldn't be deemed complete unless the issue had been resolved.
At least, /I/ have never seen anything like this used in project planning (not even informally).
-- chris
Lew - 11 Jan 2007 06:00 GMT Lew wrote:
>> The optimization there was of a database >> structure, where the difference for one report was between O(n^2) and [quoted text clipped - 6 lines] > quite plausible to me that your manager might have correctly decided that the > risk and time involved in the optimisation was not acceptable at that time. The fix involved half a day's work and was routine - no risk.
> But ideally you and s/he should have been able to add a contingent extension to > the project plan, somewhat like (briefly) > > Risk: > The XYZ report may run too slowly with production-scale datasets. There was no doubt that it would, given the projected data sizes.
> Reason: > O(N^2) algorithm used (unecessarily). [quoted text clipped - 5 lines] > <N weeks> + testing. > NB, retesting required /at system integration scope/. No such reasoned approach was forthcoming from my project manager then. He simply didn't want to hear it. The difference in that case was not weeks but a few hours of development time to implement the fix.
> The "Responsibility for monitoring and reviewing risk:" bit would be, to my > mind, the single most important bit of this in that the system couldn't be > deemed complete unless the issue had been resolved. I had a fix ready to implement. I made sure my replacement knew of it. When he in turn was finally permitted to make the change, it took half a day and completely fixed the problem. That didn't happen until the project manager who had forbidden the change was replaced. By the guy I'd trained with the fix.
> At least, /I/ have never seen anything like this used in project planning (not > even informally). Not sure of the antecedent. If you mean you've never seen someone irrationally and arbitrarily refuse to let a product work when it would have been easy to repair, lucky you. If you mean you've never seen anyone plan for such a fix, using reasoned analysis and with fallback positions, unlucky you. I have seen both ends of that spectrum.
- Lew
Chris Uppal - 11 Jan 2007 20:41 GMT [me:]
> > I think part of the problem with this sort of situation is that we > > don't have a commonly understood way of adding contingent sub-plans to [quoted text clipped - 3 lines] > > The fix involved half a day's work and was routine - no risk. /Nothing/ involves no risk. But yes, given your estimate, the manager's position does sound a little, err, irrational...
> > But ideally you and s/he should have been able to add a contingent > > extension to the project plan, somewhat like (briefly) ...
> > At least, /I/ have never seen anything like this used in project > > planning (not even informally). > > Not sure of the antecedent. If you mean you've never seen someone > irrationally and arbitrarily refuse to let a product work when it would > have been easy to repair, lucky you. Oddly, enough, I don't think I've ever been in the position where "management" hasn't heeded my warnings on specific technical issues ("heeded" != "agreed with after due consideration"). I /have/ been in the position of saying something like "but this plan only covers the functionality we actually know about now; there's at least that much again, /really important/ stuff, which we haven't even started to think about" -- naturally I was roundly criticised for "being negative"...
> If you mean you've never seen anyone > plan for such a fix, using reasoned analysis and with fallback positions, > unlucky you. I have seen both ends of that spectrum. That's what I meant. Of course I have seen plans changed to allow for problems as they arise, but I haven't seen anyone using a planning idiom to postpone addressing, but still properly manage, potential risks as they are identified.
-- chris
Luc The Perverse - 11 Jan 2007 21:37 GMT > Oddly, enough, I don't think I've ever been in the position where > "management" [quoted text clipped - 8 lines] > for > "being negative"... In general I would complain about my situation, a programmer being subjected to menial work - but one benefit does come out of it.
It is very easy to quit your job on principle, if you are making less than 10$ (UD)/ hr
-- LTP
:) Daniel Pitts - 12 Jan 2007 01:17 GMT > In general I would complain about my situation, a programmer being subjected > to menial work - but one benefit does come out of it. > > It is very easy to quit your job on principle, if you are making less than > 10$ (UD)/ hr Hmm, thats extremely low wage for a programmer. Actually, thats about what a Starbucks Barista makes in San Fransisco. You'd be better off working in a coffee shop. At least then you could get free caffein!
Luc The Perverse - 12 Jan 2007 02:27 GMT >> In general I would complain about my situation, a programmer being >> subjected [quoted text clipped - 7 lines] > You'd be better off working in a coffee shop. At least then you could > get free caffein! They're called internships - what you take when you can't get real jobs . . . And then intermittent manual labor jobs when you can't find an internship.
Keep in mind that the cost of living is MUCH higher in San Francisco than here in BFE.
-- LTP
:) Daniel Pitts - 12 Jan 2007 19:35 GMT > >> In general I would complain about my situation, a programmer being > >> subjected [quoted text clipped - 13 lines] > Keep in mind that the cost of living is MUCH higher in San Francisco than > here in BFE. That is a valid point. Ofcourse, you could get an internship in S.F. for $20/hr :-)
Patricia Shanahan - 05 Jan 2007 12:01 GMT > I was messing around on a site where you answer questions for points. (No > prizes or anything) [quoted text clipped - 8 lines] > Is there an algorithm out there, which I will be able to understand, which > works significantly faster than this one? ...
The usual algorithm is the Sieve of Eratosthenes, http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
The sieve is O(n*n), but with small constant factor.
Patricia
Patricia Shanahan - 05 Jan 2007 12:10 GMT >> I was messing around on a site where you answer questions for points. >> (No prizes or anything) [quoted text clipped - 15 lines] > > The sieve is O(n*n), but with small constant factor. I was wrong about the complexity. I was not allowing for only doing the inner loop for primes.
Patricia
Luc The Perverse - 05 Jan 2007 19:58 GMT >>> I was messing around on a site where you answer questions for points. >>> (No prizes or anything) [quoted text clipped - 18 lines] > I was wrong about the complexity. I was not allowing for only doing the > inner loop for primes. Looks like I had better check this out!
hehe cool
Thanks guys! (And I mean Simon too)
-- LTP
:) Abhi - 08 Jan 2007 09:04 GMT Hi Luc, I found the primes<1000000 and stored them in an array.also summed them up.I am pasting my code.Pls hav a look..........it took 1782 msecs on a p4 2.4GHz processor && 512 mb ram. If this code is not ok then let me know:)
/* * Created on Jan 7, 2007 * * To change the template for this generated file go to * Window>Preferences>Java>Code Generation>Code and Comments */ package src;
//import java.lang.Math; /** * @author Jboss *calculate sum of prime nos less than 1 million */ public final class Sum_Prime {
private static int currNo; private static int currSqrt; private static int[] primeArr; private static long currSum = 0; private static final int range = 1000000; // calc till 1000000; private static long startTime;
public static void main(String[] args) { //Math.sqrt() int currArrSize = range / 20;//holds the curr size of the array if (currArrSize == 0) { primeArr = new int[10]; currArrSize=10; } else primeArr = new int[currArrSize];
startTime = System.currentTimeMillis(); int k = 0; //index of the array which stores the prime nos.
currNo = 2; //start looking from 2 System.out.print("\n arr size \t"+primeArr.length); for (int i = 2; i <= range; i++) { boolean isPrime = true; currSqrt = (int) Math.sqrt(currNo); for (int j = 2; j <= currSqrt; j++) //chk for 3 whose sqrt is 1.5.... { if (currNo % j == 0) {
isPrime = false; break; } } if (isPrime) { try { primeArr[k] = currNo; currSum = currSum + currNo; k++; } catch (ArrayIndexOutOfBoundsException arrex) { currArrSize=currArrSize*2; int[] tempArr = new int[currArrSize]; System.arraycopy(primeArr, 0, tempArr, 0,k);//copies the old arr into new 1 primeArr=tempArr; //System.out.print("\n primeArr size \t"+primeArr.length); //System.out.print("\n tempArr size \t"+tempArr.length); primeArr[k]=currNo; currSum = currSum + currNo; System.out.print("\n new arr size \t"+primeArr.length); k++; }
} currNo++; } System.out.print("\n sum \t" + currSum); System.out.print("\n curr val \t" + currNo); System.out.print("\n nos found \t" + k); System.out.print( "\n Time taken to execute in millisecs \t" + (System.currentTimeMillis() - startTime)); System.exit(0); } } tell me the name of the site. where u saw this..........
> I was messing around on a site where you answer questions for points. (No > prizes or anything) [quoted text clipped - 71 lines] > > :)
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|