Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / January 2007

Tip: Looking for answers? Try searching our database.

[Algorithm] Sum of Primes < 1000000

Thread view: 
Luc The Perverse - 05 Jan 2007 10:50 GMT
I was messing around on a site where you answer questions for points. (No
prizes or anything)

Anyway, they say our programs *should* execute in less than 1 minute if well
formed.

Mine didn't - it was less than 10 minutes though and I got the right answer.

I was supposed to sum all primes less than 1 million.

Is there an algorithm out there, which I will be able to understand, which
works significantly faster than this one?

//import java.lang.*;
import java.util.*;
import java.math.*;

public class prob10{
public static void main(String[] x){
 if(x.length<1){
  System.out.println("enter a number");
  return;
 }

 int n = Integer.parseInt(x[0]);
 int c = 0;
 int[] pc = new int[n];
 int[] p2 = new int[n];
 if(n<=1){
  System.out.println(2);
   return;
}
 c++;
 pc[0] = 2;
 p2[0] = 2;
 int cn = 1;
 long g = 2;
 while(c<n){
  cn+=2;
  boolean Prime = true;
  for(int i=1;i<c;i++){
   p2[i] -= 2;
   if(p2[i] == 0)
    Prime = false;
   if(p2[i] < 1)
    p2[i] += pc[i];
  }
  if(Prime){
   pc[c] = cn;
   p2[c++] = cn;
   if((c%2048) == 0)
    System.out.println(cn);
   if(cn<1000000)
    g+=cn;
   else{
    System.out.println("Yay 1 million reached!");
    break;
   }

  }
 }
 System.out.println("Grand Total " + g);
 System.out.println(pc[n-1]);
}

}

I specifically tried to avoid ever dividing or modding (except in my status
reporter, but that only checks when a prime has been found)

I "count down" until the next multiple a prime is detected.

The algorithm works . . but it is n^2 complexity, and kinda sucks.

Signature

LTP

:)
Simon - 05 Jan 2007 11:58 GMT
Luc The Perverse schrieb:
> I was messing around on a site where you answer questions for points. (No
> prizes or anything)
[quoted text clipped - 8 lines]
> Is there an algorithm out there, which I will be able to understand, which
> works significantly faster than this one?

You may want to have a look at the Sieve of Eratosthenes (and its successors)
(see http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes). It finds all primes up
to n in time n log n. On my computer my implementation takes 150 milliseconds
for n=1000000.

> I specifically tried to avoid ever dividing or modding (except in my status
> reporter, but that only checks when a prime has been found)

I doubt that such micro-optimisations are of much use here.

Cheers,
Simon
Patricia Shanahan - 05 Jan 2007 20:23 GMT
> Luc The Perverse schrieb:
>> I was messing around on a site where you answer questions for points. (No
[quoted text clipped - 14 lines]
> to n in time n log n. On my computer my implementation takes 150 milliseconds
> for n=1000000.

As a matter of curiosity, did you use a boolean[] or BitSet?

I started with a boolean[], and mine was slower than yours, about 200
milliseconds. It dropped to under 100 milliseconds when I switched to
BitSet. I believe the improvement is due to more efficient caching.

Patricia
Daniel Pitts - 05 Jan 2007 21:18 GMT
> > Luc The Perverse schrieb:
> >> I was messing around on a site where you answer questions for points. (No
[quoted text clipped - 22 lines]
>
> Patricia

I personally got 66ms from:

public class SumPrimes {
   public static void main(String...args) {
       final int bounds = Integer.parseInt(args[0]) + 1;
       final java.util.BitSet composits = new
java.util.BitSet(bounds);
       long sum = 1;
       for (int prime = 2; prime < bounds; prime =
composits.nextClearBit(prime +1)) {
           sum += prime;
           for (int composit = prime + prime; composit < bounds;
composit += prime) {
               composits.set(composit);
           }
       }
       System.out.println(sum);
   }
}
MikeNereson - 06 Jan 2007 15:04 GMT
Your original alogorithm worked for me in 141ms. I made a couple
seemingly minor changes. Can you find them? Why are they so important?

http://pastebin.com/852736

OUTPUT:
Yay 1 million reached!
Grand Total 797162
0
only took 141 ms
MikeNereson - 06 Jan 2007 15:52 GMT
Whoops, I didn't validate that output. My bad. The bitset solution is
great.

> Your original alogorithm worked for me in 141ms. I made a couple
> seemingly minor changes. Can you find them? Why are they so important?
[quoted text clipped - 6 lines]
> 0
> only took 141 ms
Chris Uppal - 06 Jan 2007 16:12 GMT
> Yay 1 million reached!
> Grand Total 797162

Seems unlikely to be the correct answer since 999983 is just one one of the
primes in that range...

   -- chris
Simon - 08 Jan 2007 09:08 GMT
Patricia Shanahan schrieb:
> As a matter of curiosity, did you use a boolean[] or BitSet?
>
> I started with a boolean[], and mine was slower than yours, about 200
> milliseconds. It dropped to under 100 milliseconds when I switched to
> BitSet. I believe the improvement is due to more efficient caching.

Well, that is strange. I have been using a boolean[], and using BitSet actually
slows it down on my machine. I am using a Laptop, so CPU speed may vary. Today
it is even faster than yesterday. The output was (code see below):

Time : 42ms
Sum  : 37551402022

Apart from that, from your other post I saw that you have used a different
argument to obtain the n log n upper bound.

- Your argument: There are only log n primes below n and each iteration takes
time at most n.
- My argument: Consider the iteration in which we erase prime i. We erase n/i
composite numbers. In total, we need time

  sum(i=1..n)  n/i
= n * H(n)
= O(n * log n)

(Here, H(n) is the n-th harmonic number.) Can we combine both arguments to
obtain an upper bound below n * log n? In my sum, only log n terms are actually
necessary, as has been argued by Patricia. Unfortunately, the terms that are
actually present are the larger ones (those for small i). Still, some Googling
reveals that it is possible to get the time down to n * log log n.

Cheers,
Simon

This is my code:

public class Primes {

   private boolean[] numbers;
   private long sum = 0;

   public Primes(int size) {
    numbers = new boolean[size];
   }

   public void eraseComposite() {
    int prime = 2;
    while (prime < numbers.length) {
       sum += prime;
       // mark multiples of prime
       for (int i = 2 * prime; i < numbers.length; i += prime) {
               // (we could start with prime*prime here, but I didn't bother
               //  since it would be larger than Integer.MAX_INTEGER)
        numbers[i] = true;
       }
       // move forward to next prime
       prime++;
       while (prime < numbers.length-1 && (numbers[prime])) {
        prime++;
       }
    }
   }

   public static void main(String[] argv) {
    long t = System.currentTimeMillis();
    Primes p = new Primes(1000000);
    p.eraseComposite();
    System.out.println("Time : " + (System.currentTimeMillis() - t) + "ms");
    System.out.println("Sum  : " + p.sum);
   }
}
Patricia Shanahan - 08 Jan 2007 13:34 GMT
> Patricia Shanahan schrieb:
>> As a matter of curiosity, did you use a boolean[] or BitSet?
[quoted text clipped - 6 lines]
> slows it down on my machine. I am using a Laptop, so CPU speed may vary. Today
> it is even faster than yesterday. The output was (code see below):

The most likely cause of the speed up I saw is more efficient caching,
because BitSet stores the data more densely. That is a very hardware
dependent effect, though less a matter of CPU speed than of cache
configuration.

>     public void eraseComposite() {
>     int prime = 2;
[quoted text clipped - 13 lines]
>     }
>     }

There is one optimization you missed. When prime is greater than the
square root of numbers.length, there is no need to do the composite scan.

If a number less than 1,000,000 is composite, at least one prime factor
is less than 1000.

Patricia
Simon - 08 Jan 2007 14:05 GMT
Patricia Shanahan schrieb:
>> Patricia Shanahan schrieb:
>>> As a matter of curiosity, did you use a boolean[] or BitSet?
[quoted text clipped - 38 lines]
> If a number less than 1,000,000 is composite, at least one prime factor
> is less than 1000.

This is what I tried to express in the above comment. It was more laziness than
the reason stated above though that prevented me from implementing it :-)

Cheers,
Simon
Patricia Shanahan - 08 Jan 2007 14:25 GMT
> Patricia Shanahan schrieb:
>>> Patricia Shanahan schrieb:
[quoted text clipped - 43 lines]
> Cheers,
> Simon

Ah, I see. You were thinking of checking by asking whether the square of
the latest prime is less than numbers.length.

The way I did it, there is no possibility of overflow - I stored
Math.sqrt(numbers.length) as an integer. Inside the loop, I just
compared to the square root.

Patricia
John Ersatznom - 08 Jan 2007 21:34 GMT
So to make a long story short, the algorithm is:

* Make array of booleans of length max_num - 1, whose zeroth element
corresponds to 2 and whose max_num - 2nd element corresponds to maxnum.
Clear it (all false). (Java clears it for you if you use Java.)
* Take square root of maxnum and store it somewhere.
* for i = 2 to that square root
*   if (array[i - 2]) continue;
*   for j = i*i to max_num step i
*     set array[j - 2]
*   end
* end

Note the optimization of starting the inner loop at i*i. All the
composites smaller than i*i have a prime factor smaller than i and so
are already marked at that point. Note also skipping the inner loop for
composite values of i.

In fact, using the square root of max_num is a less significant
optimization after the i*i optimization, as it just stops the outer loop
testing every remaining value for compositeness and then entering the
inner loop for the primes but without the inner loop actually doing
anything (as i*i >= max_num the loop terminates immediately).

The algorithm does sqrt(max_num) bit tests (using the sqrt(max_num) opt)
and sum{primes p up to sqrt(max_num}(max_num/p). The number of primes is
O(log sqrt(max_num)) from what I've seen of primes, which is O(log
max_num) since the log of the square root is simply half the log and the
half disappears into the constant factor. The average prime from this
set will be O(1) (integral of log x is log x - x, approximates sum of
primes; divide by log x for 1 - 1/log(x) is O(1)). So the number of bit
sets is O(max_num log max_num) -- O(max_num) per prime as it's max_num/p
with average p O(1), and O(log max_num) primes.

So the whole thing has complexity O(n log n) -- as good as quicksort,
and without the quadratic worst-case behavior. :)

Removing the squaring/square rooting optimizations doesn't increase the
complexity but does increase the running time by a factor of 2 or so.
Note that square roots are expensive, but the only square root occurs
outside any loops...i*i is just a multiplication, and multiplication of
integers is cheap on any modern architecture. The inner loop does one
imul, three iadds (one's actually the subtract in the loop test and one
computes the array index), one icmp (the rest of the loop test), and one
boolean assignment. I expect the compiler to optimize "set array[j - 2]"
to "*(foo + j) = true" or something equivalent -- i.e. to use a point
two before the actual start of the array as the base for the pointer
adds in this loop. Of course, using a BitSet instead of a boolean[] may
change this a lot. A sensible implementation plus jit compiler will
probably reduce it to a similar add and a bit mask operation, but I
expect an idiv will sneak in there to get the array index into the
underlying array, which may make BitSet worse-performing, and the
subtraction of 2 definitely has to be done separately. Likely *(foo + (j
- 2)/64) |= 2^(j - 2) under the hood with 2^(j - 2) sucked out of a
lookup table, for a grand total of two extra iadds (j - 2 and the lookup
into the 2^n table), one extra idiv, and two extra fetches (the lookup
into the 2^n table and retrieving the stored word to |= with something)
versus storing in a boolean[] which is probably the one add and a flat
store, without bothering to retrieve whatever was in the array cell
originally.

I'd be *very* surprised if BitSet actually comes out faster after you
run it a few hundred times to get it all JITted and then a few hundred
more to measure and average.
Richard F.L.R.Snashall - 08 Jan 2007 21:54 GMT
> So to make a long story short, the algorithm is:
>
[quoted text clipped - 13 lines]
> are already marked at that point. Note also skipping the inner loop for
> composite values of i.

What if you used your array to find "open" values at i and above
and considered only those multiples of i?
Patricia Shanahan - 08 Jan 2007 23:13 GMT
...
> I'd be *very* surprised if BitSet actually comes out faster after you
> run it a few hundred times to get it all JITted and then a few hundred
> more to measure and average.

The potential gain from BitSet is more a matter of hardware than
anything JIT can affect.

Especially in the early stages, the sieve has a very cache-unfavorable
memory access pattern, relatively small stride access to a range of a
million elements.

If you use a boolean[], those million elements occupy a million bytes of
memory, because, at least in the implementations I've examined, each
boolean in an array gets its own byte.

BitSet uses a long[], and packs 64 bits into each element, so a million
elements only take 125,000 bytes of memory.

That may make a significant difference if at least one cache in the
system has a size in the range of tens to hundreds of megabytes.

Patricia
Chris Uppal - 09 Jan 2007 19:09 GMT
> If you use a boolean[], those million elements occupy a million bytes of
> memory, because, at least in the implementations I've examined, each
[quoted text clipped - 5 lines]
> That may make a significant difference if at least one cache in the
> system has a size in the range of tens to hundreds of megabytes.

I think you must have meant that last word to be kilobytes ?

   -- chris
Patricia Shanahan - 15 Jan 2007 15:52 GMT
>> If you use a boolean[], those million elements occupy a million bytes of
>> memory, because, at least in the implementations I've examined, each
[quoted text clipped - 7 lines]
>
> I think you must have meant that last word to be kilobytes ?

Thanks for the correction.

Patricia
Simon - 09 Jan 2007 08:14 GMT
> The algorithm does sqrt(max_num) bit tests (using the sqrt(max_num) opt)
> and sum{primes p up to sqrt(max_num}(max_num/p). The number of primes is
[quoted text clipped - 8 lines]
> So the whole thing has complexity O(n log n) -- as good as quicksort,
> and without the quadratic worst-case behavior. :)

I'm not sure what you are saying. We have already seen two arguments for an
upper bound of O(n*log n). Is that a third argument? In particular, I don't
understand what you mean by "The average prime from this set will be O(1)".
(Every set of (distinct) positive integers (prime or not) with at least k
elements has an average of at least k/4 since at least k/2 of its elements are
>= k/2.). Furthermore, the integral of log x is not lox x - x, but rather x *
log x - x. Then, you seem to have this number p in the denominator (in
max_num/p) and use p=O(1) to obtain an upper bound on the term. This, however,
is impossible. You would actually need it to be p=Omega(1) (which is obviously
true in contrast to the O(1)). Also, you cannot just take the average and then
multiply with that as if every element would equal the average.

Or are you trying to give a lower bound? This should also be impossible:
 http://wwwhomes.uni-bielefeld.de/achim/prime_sieve.html
claims that the truth is O(n * log log n) which I tend to believe as I already
posted (There are only log n terms in the harmonic series).

Cheers,
Simon
John Ersatznom - 15 Jan 2007 14:52 GMT
> I'm not sure what you are saying.

Then you didn't read it and understand it. Please try again, or give up,
but there's really no need to attack.
Lew - 15 Jan 2007 15:15 GMT
Simon wrote:
>> I'm not sure what you are saying.

> Then you didn't read it and understand it. Please try again, or give up,
> but there's really no need to attack.

I saw no attack in Simon's post. Did I miss something?

Simon raised some interesting mathematically-motivated points. I am in no
position yet to judge the relative merit of the arguments, but I was awaiting
with some interest your response to Simon, on the assumption that you were
going to continue the mathematical reasoning as you clarified the points about
which Simon asked.

I continue to hope for such information.

- Lew
John Ersatznom - 16 Jan 2007 16:02 GMT
>> Then you didn't read it and understand it. Please try again, or give
>> up, but there's really no need to attack.
>
> I saw no attack in Simon's post. Did I miss something?

Yes; the fact that he was accusing me of being bad at math.

> Simon raised some interesting mathematically-motivated points.

Well, to start with, he basically accused Wikipedia of having the
integral of ln(x) wrong.
He also correctly stated that an average of a set exceeds 1/4 the size
of the set, but thereby was trying to imply that the optimization sucked
because something was therefore O(N) in the total number range instead
of O(log N). The set was of primes, and the size of that is O(log N), so
1/4 of that is O(log N). He actually ends up proving the weak result
that the size of the average prime less than N is *at least* O(log N)
(not ruling out O(N) or whatever); I had gotten the stronger result that
the size of the average prime *was* only O(log N).

The reasoning in a nutshell:

Algorithm to build sieve up to N as I stated it has three parts. Loop
until sqrt(N) (O(sqrt(N)). For each p reached that's not gotten marked
composite earlier, loop from p^2 until N step p. The number of these
loops is O(P), P the number of primes up to sqrt(N). The length of a
given one is O(N/p). Turns out P is O(log(sqrt(N)) = O(1/2 log(N)) =
O(log(N)), so this loop is O(log(N)*X) where the average length of a
loop is O(X). What is the average length of a loop?

It's O(N/p) of course, for the average choice of p. Primes are
logarithmically distributed (P is O(log(N)) above) so the average p is
going to be small fairly independently of the range. It's hard to be
sure it's O(1) though. The average is the sum divided by the total
number. The sum of the primes, the way they are logarithmically
distributed, could be approximated by the integral of the logarithm. (It
might actually be better now I think of it to integrate N/p -- N*(1/2 +
1/3 + 1/5...) looks like N times an exponential decay function, whose
integral is itself and which is clearly O(1))

I concluded that the average length of the loop is bounded by O(N), and
the whole algorithm by O(N log N). If it's actually N log log N or can
be made such, so much the better.
Simon - 16 Jan 2007 17:24 GMT
>>> Then you didn't read it and understand it. Please try again, or give
>>> up, but there's really no need to attack.
>>
>> I saw no attack in Simon's post. Did I miss something?
>
> Yes; the fact that he was accusing me of being bad at math.

I wasn't accusing you, I was analysing your arguments.

>> Simon raised some interesting mathematically-motivated points.
>
> Well, to start with, he basically accused Wikipedia of having the
> integral of ln(x) wrong.

- You said: "integral of log x is log x - x"
- Wikipedia correctly says
(http://en.wikipedia.org/wiki/List_of_integrals_of_logarithmic_functions)
 "int ln(cx) dx  = x * ln (cx) - x"
For our case c=1, this is the same as I said ("Furthermore, the integral of log
x is not lox x - x, but rather x * log x - x. )

> He also correctly stated that an average of a set exceeds 1/4 the size
> of the set, but thereby was trying to imply that the optimization sucked

I didn't say that anything sucked. What optimisation? Starting with the square?
We all know that this does not suck. It is standard.

> because something was therefore O(N) in the total number range instead
> of O(log N).

I was saying "something was O(N)"? What is something? Do you mean Omega(N)?

> The set was of primes, and the size of that is O(log N), so
> 1/4 of that is O(log N).

The average over any set of positive integers of cardinality k is at least
Omega(k). You said: "The average prime from this set will be O(1)". This is
obviously wrong unless the set contains only O(1) elements.

> He actually ends up proving the weak result
> that the size of the average prime less than N is *at least* O(log N)
> (not ruling out O(N) or whatever);

That wasn't a weaker result, it was a counter argument to your bogus claim.
Furthermore, something cannot be *at least* O(log N). It can only be *at least*
Omega(log N). You should really learn the terms.

> I had gotten the stronger result that
> the size of the average prime *was* only O(log N).

You are saying the average prime below N is O(log N)?
There are O(log N) primes below N. At least one of these primes is at least N/2.
Even if we ignore the contribution of all other elements in this set, this
single prime alone implies an average of at least Omega(n/log n).

I have the impression that you are trying to look at the set {1,...,N} and take
the average only over the primes in this set, whatever that means. If that is
what you are trying to do, you really should write that down more precisely.
Still, it would be unclear what this would contribute to your reasoning.

> The reasoning in a nutshell:
>
[quoted text clipped - 7 lines]
>
> It's O(N/p) of course, for the average choice of p.

Note: You cannot just argue over averages and then multiply with the number of
iterations. In general, such arguments fail.

> Primes are
> logarithmically distributed (P is O(log(N)) above) so the average p is
[quoted text clipped - 5 lines]
> 1/3 + 1/5...) looks like N times an exponential decay function, whose
> integral is itself and which is clearly O(1))

Clearly?

1/2 + 1/3 + 1/5 + 1/7 + 1/11 ...

is O(1)? You should read:
http://mathworld.wolfram.com/HarmonicSeriesofPrimes.html

It approaches ln ln N.

> I concluded that the average length of the loop is bounded by O(N),

You need this long argument to conclude that the average length of a loop that
starts above 1, has a step size of more than 1 and ends at N is bounded by O(N)?
Wow.

> and
> the whole algorithm by O(N log N). If it's actually N log log N or can
> be made such, so much the better.

It's a bit hard to understand why you need so much reasoning for this simple
bound for which already two independent arguments (each of which fit into two
lines) have been given.

BTW: Now that we know that the harmonic series of primes grows as O(ln ln N),
this proves that the total time is O(N * log log N).

Cheers,
Simon
Simon - 17 Jan 2007 07:32 GMT
>> I had gotten the stronger result that
>> the size of the average prime *was* only O(log N).
[quoted text clipped - 3 lines]
> Even if we ignore the contribution of all other elements in this set, this
> single prime alone implies an average of at least Omega(n/log n).

Forget this argument, it is nonsense :-)
This one is better:

There are Omega(N/log N) primes between N/2 and N each of which has size at
least N/2. Furthermore, there are O(N/log N) primes below N/2, so the average is
at least

c1 * N/2 * N / log N
------------------- = N * c1/(2*c2)
c2 * N/log N

for suitable constants c1 and c2.

Cheers,
Simon
John Ersatznom - 18 Jan 2007 22:26 GMT
[snip most]
> That wasn't a weaker result, it was a counter argument to your bogus claim.

I'm not going to argue with you if all you're going to do is sit here
and insult me.

[Remainder of condescending BS snipped]

Good day.
Simon - 15 Jan 2007 16:51 GMT
John Ersatznom schrieb:
>> I'm not sure what you are saying.
>
> Then you didn't read it

Yes, I did.

> and understand it.

No, I didn't. That's what I'm trying to say.

> Please try again, or give up,
> but there's really no need to attack.

I apologise for anything which you might have interpreted as an attack, which it
wasn't intended to be. I was trying to be precise as to which arguments of yours
I think are questionable. If you think I got you wrong, please explain. If you
think that my arguments are wrong, please try to be precise also, so I can clarify.

Cheers,
Simon
John Ersatznom - 16 Jan 2007 16:05 GMT
>>and understand it.
>
> No, I didn't. That's what I'm trying to say.

Then you either try again or give up -- what more is there for me to
say? You might also examine my other response to this thread today. In
any event, there is no call for going around publicly suggesting that
anyone else's arguments are "questionable" or similarly just because you
didn't happen to understand them.

Curiously, my earlier post to this thread (that you'd responded to) is
gone. My newsserver claims a 15 day retention and it's only a week old,
which means somebody forged a cancel. It wasn't you, I hope. (If it was,
it's ironic that you quoted much of it in your reply.)
Simon - 16 Jan 2007 17:27 GMT
> Curiously, my earlier post to this thread (that you'd responded to) is
> gone. My newsserver claims a 15 day retention and it's only a week old,
> which means somebody forged a cancel. It wasn't you, I hope. (If it was,
> it's ironic that you quoted much of it in your reply.)

You are suspect me of cancelling your article? Now it's getting weird. Why
should I do that? It is still available on my newsserver, as it is on Google groups.

Cheers,
Simon
Lew - 16 Jan 2007 22:40 GMT
>>> and understand it.

Simon wrote:
>> No, I didn't. That's what I'm trying to say.

> Then you either try again or give up -- what more is there for me to
> say? You might also examine my other response to this thread today. In
> any event, there is no call for going around publicly suggesting that
> anyone else's arguments are "questionable" or similarly just because you
> didn't happen to understand them.

Isn't it fair to say that academic discourse is only possible when at least
one party calls another's reasoning into question? Otherwise it's merely
lecturing. If you aren't ready to have your ideas questioned, it is not yet
time to reveal them.

As to understanding, Simon's posts indicate at least a firm general
understanding of your comments, so lack of expertise seems not to be an issue.
His specific points may be wrong, and he did have specific questions, but his
arguments are not so easily dismissed by a blanket condemnation of his maths
chops. You should examine his claims on a peer level; the ad hominem approach
will not be effective.

I am here to say that both of your arguments are questionable. But I'm only
saying that because so far I don't happen to understand them completely.

- Lew
John Ersatznom - 18 Jan 2007 22:27 GMT
> As to understanding, Simon's posts indicate at least a firm general
> understanding of your comments, so lack of expertise seems not to be an
> issue. His specific points may be wrong, and he did have specific
> questions, but his arguments are not so easily dismissed by a blanket
> condemnation of his maths chops...

The simple fact that he's disputing what I said instead of nodding and
moving on is evidence enough that he didn't properly understand what I
wrote.
Lew - 18 Jan 2007 23:18 GMT
>> As to understanding, Simon's posts indicate at least a firm general
>> understanding of your comments, so lack of expertise seems not to be
>> an issue. His specific points may be wrong, and he did have specific
>> questions, but his arguments are not so easily dismissed by a blanket
>> condemnation of his maths chops...

> The simple fact that he's disputing what I said instead of nodding and
> moving on is evidence enough that he didn't properly understand what I
> wrote.

Of course it is.

- Lew
Jean-Francois Briere - 08 Jan 2007 23:14 GMT
The sum should be 37550402023 and not 37551402022
(Your're adding 999999 by error).
Your code should read:

            // move forward to next prime
           prime++;
           while (prime < numbers.length && (numbers[prime])) {
               prime++;
           }

Regards
Simon - 09 Jan 2007 08:16 GMT
> The sum should be 37550402023 and not 37551402022
> (Your're adding 999999 by error).
[quoted text clipped - 7 lines]
>
> Regards

Oops. Thanks for the correction. I think the "-1" was a relict from a version
where I tested numbes[prime+1] or something. Actually this loop is pretty weird.
It should be a do-loop instead.

Cheers,
Simon
Chris Uppal - 16 Jan 2007 21:28 GMT
[Since this thread seems to have come alive again...]

> Apart from that, from your other post I saw that you have used a different
> argument to obtain the n log n upper bound.
>
> - Your argument: There are only log n primes below n and each iteration
> takes time at most n.

I can't find the post where that argument is made, but I think it's wrong;
there are O(n/log(n)) primes up to n which would give an estimate of O(n^2 /
log(n))

> - My argument: Consider the iteration in which we erase prime i. We erase
> n/i composite numbers. In total, we need time
[quoted text clipped - 5 lines]
> (Here, H(n) is the n-th harmonic number.) Can we combine both arguments to
> obtain an upper bound below n * log n?

You can weight the terms in the sum by the probability that each i is prime.
That (as I understand it) is approximated by 1/log(i).

Also you can reduce the upper bound on i to sqrt(n), which only affects the
constant factors in your formula, but might not "vanish" in the same way with
the weighting applied.

> Still, some Googling reveals that it is possible to get the time down to
> n * log log n.

Sadly, I have no idea whether the weighted sum agument yeilds that formula...

   -- chris
Simon - 17 Jan 2007 07:13 GMT
Chris Uppal schrieb:
> [Since this thread seems to have come alive again...]
>
[quoted text clipped - 7 lines]
> there are O(n/log(n)) primes up to n which would give an estimate of O(n^2 /
> log(n))

You are right, I misread Patricias post. She wrote: "I was not allowing for only
doing the inner loop for primes." so I asked myself how this can help to get the
bound down from O(n^2) to O(n log n). That would work if the number of primes
was O(log n), so I prematurely thought that might be Patricias argument. But it
is, of course, nonsense. Sorry.

Cheers,
Simon
Patricia Shanahan - 09 Jan 2007 14:39 GMT
> Luc The Perverse schrieb:
>> I was messing around on a site where you answer questions for points. (No
[quoted text clipped - 4 lines]
>>
>> Mine didn't - it was less than 10 minutes though and I got the right answer.
...
>> I specifically tried to avoid ever dividing or modding (except in my status
>> reporter, but that only checks when a prime has been found)
>
> I doubt that such micro-optimisations are of much use here.

This, I think, is the big message from this thread. Simply implementing
a very well-known algorithm changed the performance from somewhere
between 1 minute and 10 minutes to a fraction of a second. Tweaking the
code got a further factor of about 3.

It is very, very unlikely that any amount of micro-optimization applied
to the original algorithm could have got the performance of a simple,
direct sieve implementation.

Patricia
Lew - 10 Jan 2007 00:15 GMT
> This, I think, is the big message from this thread. Simply implementing
> a very well-known algorithm changed the performance from somewhere
[quoted text clipped - 4 lines]
> to the original algorithm could have got the performance of a simple,
> direct sieve implementation.

Some years back I tried to make this point to my project manager on a contract
I was working. The optimization there was of a database structure, where the
difference for one report was between O(n^2) and O(n). I was forbidden to make
the algorithmic optimization, then got in trouble because the report was too
slow once they loaded all the data. It took months and lawyers to straighten
that one out.

I don't know why so many project managers seem to ignore this kind of
reasoning. Most are not as egregious, but the problem remains. Not only do we
as engineers need to know about the big O, but we have a fiduciary
responsibility to communicate such reasoning to our customers (including
employers).

- Lew
Luc The Perverse - 10 Jan 2007 04:25 GMT
>> This, I think, is the big message from this thread. Simply implementing
>> a very well-known algorithm changed the performance from somewhere
[quoted text clipped - 11 lines]
> trouble because the report was too slow once they loaded all the data. It
> took months and lawyers to straighten that one out.

ROFL - that is hillarious!

There is a sublime sense of irony when stupid ideas blow up in leaders
faces.  It is usually clouded over by the problem itself though and/or them
yelling at you.

> I don't know why so many project managers seem to ignore this kind of
> reasoning. Most are not as egregious, but the problem remains. Not only do
> we as engineers need to know about the big O, but we have a fiduciary
> responsibility to communicate such reasoning to our customers (including
> employers).

I had a unique method for dealing with this - I would generate a test case
which would cause unreasonably bad results and "demonstrate" it.   Of
course, I would have the entire test case suite available, and a proposed
fix with a sample demonstration and presentation.  Of course, this may not
be as practical for optimizations which take a great deal of time . . or
other environments but it worked for me.

It was my reasoning that if I could not come up with a test case which
caused undesirable results then I didn't need a workaround.
--
LTP

:)
Philipp - 10 Jan 2007 07:54 GMT
>> This, I think, is the big message from this thread. Simply implementing
>> a very well-known algorithm changed the performance from somewhere
[quoted text clipped - 17 lines]
> responsibility to communicate such reasoning to our customers (including
> employers).

Big O is just how the system scales, not the absolute running time. If
you have an amount of entries which will not vary by a factor of 10 in a
close future, it would be wise to check if the O(n^2) is not much faster
than the O(n) algorithm on these entries...
Patricia Shanahan - 15 Jan 2007 16:04 GMT
>>> This, I think, is the big message from this thread. Simply implementing
>>> a very well-known algorithm changed the performance from somewhere
[quoted text clipped - 22 lines]
> close future, it would be wise to check if the O(n^2) is not much faster
> than the O(n) algorithm on these entries...

I did that once, and created a lot of trouble. It was a system for
managing and applying patch cards when booting an operating system. I
was told that there would never be more than order tens of cards at a
time, before the OS would be recompiled with source code changes.

I did a very simple implementation that I knew was O(n^2), but that I
had tested for a couple of hundred cards, several times the number
required by the specification, and at that size it ran about as fast as
the card reader.

Over the years, the number of cards increased to about a thousand, and
it became a real bottleneck. I tried to explain the issue, and get
permission to rework the code to use e.g. a tree structure or a hash
table instead of a linear search...

A lot depends on your confidence that reimplementation will be permitted
if the problem outgrows the O(n^n) solution.

Patricia
Lew - 15 Jan 2007 16:23 GMT
> I did that once, and created a lot of trouble. It was a system for
> managing and applying patch cards when booting an operating system. I
[quoted text clipped - 13 lines]
> A lot depends on your confidence that reimplementation will be permitted
> if the problem outgrows the O(n^n) solution.

Over the last thirty or so years, I have seen many projects where we were told
"we can always refactor it later", but later were told, "we are not going to
refactor given the investment in the existing codebase, besides it works,
doesn't it?"

As a practical matter it is almost impossible to get management buy-in to
throw away existing code no matter how terrible. One worst case scenario was a
nine-month project that had cost my employer nearly a million dollars to a
cosultant, and still didn't work - it gave wrong results and crashed the host.
(The consultant was known in our trenches to be a scammer, but we never could
convince management of that.) I stepped in, took two days and about 150 lines
of code (including blank lines and comments) and created a complete,
bulletproof solution. (Not actually a difficult project, you see - the
consultant was a scammer.) I nearly got in trouble, since I had completely
disregarded the nine months / million dollars worth of existing code that
didn't work. As it was, I was the focus of a lot of managerial concern. (They
always looked so sad when I told them, "Port Steve's code? I never even looked
at Steve's code; I just rewrote the application from the specification.") The
investors pulled out all their money from the company two weeks later
(possibly for unrelated reasons). Welcome to the future.

There has to be an awful lot of pain before there is buy-in for refactoring,
and often not even then.

- Lew
John Ersatznom - 16 Jan 2007 16:11 GMT
>> A lot depends on your confidence that reimplementation will be permitted
>> if the problem outgrows the O(n^n) solution.
[quoted text clipped - 13 lines]
> There has to be an awful lot of pain before there is buy-in for
> refactoring, and often not even then.

Wash your hands of it. If management says "we can always refactor it
later" get that in writing or on tape. If later it won't scale, point
out the earlier discussion (for which you have documentation). Also note
that they did invest money in the existing code, and they then got
however-many years of use out of it before their needs outgrew that
code. Replacing it isn't throwing away the investment, it's making a new
investment after an earlier one has paid off about all it's going to.

If they still don't see the light, wash your hands of it. Say "OK" and
do whatever. If the company goes kaput, gets scammed and goes kaput, or
whatever, well, you did your duty of warning them and they did their
duty of actually making the call. They got it wrong but not for lack of
information; it's not your fault. If you have the skills, you can find
another job and maybe one where your expertise will be better
appreciated. As for the company that goes under, well, that's Darwinism
at work. If the management can't find new work after it does, well,
that's probably also Darwinism at work. ;)
Chris Uppal - 10 Jan 2007 14:34 GMT
> The optimization there was of a database
> structure, where the difference for one report was between O(n^2) and
> O(n). I was forbidden to make the algorithmic optimization, then got in
> trouble because the report was too slow once they loaded all the data. It
> took months and lawyers to straighten that one out.

I think part of the problem with this sort of situation is that we don't have a
commonly understood way of adding contingent sub-plans to projects.   It seems
quite plausible to me that your manager might have correctly decided that the
risk and time involved in the optimisation was not acceptable at that time.
But ideally you and s/he should have been able to add a contingent extension to
the project plan, somewhat like (briefly)

   Risk:
       The XYZ report may run too slowly with production-scale datasets.
   Reason:
       O(N^2) algorithm used (unecessarily).
   Responsibility for monitoring and reviewing risk:
       <some person, group, or organisation> at <some time>.
   Suggested fix:
       <details>
   Probable time:
       <N weeks> + testing.
       NB, retesting required /at system integration scope/.

The "Responsibility for monitoring and reviewing risk:" bit would be, to my
mind, the single most important bit of this in that the system couldn't be
deemed complete unless the issue had been resolved.

At least, /I/ have never seen anything like this used in project planning (not
even informally).

   -- chris
Lew - 11 Jan 2007 06:00 GMT
Lew wrote:

>> The optimization there was of a database
>> structure, where the difference for one report was between O(n^2) and
[quoted text clipped - 6 lines]
> quite plausible to me that your manager might have correctly decided that the
> risk and time involved in the optimisation was not acceptable at that time.

The fix involved half a day's work and was routine - no risk.

> But ideally you and s/he should have been able to add a contingent extension to
> the project plan, somewhat like (briefly)
>
>     Risk:
>         The XYZ report may run too slowly with production-scale datasets.

There was no doubt that it would, given the projected data sizes.

>     Reason:
>         O(N^2) algorithm used (unecessarily).
[quoted text clipped - 5 lines]
>         <N weeks> + testing.
>         NB, retesting required /at system integration scope/.

No such reasoned approach was forthcoming from my project manager then. He
simply didn't want to hear it. The difference in that case was not weeks but a
few hours of development time to implement the fix.

> The "Responsibility for monitoring and reviewing risk:" bit would be, to my
> mind, the single most important bit of this in that the system couldn't be
> deemed complete unless the issue had been resolved.

I had a fix ready to implement. I made sure my replacement knew of it. When he
in turn was finally permitted to make the change, it took half a day and
completely fixed the problem. That didn't happen until the project manager who
had forbidden the change was replaced. By the guy I'd trained with the fix.

> At least, /I/ have never seen anything like this used in project planning (not
> even informally).

Not sure of the antecedent. If you mean you've never seen someone irrationally
and arbitrarily refuse to let a product work when it would have been easy to
repair, lucky you. If you mean you've never seen anyone plan for such a fix,
using reasoned analysis and with fallback positions, unlucky you. I have seen
both ends of that spectrum.

- Lew
Chris Uppal - 11 Jan 2007 20:41 GMT
[me:]
> > I think part of the problem with this sort of situation is that we
> > don't have a commonly understood way of adding contingent sub-plans to
[quoted text clipped - 3 lines]
>
> The fix involved half a day's work and was routine - no risk.

/Nothing/ involves no risk.  But yes, given your estimate, the manager's
position does sound a little, err, irrational...

> > But ideally you and s/he should have been able to add a contingent
> > extension to the project plan, somewhat like (briefly)
...
> > At least, /I/ have never seen anything like this used in project
> > planning (not even informally).
>
> Not sure of the antecedent. If you mean you've never seen someone
> irrationally and arbitrarily refuse to let a product work when it would
> have been easy to repair, lucky you.

Oddly, enough, I don't think I've ever been in the position where "management"
hasn't heeded my warnings on specific technical issues ("heeded" != "agreed
with after due consideration").  I /have/ been in the position of saying
something like "but this plan only covers the functionality we actually know
about now; there's at least that much again, /really important/ stuff, which we
haven't even started to think about" -- naturally I was roundly criticised for
"being negative"...

> If you mean you've never seen anyone
> plan for such a fix, using reasoned analysis and with fallback positions,
> unlucky you. I have seen both ends of that spectrum.

That's what I meant.  Of course I have seen plans changed to allow for problems
as they arise, but I haven't seen anyone using a planning idiom to postpone
addressing, but still properly manage, potential risks as they are identified.

   -- chris
Luc The Perverse - 11 Jan 2007 21:37 GMT
> Oddly, enough, I don't think I've ever been in the position where
> "management"
[quoted text clipped - 8 lines]
> for
> "being negative"...

In general I would complain about my situation, a programmer being subjected
to menial work - but one benefit does come out of it.

It is very easy to quit your job on principle, if you are making less than
10$ (UD)/ hr

--
LTP

:)
Daniel Pitts - 12 Jan 2007 01:17 GMT
> In general I would complain about my situation, a programmer being subjected
> to menial work - but one benefit does come out of it.
>
> It is very easy to quit your job on principle, if you are making less than
> 10$ (UD)/ hr
Hmm, thats extremely low wage for a programmer.
Actually, thats about what a Starbucks Barista makes in San Fransisco.
You'd be better off working in a coffee shop. At least then you could
get free caffein!
Luc The Perverse - 12 Jan 2007 02:27 GMT
>> In general I would complain about my situation, a programmer being
>> subjected
[quoted text clipped - 7 lines]
> You'd be better off working in a coffee shop. At least then you could
> get free caffein!

They're called internships - what you take when you can't get real jobs . .
. And then intermittent manual labor jobs when you can't find an internship.

Keep in mind that the cost of living is MUCH higher in San Francisco than
here in BFE.

--
LTP

:)
Daniel Pitts - 12 Jan 2007 19:35 GMT
> >> In general I would complain about my situation, a programmer being
> >> subjected
[quoted text clipped - 13 lines]
> Keep in mind that the cost of living is MUCH higher in San Francisco than
> here in BFE.
That is a valid point. Ofcourse, you could get an internship in S.F.
for $20/hr :-)
Patricia Shanahan - 05 Jan 2007 12:01 GMT
> I was messing around on a site where you answer questions for points. (No
> prizes or anything)
[quoted text clipped - 8 lines]
> Is there an algorithm out there, which I will be able to understand, which
> works significantly faster than this one?
...

The usual algorithm is the Sieve of Eratosthenes,
http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes

The sieve is O(n*n), but with small constant factor.

Patricia
Patricia Shanahan - 05 Jan 2007 12:10 GMT
>> I was messing around on a site where you answer questions for points.
>> (No prizes or anything)
[quoted text clipped - 15 lines]
>
> The sieve is O(n*n), but with small constant factor.

I was wrong about the complexity. I was not allowing for only doing the
inner loop for primes.

Patricia
Luc The Perverse - 05 Jan 2007 19:58 GMT
>>> I was messing around on a site where you answer questions for points.
>>> (No prizes or anything)
[quoted text clipped - 18 lines]
> I was wrong about the complexity. I was not allowing for only doing the
> inner loop for primes.

Looks like I had better check this out!

hehe  cool

Thanks guys!  (And I mean Simon too)

--
LTP

:)
Abhi - 08 Jan 2007 09:04 GMT
Hi Luc,
I found the primes<1000000 and stored them in an array.also summed them
up.I am pasting my code.Pls hav a look..........it took 1782 msecs on a
p4 2.4GHz processor && 512 mb ram.
If this code is not ok then let me know:)

/*
* Created on Jan 7, 2007
*
* To change the template for this generated file go to
* Window&gt;Preferences&gt;Java&gt;Code Generation&gt;Code and
Comments
*/
package src;

//import java.lang.Math;
/**
* @author Jboss
*calculate sum of prime nos less than 1 million
*/
public final class Sum_Prime {

    private static int currNo;
    private static int currSqrt;
    private static int[] primeArr;
    private static long currSum = 0;
    private static final int range = 1000000; // calc till 1000000;
    private static long startTime;

    public static void main(String[] args) {
        //Math.sqrt()
        int currArrSize = range / 20;//holds the curr size of the array
        if (currArrSize == 0)
            {
                primeArr = new int[10];
                currArrSize=10;
            }
        else
            primeArr = new int[currArrSize];

        startTime = System.currentTimeMillis();
        int k = 0; //index of the array which stores the prime nos.

        currNo = 2; //start looking from 2
        System.out.print("\n arr size \t"+primeArr.length);
        for (int i = 2; i <= range; i++) {
            boolean isPrime = true;
            currSqrt = (int) Math.sqrt(currNo);
            for (int j = 2;
                j <= currSqrt;
                j++) //chk for 3 whose sqrt is 1.5....
                {
                if (currNo % j == 0) {

                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                try {
                    primeArr[k] = currNo;
                    currSum = currSum + currNo;
                    k++;
                } catch (ArrayIndexOutOfBoundsException arrex) {
                    currArrSize=currArrSize*2;
                    int[] tempArr = new int[currArrSize];
                    System.arraycopy(primeArr, 0, tempArr, 0,k);//copies the old arr
into new 1
                    primeArr=tempArr;
                    //System.out.print("\n primeArr size \t"+primeArr.length);
                    //System.out.print("\n tempArr size \t"+tempArr.length);
                    primeArr[k]=currNo;
                    currSum = currSum + currNo;
                    System.out.print("\n new arr size \t"+primeArr.length);
                    k++;
                }

            }
            currNo++;
        }
        System.out.print("\n sum \t" + currSum);
        System.out.print("\n curr val \t" + currNo);
        System.out.print("\n nos found \t" + k);
        System.out.print(
            "\n Time taken to execute in millisecs \t"
                + (System.currentTimeMillis() - startTime));
        System.exit(0);
    }
}
tell me the name of the site. where u saw this..........

> I was messing around on a site where you answer questions for points. (No
> prizes or anything)
[quoted text clipped - 71 lines]
>
> :)


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.