Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / May 2005

Tip: Looking for answers? Try searching our database.

Creating a unique random id

Thread view: 
Fritz Bayer - 25 May 2005 11:01 GMT
Hello,

how likely is it that the following code produces duplicate ie
identical random numbers, when it gets executed on different virtual
machines running on different computers?

import java.util.Random;

public class RandomNumberGenerator
{
 Random rand = new Random();

 public RandomNumberGenerator()
 {
   super();
 }

 public int get(int x)
 {
   return (int) rand.nextInt(x) + 1;
 }

 public String getRandomSequence(int length)
 {
   StringBuffer stringBuffer = new StringBuffer();
   for (int i = 0; i < length; i++)
     stringBuffer.append(get(10));
   return stringBuffer.toString();
 }
}

public final static void main()
{
 RandomNumberGenerator randomNumberGenerator = new
RandomNumberGenerator();
 System.println("How likely is it to get this number again: " +
randomNumberGenerator(20));
}

Here is a test case at least for one computer it does not seem to
return the same numbers:

public class TestRandomNumberGenerator extends TestCase
{

 public void testUniqueness()
 {
   String last ="";
   String current;
   for (int i = 0; i < 10000000; i++)
   {
     RandomNumberGenerator randomNumberGenerator = new
RandomNumberGenerator();
     current = randomNumberGenerator.getRandomSequence(20);
     assertNotSame(current, last);
     //System.out.println(current + " " + last);
     last = current;
   }
 }

 public static Test suite()
 {
   return new TestSuite(TestRandomNumberGenerator.class);
 }
}
Andrew Thompson - 25 May 2005 11:15 GMT
> how likely is it that the following code produces duplicate ie
> identical random numbers,

What are the odds you can produce a self contained piece of code
that compiles and displays the problem you discuss?
<http://www.physci.org/codes/sscce.jsp>

>..when it gets executed on different virtual
> machines running on different computers?
[quoted text clipped - 9 lines]
>     super();
>   }

But a question.  Given your class extends Object, and gets a default 'no
args' constructor, what is the point of the last four lines shown above?

Signature

Andrew Thompson
http://www.PhySci.org/codes/  Web & IT Help
http://www.PhySci.org/  Open-source software suite
http://www.1point1C.org/  Science & Technology
http://www.LensEscapes.com/  Images that escape the mundane

JScoobyCed - 25 May 2005 12:08 GMT
> Hello,
>
[quoted text clipped - 17 lines]
>     }
>   }

In the piece of code above, you test only successive values, ie number
[N] with [N-1] (having N>0). You should test the [N] value with all /i/
previous values, ie [N-1], [N-2], ..., [N-i].
A simple way to code it is to put each value in a HashTable:

 public void testUniqueness() {
  HashTable ht = new HashTable();
  String current;
  for (int i = 0; i < 10000000; i++) {
   RandomNumberGenerator randomNumberGenerator;
   randomNumberGenerator = new RandomNumberGenerator();
   current = randomNumberGenerator.getRandomSequence(20);
   // Fails if 'current' is a duplicate
   assertNull(ht.put(current));
  }
 }

This would make sure you don't have duplicate.

--
JSC
Ingo R. Homann - 25 May 2005 12:47 GMT
Hi,

> how likely is it that the following code produces duplicate ie
> identical random numbers, when it gets executed on different virtual
[quoted text clipped - 4 lines]
>     return (int) rand.nextInt(x) + 1;
>   }

I guess, the bets are 1:2.147.483.648! :-)

Ciao,
Ingo
Jacob - 25 May 2005 13:06 GMT
> how likely is it that the following code produces duplicate ie
> identical random numbers, when it gets executed on different virtual
> machines running on different computers?

I use the currentTimeMillis() in order to create unique IDs.
It requires that the ID generator is centralized and it
needs to ensure at least one millisecond between each
created ID.
Murray - 25 May 2005 14:10 GMT
> > how likely is it that the following code produces duplicate ie
> > identical random numbers, when it gets executed on different virtual
[quoted text clipped - 4 lines]
> needs to ensure at least one millisecond between each
> created ID.

Actually it's more that 1ms. Most operating systems / hardware have a time
accuracy of significantly more than 1ms
Bob - 25 May 2005 14:10 GMT
> Hello,
>
> how likely is it that the following code produces duplicate ie
> identical random numbers, when it gets executed on different virtual
> machines running on different computers?

In my opinion, it's irrelevant how likely it is to occur. The fact that
it can occur means that your solution is not sound.

Why not just create an object that can be accessed by all the machines,
which has a static method that just increments a static counter and
hands that next number out? That way, you're guaranteed to have 2^63 - 1
unique identities before you run out.
Signature

Bob

Chris Uppal - 25 May 2005 15:12 GMT
> how likely is it that the following code produces duplicate ie
> identical random numbers, when it gets executed on different virtual
> machines running on different computers?

There are two aspects to this question.

The first is theoretical.  /Assuming/ that you are generating 20-digit strings
completely at random, you can calculate the probability of finding duplicates.
If the number of possible values of some random value is N, then if you have a
pool of sqrt(N) values, the chances are that one value occurs twice (Google for
the birthday paradox).  In this case you have 1e20 possible random strings, so
you would expect to have to create 1e10 (10 billion) strings on average before
you got a duplicate.

However, that's on the assumption that the strings are completely random.
Which happens not to be true in your case.  For one thing, java.util.Random is
a pseudo-random generator, and may have statistical properties that make
duplicates more likely (I'm not mathematician enough to know).  More
importantly it has the designed-in quality that each instance of Random always
produces the /same/ sequence of numbers given that it has started with a
specific seed.  As a result each instance of your RandomNumberGenerators will
also produce the same sequence of numbers if the underlying Random has the same
seed.  So the number of possible /sequences/ of 20-digit number is determined
by
the number of possible seeds to Random (the space of possible numbers that they
can produce is larger, because each generator can produce more than one
number -- it may even be as large as 1e20, though I doubt it).  In fact, Random
only keeps 48 bits of its seed so the most you can have is 2**48 different
RandomNumberGenerators.  That in turn means that if you create 2**24 instances
of RandomNumberGenerator then you'd expect to find two that were identical, and
therefore generated identical sequences of numbers.  2**24 is about 17 million,
which is quite a lot (depending on the application, of course), but remember
that that's an /upper/ bound.  The actual situation may be worse.

So, in practise the chances of finding duplicate 20-digit numbers is dominated
by the chances of two machines using the same seed to initialise a
RandomNumberGenerator.

In Sun's Java 1.4.2 the seed used by default (if you don't provide one
yourself) is System.currentTimeMillis().  That is declared to answer a 64-bit
long, but there won't really be 64 bits-worth of possible values because it is
answering an absolute time, and that is likely to be similar on all the
machines.

If two machines create new RandomNumberGenerators at a steady (on average) rate
of one per second, and if their system clocks are synchronised closely enough
that they read the same time on each machine at some point in the run (not
necessarily the same point), then during the periods where they overlap, the
chances are 1 in 1000 of producing an identical RandomNumberGenerator at /each/
step.

Say that you have 10 machines, all synchronised to within 1 minute, and you run
the generators for an hour on all of them (creating a new instance once a
second as before).  For 59 out of the 60 minutes, each new
RandomNumberGenerator will be created with a millisecond seed that is not more
than 1 second different from that of one of the others created on each of the
other 9 machines.  So you will create 59 * 60 "batches" of 10
RandomNumberGenerators all with seeds that can differ by no more than 1000.
The chances of all 10 from one batch being different are 1.000 * 0.999 * 0.998
* ... * 0.991 = 0.956.  So each batch has e a 4.4% chance of containing
duplicate RandomNumberGenerators.  Since you create 59*60=3480 such batches in
the hour, the chances of none of them containing duplicates is 0.956**3480.
Which I make to be about 6e-69.  I.e. it is essentially certain that you'll
create two identical RandomNumberGenerators which will then naturally produce
identical sequences of 20-digit numbers.

Of course, there's also the possibility that two /different/
RandomNumberGenerators will produce the same 20-digit number at some point, but
the chances are very much lower, as described above.

In Sun's Java 1.5.0 Random uses System.nanoTime() as its default seed; that
isn't an absolute time, so it is more "random".  However there is no guarantee
that it can actually answer all 2**64 possible values.  If, say, it only
answered 32-bits-worth of real information, then the number of possible random
sequences would be cut down to 2**32.

And there will probably be a tendency for machines to have reset their
nanosecond counters at roughly the same time (within the same month, say) so
there's a higher than chance probability that two machines will answer the same
number at some points in their lives.  You can work out the probability in the
same kind of way as I did above.

Repeating the calculation.  First off there is less chance that any two
machine's nanosecond clocks will overlap during the hour since they are not
synchronised, but instead reflect when the machine was last reset (or something
like that).  For the sake of argument assume that they have all been reset in
the last month.  Secondly there is much less chance that any two machines with
nanosecond clocks that differ by < 1 second will answer the same nanosecond
time. If we assume that the nanosecond timer actually has microsecond
resolution [*], then the chances are 1 in a million -- which is a lot smaller
than 1 in 1000.

([*] I don't know how likely this is, but this machine's 'high performance
counter' claims to click 3.6 million times a second so it may be plausible.)

With those assumptions, for each Random created (once per second per machine)
there is a 1 / (31 * 24) chance that it is created during the a "second" (as
seen by the nano clocks) that is covered by any specific one of the other
machines during that run.  So there is approximately 9 / (31*24) chance of it
overlapping with any of the other 9 machines. That's about 1.2%.  If it /does/
overlap, then there's a 1e-6 chance of it actually seeing the /same/ nanosecond
value. That means that each Random has about 1.2e-8 chance of being identical
to some other, or to put it another way, about (1-1.2e-8) chance of being
unique (i.e. very near certainty)  During the run we produce 10*60*60 instances
of Random, so the chances of all of them being unique are about
(1-1.2e-8)**36000.  That's about 0.99956, or to put it the other way around
there's about one chance in 2300 of finding a duplicate Random anywhere.

For comparison, if we consider 36000 genuinely random 20-digit numbers, then I
make the chances of there being a duplicate about 1 in 178 billion.

So it all comes down to the seed for Random.  I'd say that duplicates might be
quite likely on 1.4 machines. On 1.5 machines they are probably quite unlikely,
providing that the nanosecond counter has a respectably large range of possible
values (as it probably does on Windows machines, I don't know about any
others).

   -- chris
Wibble - 26 May 2005 02:38 GMT
How about

http://java.sun.com/j2se/1.5.0/docs/api/java/rmi/server/UID.html

Prefix it with a host id and away you go.

Wasn't Fritz just looking for consecutive duplicates anyway?
Most pseudo random number generators wont repeat for a
fairly large, known period.

I dont think java gaurantees any particular random implementation, so
its usually better to choose your own.  There's a pretty big tradeoff
between randomness and performance.

>>how likely is it that the following code produces duplicate ie
>>identical random numbers, when it gets executed on different virtual
[quoted text clipped - 114 lines]
>
>     -- chris
Filip Larsen - 26 May 2005 07:54 GMT
> Here is a test case at least for one computer it does not seem to
> return the same numbers:
[quoted text clipped - 22 lines]
>   }
> }

Your code is testing that current and last refer to same object, not
that they are unequal which is probably what you want.

If you change the code to test for equality instead you will now
probably find that your generator produces a lot of equal numbers in
every clock tick because you are instantiating a new Random every time
you generate a number and Random uses currentTimeMillis() to initiate
its seed. If you fix this, for instance by making rand a static instance
variable, your code will still generate the same number on two VM's if
you instantiate your generators at the same clock tick. To fix that, you
can use SecureRandom which makes an effort to make a unguessable (but
not globally unique) seed.

However, instead of all that I would recommend that you look at
java.util.UUID if you are on Java 1.5 and java.rmi.server.UID if you are
on a previous version (although the UID class do not really provide the
"universal" part).

Regards,
Signature

Filip Larsen

Fritz Bayer - 26 May 2005 08:07 GMT
> Hello,
>
[quoted text clipped - 61 lines]
>   }
> }

So what's your suggestion for creating unique ids? The processes (http
proxies) can't communicate with each other and are not allowed to
contact a central server for generating the ids.

They run on different machines and get started at a random time by
somebody. The distribution is not uniform, since most people have a
preference when to start their services - for example in the early
hours of the day.

For every user an id should be generated when he contacts the site. So
the time when a user contacts the proxy is truely random (maybe this
helps to solve the problem?). The generated id is then trasmitted and
saved on a server, where all ids are kept. So thats why they should be
unqiue.

Maybe by using the random time combined with the startup time of the
process we can get some more unique random numbers? What is by the way
with the class java.rmi.server.UID? I have read that this creates
unique ids but only for 48 days. Well....
jonck - 26 May 2005 10:46 GMT
> So what's your suggestion for creating unique ids? The processes (http
> proxies) can't communicate with each other and are not allowed to
> contact a central server for generating the ids.

This statement contradicts the following, since in the above statement
you say that your clients don't contact a central server...

> For every user an id should be generated when he contacts the site. So
> the time when a user contacts the proxy is truely random (maybe this
> helps to solve the problem?). The generated id is then trasmitted and
> saved on a server, where all ids are kept. So thats why they should be
> unqiue.

... and here you're saying that they _do_ contact a central server!

You're looking at your problem backwards: you want your clients to
create an id and store these on the server. If you simply turn this
around, let the server create the id's once the clients contact the
server, your problem is solved.
Roland - 26 May 2005 12:32 GMT
>>Hello,
>>
[quoted text clipped - 81 lines]
> with the class java.rmi.server.UID? I have read that this creates
> unique ids but only for 48 days. Well....

So you want to generate a globally unique ID without using some central
server.

Java 5 offers a class for this: java.util.UUID.
<http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUID.html>
This class has a static method to create a pseudo random UUID (there are
other categories of UUIDs too). FWIW, the implementation of this method
(randomUUID()) uses java.security.SecureRandom, which uses more bits for
generation than java.util.Random (48-bit seed).

See also
<http://en.wikipedia.org/wiki/Universally_Unique_Identifier>

Another interesting site I found: <http://www.random.org/> which offers
a "True Random Number Service".
Signature

Regards,

Roland de Ruiter
  ___      ___
 /__/ w_/ /__/
/  \ /_/ /  \

The Wogster - 26 May 2005 14:17 GMT
> So what's your suggestion for creating unique ids? The processes (http
> proxies) can't communicate with each other and are not allowed to
[quoted text clipped - 10 lines]
> saved on a server, where all ids are kept. So thats why they should be
> unqiue.

From what I see, your not looking for random, but rather unique values,
and that is much, much easier.

If you have access to a central server -- it's debatable from what you
have written here.  The server can simply dole out from an incremented
counter.  If you do not, then assign each machine a unique ID, this
could be based on a number of things, for example the IP address (if
fixed), user ID number, or typed in on first use.  This could also be a
counter stored on a server, that is contacted by each machine ON FIRST
USE, it gives each machine that contacts it for this, a new machine ID,
which gets stored locally on the machine itself (I would prefer this, if
possible, you know the values are guaranteed to be unique - at least as
long as you have less then 2 147 483 648 machines.

Now on each machine you store an int sized counter, start at 0.

When you need a generated ID then, in a long you copy the machine ID
multiplied by 4 294 967 296 then increment the machines counter and add
the counter value to your long.  This gives you a unique 64bit ID, that
allows a very large number of machines to access it, and at one contact
per second has enough capacity for over 135 years on each machine.  If
your counter ever does reach 2 147 483 647 then simply delete the
machine ID and counter, then obtain a new machine ID and start again.

The reason why we use insane numbers like 4294967296 is that 4294967296
is 2^32, so on 32bit machines it will be faster because your dealing
with full values, rather then partial values.
Fritz Bayer - 27 May 2005 07:04 GMT
> > So what's your suggestion for creating unique ids? The processes (http
> > proxies) can't communicate with each other and are not allowed to
[quoted text clipped - 38 lines]
> is 2^32, so on 32bit machines it will be faster because your dealing
> with full values, rather then partial values.

Hi,

it`s a requirement that the ids do not come from a server but get
stored on one. Sounds stupid but that's the way it is.

I can use the ip address as a seed. And you are right, that I do not
need random numbers but rather unique numbers. Since I don't have jdk
1.5 I can't use UUID unfortunaltey.

The problem with your counter is that I can't make it persistent so at
the next time the program starts up it will be zero again. I need a
methode which based on the IP, the startup time and the time at which
a user contacts the proxy generates a unique id.
Owen Jacobson - 27 May 2005 08:15 GMT
>> > So what's your suggestion for creating unique ids? The processes
>> > (http proxies) can't communicate with each other and are not allowed
>> > to contact a central server for generating the ids.

(snip)

>> > For every user an id should be generated when he contacts the site.
>> > So the time when a user contacts the proxy is truely random (maybe
[quoted text clipped - 16 lines]
> it`s a requirement that the ids do not come from a server but get stored
> on one. Sounds stupid but that's the way it is.

Don't just say "it's a requirement that this be so".  *Especially* for
something so apparently backwards, it's a good idea to explain why the
requirement's there; sometimes, the reasons for the requirement can even
reveal a solution that fits properly.

> I can use the ip address as a seed. And you are right, that I do not need
> random numbers but rather unique numbers. Since I don't have jdk 1.5 I
> can't use UUID unfortunaltey.

For a network connection, the IP address + originating port will be unique
as long as the connection lasts; for a web application this is less useful
because the connection is transient, but it makes a temporarily-unique
starting point, anyways.

> The problem with your counter is that I can't make it persistent so at the
> next time the program starts up it will be zero again. I need a methode
> which based on the IP, the startup time and the time at which a user
> contacts the proxy generates a unique id.

What's the actual purpose behind the randomly generated IDs?  How will
they be used?  This is also important for creating a correct, useful
solution to the whole problem.

Owen
Chris Uppal - 27 May 2005 10:51 GMT
> I need a
> methode which based on the IP, the startup time and the time at which
> a user contacts the proxy generates a unique id.

The IP address is not suitable for this purpose -- the problem is that many
client machines will be on NAT-ed subnets and will likely all be using IP
addresses from a small pool of "private use" addresses.  The basic idea is
sound, but what you need to be using is the MAC address from the network
interface.  I'm afraid I have no idea whether that is directly accessible via
pure Java.

If you use that, the millisecond timestamp (or nano if it's available), and the
date, plus perhaps any other bit of "random" data that happens to be available
(the user's name ?), then you should be able to brew up an ID that is unique
/provided/ that you don't have technically sophisticated users attempting to
break the system by (e.g.) messing with the system clock.

Note that including all that info in a UID has fairly serious privacy
implications (may even be illegal in some places for all I know).  You'd be
better off pushing the whole block of data through something like SHA1 or some
similar (or better) crypto-quality one-way hash.  Of course, doing that would
re-introduce the possibility of clashes, but (unless the users are malevolent)
the odds are very, very, low.

One thought, could you create UIDs on the client, and use that to set up the
initial "handshake" but then supply a higher quality UID (created on the
server)
and hand that back to the client ?  If so then, since the client-generated UIDs
would be short-lived (a few seconds, minutes, hours, or days, depending on your
app) the problem of clashes would presumably be eliminated (unless the users
are malevolent).

   -- chris
Lucy - 27 May 2005 16:07 GMT
<big snip>
> it`s a requirement that the ids do not come from a server but get
> stored on one. Sounds stupid but that's the way it is.
[quoted text clipped - 7 lines]
> methode which based on the IP, the startup time and the time at which
> a user contacts the proxy generates a unique id.

First you say that this number is stored on the server, then you say you
can't make it persistent - make up your mind.
This solution assumes that there are only two machines, but you can
make it work for more. All you need to know is which machine you are
running on.

Say you have two machines, A and B. Assign A to be the ODD one
and B the EVEN one. Then your code just generates a random
number using any technique you like. If you are on A and the number
is even, just keep generating numbers till you get an odd one and
use that. Similar for B. If you can store numbers between runs
you can look at the last value to see if you are A or if you are B.


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.