Java Forum / General / July 2007
Lists of size Integer.MAX_VALUE
Lew - 07 Jul 2007 19:08 GMT What would happen from a call like
someList.addAll( Integer.MAX_VALUE, someCollection );
where someList contains at least Integer.MAX_VALUE elements and someCollection is not empty?
According to <http://java.sun.com/javase/6/docs/api/java/util/List.html#add(int,%20E)> void add(int index, E element)
> Inserts the specified element at the specified position in this list (optional operation). and throws
> IndexOutOfBoundsException - if the index is out of range (index < 0 || index > size()) Suppose we have a List 'someList' containing exactly Integer.MAX_VALUE elements and we someList.add( element ); or someList.add( Integer.MAX_VALUE, element );
According to the Javadocs, this is legal, but how then could we reference the last element of the list?
(Presumably we're facing an OOME before these are testable problems in today's JVMs.)
 Signature Lew
Arne Vajhøj - 07 Jul 2007 19:24 GMT > What would happen from a call like > > someList.addAll( Integer.MAX_VALUE, someCollection ); > > where someList contains at least Integer.MAX_VALUE elements and > someCollection is not empty?
> Suppose we have a List 'someList' containing exactly Integer.MAX_VALUE > elements and we [quoted text clipped - 4 lines] > According to the Javadocs, this is legal, but how then could we > reference the last element of the list? I think the docs are wrong.
If the list is backed by an array, then it will fail.
Arne
Daniel Pitts - 08 Jul 2007 00:25 GMT > What would happen from a call like > [quoted text clipped - 25 lines] > -- > Lew On a 64 bit machine with > 2gigs of memory, you can test this to see what happens. I suspect that the original implementors thought "2 billion item arrays are enough for anyone" :-)
Roedy Green - 08 Jul 2007 17:38 GMT >(Presumably we're facing an OOME before these are testable problems in today's >JVMs.) Since corner cases are usually where programs fail, it is wise to back up by one or two from a corner. I don't have a a 64-bit JVM to test this, but I would guess you will get some other exception such as an ArrayIndexOutOfBoundsException. -- Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Lew - 08 Jul 2007 17:49 GMT >> (Presumably we're facing an OOME before these are testable problems in today's >> JVMs.) [quoted text clipped - 3 lines] > this, but I would guess you will get some other exception such as an > ArrayIndexOutOfBoundsException. I have but 1 GB RAM and I don't think this motherboard accepts more than 2 GB, so I am currently unable to test this, too. I figure similarly to you and Arne, who said:
> I think the docs are wrong. > > If the list is backed by an array, then it will fail. The point here, as with the bug in quicksort that went undetected for about forty years [1], is that even the designers missed the corner cases.
[1] <http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html>
 Signature Lew
Eric Sosman - 08 Jul 2007 20:20 GMT > [...] > The point here, as with the bug in quicksort that went undetected for > about forty years [1], is that even the designers missed the corner cases. > > [1] > <http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html> An interesting reference, but one which does not contain the word "quicksort" nor even "quick" ...
 Signature Eric Sosman esosman@ieee-dot-org.invalid
Lew - 08 Jul 2007 21:38 GMT >> [...] >> The point here, as with the bug in quicksort that went undetected for [quoted text clipped - 6 lines] > An interesting reference, but one which does not contain > the word "quicksort" nor even "quick" ... Abashed - I meant binary search. (And two decades, not four.) I got my buzzwords crossed.
It still makes the point about corner cases, though, and that large memory requirements are not so unlikely.
 Signature Lew
Twisted - 08 Jul 2007 22:06 GMT > >> [...] > >> The point here, as with the bug in quicksort that went undetected for [quoted text clipped - 9 lines] > Abashed - I meant binary search. (And two decades, not four.) I got my > buzzwords crossed. Every binary divide and conquer algorithm is likely to be affected in a majority of implementations. Quicksort included. The net effect being to restrict utility to half the nominal capacity, e.g. 2^30 rather than 2^31 items in a typical signed-32-bit-integer-based implementation, and often without clean failure behavior if that is exceeded (except in Java, where an exception should be thrown, which is fairly clean). C and C++ programmers of course are long since used to unclean failure behavior. If your code is hardened against walking off the ends of arrays or corrupting memory in the event of integer overflow or other corner cases or simply bad inputs, you can bet at least one of the libraries you're using (possibly including the standard C runtime library) isn't, even after you avoid known hazards like strcpy in favor of e.g. strncpy.
Eric Sosman - 08 Jul 2007 22:18 GMT >>> [...] >>> The point here, as with the bug in quicksort that went undetected for [quoted text clipped - 12 lines] > It still makes the point about corner cases, though, and that large > memory requirements are not so unlikely. The defect Bloch's note describes is not an algorithmic defect, but a defect in the translation of the algorithm into terms of a concrete implementation: the algorithm requires the computation of a midpoint index, and the implementation assumes this can be done by summing the two extreme indices and dividing by two. Unfortunately, the assumption fails when the indices are large enough to make their sum overflow.
Curiously, this particular error is one a C programmer would not have made, or at least would have been less likely to make. The C programmer would probably have been dealing not with array indices but with pointers, and it wouldn't occur to him to try to form the sum of two pointers (and if it did, the compiler would scold him). Instead, the C programmer would have found the distance between the two endpoints, divided the distance by two, and added/subtracted that half-step to/from the low/high endpoint:
middle = low + (high - low) / 2;
(The boundary conditions would probably be a little different from those in Java, too, so let's not fret about a plus-or-minus one.) There's a faint chance the `high - low' computation could overflow, but only if searching a table of more than 2G single-byte values -- and if the programmer wastes 2GB on such a table, he deserves all the bad things that happen!
Languages foment (and maybe ferment) their own bug idioms.
 Signature Eric Sosman esosman@ieee-dot-org.invalid
Twisted - 08 Jul 2007 21:08 GMT > >> (Presumably we're facing an OOME before these are testable problems in today's > >> JVMs.) [quoted text clipped - 14 lines] > The point here, as with the bug in quicksort that went undetected for about > forty years [1], is that even the designers missed the corner cases. Integer overflow is the worst, for some reason; probably because it violates the usual mathematical intuitions about arithmetic operations. One of those expensive space disasters -- rocket blowing up or going in the wrong direction, Mars probe or satellite failing, or some such -- was eventually traced to a 16-bit integer overflow bug in some microcontroller code running on an ancient 286-era microchip in the guts of the thing.
As for Java collections, they're pretty much all backed by arrays aren't they? I think we may want to eventually propose HugeFoo versions of these collections. There'd be HugeArray, with long indices and under the hood an array of arrays indexed by the high 32 and low 32 bits, or something like that. Or perhaps the TreeArray, conceptually an array but actually a deeper tree, which grows in depth for every factor of 2^31 in the array length, and uses BigIntegers. These are a somewhat slower-performing normal array that doesn't take up much extra space for sizes below 2^31. Your more sophisticated collection classes, such as HugeArrayList, are straight ports of their non-Huge cousins to using a HugeArray or TreeArray and using long or BigInteger parameters.
With TreeArray and BigIntegers, memory capacity is the ONLY limit, unless BigIntegers have some limitation under the hood. It's likely they're based on arrays of ints, so limited to (2^32)^(2^31) possible values, or 2^(2^36). A HugeInteger class, whose implementation is backed by a TreeArray class implemented to use HugeInteger indices*, seems warranted ... :)
* Figuring out how to avoid infinite loops in recursion is left as an exercise to the reader, but a hint: the HugeInteger should store the lowest-order 31 bits in a loose int rather than the TreeArray.
Joshua Cranmer - 08 Jul 2007 21:30 GMT > As for Java collections, they're pretty much all backed by arrays aren't > they? LinkedList is not, but ArrayList is.
All sets map to respective maps, and I believe that TreeMap does not use arrays whereas {Linked}HashMap (I believe) is backed by an array.
PriorityQueue is probably array-backed, ConcurrentLinkedQueue not, SynchronousQueue doesn't even count (you can't even have one element), and DelayQueue I have no idea.
That sums up to about half of all real implementations of Collections, which makes sense (one random access, one not for each type), and is a far cry from "pretty much all".
Twisted - 08 Jul 2007 21:59 GMT [apparently hostile post largely trimmed]
> PriorityQueue is probably array-backed, ConcurrentLinkedQueue not, > SynchronousQueue doesn't even count (you can't even have one element), > and DelayQueue I have no idea. PriorityQueue is actually probably backed by a disjoint set forest, and so by trees (plural).
In any event I stand by what I originally said; the most frequently used concrete collections are HashFoo and ArrayList, which are all array-backed, with infrequent use of the queues and LinkedList, and use of TreeFoo downright rare in my experience.
Patricia Shanahan - 08 Jul 2007 21:35 GMT ...
> As for Java collections, they're pretty much all backed by arrays > aren't they? I think we may want to eventually propose HugeFoo [quoted text clipped - 8 lines] > non-Huge cousins to using a HugeArray or TreeArray and using long or > BigInteger parameters. ...
I saw similar workarounds going from 16 bits to 24 bits, and from 24 bits to 32 bits. Also going from 32 bits to 64 bits for file sizes and offsets.
Each time, the final conclusion was that we should be able to have a single array as large as the memory can support and a single file as large as the disks can support. I was really hoping we could skip the intermediate steps this time, and go straight where we are going to end up anyway.
Patricia
Tom Hawtin - 08 Jul 2007 21:38 GMT > Integer overflow is the worst, for some reason; probably because it > violates the usual mathematical intuitions about arithmetic [quoted text clipped - 3 lines] > in some microcontroller code running on an ancient 286-era microchip > in the guts of the thing. You're thinking of Logica's(?) software in Ariane-5, the first launch of which in 1996 veered off course during ascent. The software used Ariane-4 as the base, and wasn't sufficiently checked for assumptions.
> As for Java collections, they're pretty much all backed by arrays > aren't they? I think we may want to eventually propose HugeFoo > versions of these collections. There'd be HugeArray, with long indices > and under the hood an array of arrays indexed by the high 32 and low > 32 bits, or something like that. You would just add a 64-bit array to the JRE with intrinsics. NIO buffers already use longs for addresses under the covers. I think there is an RFE for 64-bit indexes arrays though (personally, I don't think it's necessary to place it in the language).
The trouble is that the List *interface* insists on using ints.
> Or perhaps the TreeArray, > conceptually an array but actually a deeper tree, which grows in depth [quoted text clipped - 4 lines] > non-Huge cousins to using a HugeArray or TreeArray and using long or > BigInteger parameters. There are lots of approaches to do that sort of thing. IIRC, the relevant RFE evaluations suggests there is PhD material in that sort of thing.
> With TreeArray and BigIntegers, memory capacity is the ONLY limit, > unless BigIntegers have some limitation under the hood. It's likely > they're based on arrays of ints, so limited to (2^32)^(2^31) possible > values, or 2^(2^36). A HugeInteger class, whose implementation is > backed by a TreeArray class implemented to use HugeInteger indices*, > seems warranted ... :) Hmm. I think I'd prefer to see a MediumInteger first., BigInteger isn't very efficient dealing with integers of only a few hundred bits.
Tom Hawtin
Twisted - 08 Jul 2007 22:01 GMT > Hmm. I think I'd prefer to see a MediumInteger first., BigInteger isn't > very efficient dealing with integers of only a few hundred bits. What's needed is a TrueInteger with no range limits and the strategy pattern used to supply appropriate implementations depending on size.
Roedy Green - 08 Jul 2007 22:02 GMT ><http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html> The code Bloch gives to get around the overflow problem fails in other cases. I explore the various flaws of his three algorithms at
http://mindprod.com/jgloss/average.html -- Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Googmeister - 09 Jul 2007 15:17 GMT On Jul 8, 5:02 pm, Roedy Green <see_webs...@mindprod.com.invalid> wrote:
> ><http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about...> > > The code Bloch gives to get around the overflow problem fails in other > cases. I explore the various flaws of his three algorithms at > > http://mindprod.com/jgloss/average.html Right, but these cases aren't applicable to Bloch's updated binary search code. The original (lo + hi) / 2 code was definitely a flaw, but I don't think it's fair to say that his updated ones are flawed since they work exactly as claimed.
Roedy Green - 09 Jul 2007 19:48 GMT On Mon, 09 Jul 2007 14:17:02 -0000, Googmeister <googmeister@gmail.com> wrote, quoted or indirectly quoted someone who said :
>Right, but these cases aren't applicable to Bloch's updated >binary search code. The original (lo + hi) / 2 code was >definitely a flaw, but I don't think it's fair to say that >his updated ones are flawed since they work exactly as claimed. It is flawed in that the algorithms do not do what is advertised "set m to the average of l and u, truncated down to the nearest integer". It is not flawed because in the context of a binary search, the bugs don't hurt it. -- Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Tom Hawtin - 08 Jul 2007 18:58 GMT > What would happen from a call like > > someList.addAll( Integer.MAX_VALUE, someCollection ); > > where someList contains at least Integer.MAX_VALUE elements and > someCollection is not empty? It should work correctly.
ArrayList will actually throw ArrayIndexOutOfBounds. That's a subtype of IllegalArgumentException, which I guess makes some kind of sense.
I guess there are probably bugs if you, say, addAll a collection of more than Integer.MAX_VALUE elements to an ArrayList.
Integer types with limited ranges suck.
At least it isn't a problem for ArrayList until you have one taking over 16 GB.
Tom Hawtin
Piotr Kobzda - 09 Jul 2007 13:12 GMT > Suppose we have a List 'someList' containing exactly Integer.MAX_VALUE > elements and we [quoted text clipped - 4 lines] > According to the Javadocs, this is legal, but how then could we > reference the last element of the list? someList.listIterator(Integer.MAX_VALUE).next();
piotr
Piotr Kobzda - 09 Jul 2007 15:57 GMT >> Suppose we have a List 'someList' containing exactly Integer.MAX_VALUE >> elements and we [quoted text clipped - 6 lines] > > someList.listIterator(Integer.MAX_VALUE).next(); Of course, it requires the implementation of ListIterator which don't fully support its interface (in particular previousIndex()/nextIndex() methods cannot be fully supported).
Safer approach (in terms of achieving working solution) would be:
lastElement = null; for(Iterator it = someList.iterator(); it.hasNext(); lastElement = it.next());
But it's poor from the performance point of view...
In fact, the same performance problem applies to the ListLterator's use (a list must be a bit bigger only to observe that, for example 2*Integer.MAX_VALUE, or more, elements...).
So, currently defined List interface disallows the efficient working with a "large lists" IMHO.
Some additional methods are needed to support that, for example a list implementing the following interface would be better:
public interface LargeList<T> extends List<T> { /** Returns a view of the portion of this list between the specified fromIndex, inclusive, and this list's last element, inclusive. */ List<T> subList(int fromIndex); }
Unfortunately, most of the current uses of lists assumes that a maximum list size is Integer.MAX_VALUE, so the above-like large list won't work as expected with other libraries (e.g. Collections.reverse() is useless with such a list...).
Conclusion: large lists are impractical.
piotr
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|