Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / December 2007

Tip: Looking for answers? Try searching our database.

Arrays.sort() slow?

Thread view: 
vaste@vaste.mine.nu - 05 Dec 2007 21:39 GMT
Why does java use to quicksort to sort byte arrays? Wouldn't it be
much faster to just count the elements? (Almost count-sort.)
Similarily for shorts, although not always obviously faster (requires
65k*4).

E.g. one could do:
   public static void sort(byte[] a) {
       int[] reg = new int[256];
       for(byte b : a)
           reg[b+128]++;
       int i = 0;
       for(int k = 0; k < 256; k++)
           for(int j = 0; j < reg[k]; j++)
               a[i++] = (byte)(k-128);
   }

I tried some, and it is indeed faster. Then again, I suppose it's
quite rare that one need to "sort" byte arrays without sattelite data.
Lew - 05 Dec 2007 22:32 GMT
> Why does java [sic] use to quicksort to sort byte arrays? Wouldn't it be
> much faster to just count the elements? (Almost count-sort.)
[quoted text clipped - 14 lines]
> I tried some, and it is indeed faster. Then again, I suppose it's
> quite rare that one need to "sort" byte arrays without sattelite data.

Sweet algorithm.

Sorts with deep knowledge of their target type can take advantage of that
special knowledge to outstrip a general sort's performance.  Some general
sorts perform acceptably for most input if the pre-sort order is random, but
perform horridly if the input is already ordered.  If the likelihood of
ordered input is low, this will not cause trouble.  If most or all input is
ordered, then a sort that "knows" about ordered input will be more efficient.

Sorts, searches and merges are huge topics to this day.

As to why the writers of one or another API choose a particular algorithm, in
this case possibly to achieve acceptable performance in the time domain
without too much impact in the memory domain while keeping things simple in
the development and maintenance domains.  Just a WAG; I wasn't there.

Signature

Lew

Daniel Pitts - 05 Dec 2007 22:38 GMT
> Why does java use to quicksort to sort byte arrays? Wouldn't it be
> much faster to just count the elements? (Almost count-sort.)
[quoted text clipped - 14 lines]
> I tried some, and it is indeed faster. Then again, I suppose it's
> quite rare that one need to "sort" byte arrays without sattelite data.

Is that faster in this case:

sort(new byte[]{255, 52, 1});

At what point does it become faster?  How about with shorts? Should we
use int[65536] for shorts?  If you have a huge list of bytes that you
need to sort, feel free to use the implementation you provided.
Otherwise, don't worry so much about it.

Signature

Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Christian - 05 Dec 2007 22:55 GMT
vaste@vaste.mine.nu schrieb:
> Why does java use to quicksort to sort byte arrays? Wouldn't it be
> much faster to just count the elements? (Almost count-sort.)
[quoted text clipped - 14 lines]
> I tried some, and it is indeed faster. Then again, I suppose it's
> quite rare that one need to "sort" byte arrays without sattelite data.

If your keys have numerical data a bin-sort is also possible ... and may
perform well with large arrays.
Roedy Green - 06 Dec 2007 09:45 GMT
On Wed, 5 Dec 2007 13:39:34 -0800 (PST), "vaste@vaste.mine.nu"
<vaste@vaste.mine.nu> wrote, quoted or indirectly quoted someone who
said :

>Wouldn't it be
>much faster to just count the elements? (Almost count-sort.)
>Similarily for shorts, although not always obviously faster (requires
>65k*4).

I refer to that as as the degenerate case of a radix sort. See
http://mindprod.com/jgloss/radixsort.html

The problem with using it in general is you must provide an additional
method beyond those normally used for a Comparator or Comparable. I
discovered only a few situations where radixSort beats Sun's sort in
general.

It might make sense to use a radixsort for arrays of primitives,
perhaps doing byte, short, char in one pass, and int in two passes and
long in four.  

Disadvantages:

1. it takes more ram for the aux counters.

2. it takes full time even when the data are already sorted or almost
sorted.  Very often only one element is out of position.

Do some benchmarks, and if there is a substantial improvement, put in
an RFE.  See http://mindprod.com/jgloss/rfe.html

I discovered radixsort back in 1968.  It was not until the 1990s and
the Internet did I learn it was already known. Many times I explained
it and Hashtables to all kinds of programmers.  Back then, those two
algorithms they were not widely known.
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

vaste@vaste.mine.nu - 12 Dec 2007 02:26 GMT
On Dec 6, 4:45 am, Roedy Green <see_webs...@mindprod.com.invalid>
wrote:
> On Wed, 5 Dec 2007 13:39:34 -0800 (PST), "va...@vaste.mine.nu"
> <va...@vaste.mine.nu> wrote, quoted or indirectly quoted someone who
[quoted text clipped - 33 lines]
> Roedy Green Canadian Mind Products
> The Java Glossaryhttp://mindprod.com

What I described is not stable, so it wouldn't work with satellite
data. I.e. the java api has a method that takes an array of bytes
(without reference to any Object) and sorts them. For this they seem
to use quicksort, even for large arrays. This is obviously slower than
the quite trivial algorithm I gave (even if the list is already
sorted).

I wondered if there was any reason for this. I expected the methods in
the standard library to be somewhat optimized...

For sorting general Objects "linearly" one would indeed need a new
interface like comparable etc, and then e.g. radix sort might be a
good choice, but now we're talking about primitives. (And for ints or
longs, radix sort might also be a good choice.) Then again, radix sort
and count sort require twice the memory sorted, which sounds a bit
scary.

/Johan Fänge
Roedy Green - 12 Dec 2007 16:25 GMT
On Tue, 11 Dec 2007 18:26:22 -0800 (PST), "vaste@vaste.mine.nu"
<vaste@vaste.mine.nu> wrote, quoted or indirectly quoted someone who
said :

>I wondered if there was any reason for this. I expected the methods in
>the standard library to be somewhat optimized...

I think they did the object sort, then thought of the others as mere
convenience methods to the same algorithm.  Sorting arrays of bytes
rare. So likely nobody thought of optimising it.  I repeat.

Present your case with two benchmarks, with code.  Point out the lack
of the usual downsides with a radixsort, and post an RFE. They usually
do ones that are easy.
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com



Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.