>>TreeSet maintains its elements in their natural order, hence iterating
>>will produce "Apple Banana Cricket " instead of "Apple Cricket Banana"
[quoted text clipped - 7 lines]
> comparable, which offers natural ordering. In case of strings, the
> natural ordering implies alphabetical sorting. [...]
Almost. The natural ordering for String objects is their
lexicographic order according to the Unicode values of their
individual characters, so (for example) "Aïda" comes after
"Axolotl". For an even more blatant violation of alphabetical
order, note that "Zebra" precedes "aardvark".
Also, "alphabetical order" varies from place to place, even
if you consider only languages written in Latin alphabets. As
far as I know (I'm no expert on this, just someone who once got
a bit of a scolding from a person who was), everybody agrees on
the ordering of the twenty-six unaccented letters, but the
treatment of accented letters is subject to national and linguistic
variation.
See also java.text.Collator and allied classes.
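To make the difference concrete, here is a minimal sketch that sorts the same four words twice: once with String's natural ordering and once with a java.text.Collator. The class name OrderingDemo and its two helper methods are made up for illustration, and Locale.ENGLISH is just one possible choice of locale.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical demo class; the name and helper methods are illustrative only.
public class OrderingDemo {

    // Sorts with String's natural ordering (char-by-char Unicode values).
    static List<String> naturalOrder(List<String> words) {
        return new ArrayList<>(new TreeSet<>(words));
    }

    // Sorts with the locale-sensitive rules of a java.text.Collator.
    // Collator implements Comparator<Object>, so a TreeSet accepts it directly.
    static List<String> collatedOrder(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        TreeSet<String> sorted = new TreeSet<>(collator);
        sorted.addAll(words);
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        // "A\u00EFda" is "Aïda", escaped so the source file encoding does not matter.
        List<String> words = Arrays.asList("aardvark", "Zebra", "A\u00EFda", "Axolotl");

        // Natural order: Axolotl, Aïda, Zebra, aardvark
        System.out.println(naturalOrder(words));

        // Collated order for English should put aardvark first and Zebra last.
        System.out.println(collatedOrder(words, Locale.ENGLISH));
    }
}
```

One caveat with the second helper: a TreeSet treats two strings as duplicates when its comparator returns 0, so strings the Collator considers equal would collapse into one entry.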

Signature
Eric Sosman
esosman@acm-dot-org.invalid
Chris Uppal - 03 Nov 2006 19:04 GMT
> Almost. The natural ordering for String objects is their
> lexicographic order according to the Unicode values of their
> individual characters, so (for example) "Aïda" comes after
> "Axolotl". For an even more blatant violation of alphabetical
> order, note that "Zebra" precedes "aardvark".
All true, and undoubtedly all that the OP needs to know.
But it may be (mildly) interesting to note that -- although Strings /claim/ to
sort lexicographically according to the Unicode code points -- they /actually/
sort by their UTF-16 code unit values. And that doesn't produce the same sort
order for Unicode characters outside the 16-bit range.
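A small sketch of that discrepancy, for the curious. It compares U+FF5E (a character inside the 16-bit range) with U+10000 (outside it, stored as the surrogate pair D800 DC00). The class CodePointOrderDemo and the method compareByCodePoint are hypothetical names written for this example; they are not part of the JDK.

```java
// Hypothetical demo class; compareByCodePoint is an illustrative helper, not a JDK method.
public class CodePointOrderDemo {

    // Compares two strings code point by code point rather than char by char.
    static int compareByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Step over one or two chars, depending on the code point's width.
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other (or they are equal): shorter sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E, a single UTF-16 code unit
        String supplementary = "\uD800\uDC00"; // U+10000, a surrogate pair

        // String.compareTo looks at code units: 0xFF5E > 0xD800, so the result is positive.
        System.out.println(bmp.compareTo(supplementary) > 0);

        // By code point, 0xFF5E < 0x10000, so the result is negative.
        System.out.println(compareByCodePoint(bmp, supplementary) < 0);
    }
}
```

Both lines print true, which is the disagreement described above: the two orderings differ only when a supplementary character meets a BMP character at or above U+E000.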
> Also, "alphabetical order" varies from place to place, even
> if you consider only languages written in Latin alphabets. As
> far as I know (I'm no expert on this, just someone who once got
> a bit of a scolding from a person who was), everybody agrees on
> the ordering of the twenty-six unaccented letters,
For interest: there are exceptions. Or at least, there are according to the
Unicode people. Spanish (they say) traditionally considers ll to be a digraph
falling between l and m.
-- chris
Tom Forsmo - 03 Nov 2006 21:59 GMT
> Almost. The natural ordering for String objects is their
> lexicographic order according to the Unicode values of their
> individual characters, so (for example) "Aïda" comes after
> "Axolotl". For an even more blatant violation of alphabetical
> order, note that "Zebra" precedes "aardvark".
I have always thought of natural ordering as the way we humans would
order strings, e.g.
string1
string2
string...
string9
string10
string11
..
string20
string21
and other similar sort orders, instead of the typical
string1
string10
string11
...
string2
string20
string21
etc., and other similar kinds of sorting problems.
Is there a name for this "human sort order", and does an implementation
of it exist? Perhaps a name could be "semantic sort order".
I realise that it would require quite a big implementation with many
special case rules that in no way can be generalised and made into a
single algorithm like lexical or natural orders.
tom
Oliver Wong - 03 Nov 2006 22:11 GMT
> I have always thought of natural ordering as the way we humans would order
> strings, e.g.
[quoted text clipped - 14 lines]
> special case rules that in no way can be generalised and made into a
> single algorithm like lexical or natural orders.
There may be a name for the particular sorting you've shown above
(though the sort order is ambiguous, because there are some corner cases you
haven't demonstrated in that example), but I doubt that there exists a
"human sort order", as different humans are likely to sort things in
different orders.
See Chris' post about how a Spanish-speaking person would likely sort
strings in a different order than an English-speaking person, for example.
- Oliver
Eric Sosman - 03 Nov 2006 23:49 GMT
Tom Forsmo wrote on 11/03/06 15:59:
>> Almost. The natural ordering for String objects is their
>>lexicographic order according to the Unicode values of their
[quoted text clipped - 32 lines]
> special case rules that in no way can be generalised and made into a
> single algorithm like lexical or natural orders.
You might be interested in
http://sourcefrog.net/projects/natsort/
(Yes, that's "frog" as in "Kermit.")
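For the simple string1 ... string21 case in the question, a comparator along the following lines may be enough. This is a minimal sketch, not the algorithm used by the natsort project linked above: it assumes only ASCII digits count as numbers, ignores signs, decimals and locale, and the class name NaturalOrderSketch is made up for this example.

```java
import java.util.Comparator;

// Hypothetical sketch: compares runs of ASCII digits by numeric value and
// everything else char by char. Not the natsort algorithm.
public class NaturalOrderSketch implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                int endA = endOfDigits(a, i);
                int endB = endOfDigits(b, j);
                int cmp = compareDigitRuns(a.substring(i, endA), b.substring(j, endB));
                if (cmp != 0) {
                    return cmp;
                }
                i = endA;
                j = endB;
            } else {
                if (ca != cb) {
                    return Character.compare(ca, cb);
                }
                i++;
                j++;
            }
        }
        int rest = Integer.compare(a.length() - i, b.length() - j);
        // Tie-break on the raw strings so "a01" and "a1" are not reported as equal.
        return rest != 0 ? rest : a.compareTo(b);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the index just past the digit run that starts at 'start'.
    private static int endOfDigits(String s, int start) {
        int end = start;
        while (end < s.length() && isDigit(s.charAt(end))) {
            end++;
        }
        return end;
    }

    // Compares two digit strings numerically without parsing, so long runs cannot overflow.
    private static int compareDigitRuns(String x, String y) {
        String sx = stripLeadingZeros(x);
        String sy = stripLeadingZeros(y);
        if (sx.length() != sy.length()) {
            return Integer.compare(sx.length(), sy.length());
        }
        return sx.compareTo(sy);
    }

    private static String stripLeadingZeros(String digits) {
        int k = 0;
        while (k < digits.length() - 1 && digits.charAt(k) == '0') {
            k++;
        }
        return digits.substring(k);
    }
}
```

It can be handed to anything that takes a Comparator, for example new TreeSet<String>(new NaturalOrderSketch()) or list.sort(new NaturalOrderSketch()). The corner cases Oliver mentions (leading zeros, mixed case, negative numbers) are exactly where implementations of this idea tend to disagree.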

Signature
Eric.Sosman@sun.com