
Java Forum / General / February 2007

Help me!! Why java is so popular

amalikarunanayake@gmail.com - 02 Feb 2007 16:33 GMT
Java has become an important language in a very short time. What are
some of the things that have made it so popular?
Eric Sosman - 02 Feb 2007 17:11 GMT
amalikarunanayake@gmail.com wrote On 02/02/07 11:33,:
> java has become an important language in very short time. What are
> some of the things that have made it so popular?

   One point in Java's favor is that it's a pretty good
teaching language.  It protects the still-inept student from
many kinds of beginners' mistakes, but doesn't coddle him to
the point of cocooning.  You can test this claim for yourself
by taking a Java-based class and doing the homewo--

   Oh.  Wait; I get it.

Eric.Sosman@sun.com

Alex Hunsley - 02 Feb 2007 17:19 GMT
> java has become an important language in very short time. What are
> some of the things that have made it so popular?

Java is important because lazy/cheating students can take classes in
this language and not have to look far for someone to do their homework
for them.
Jack Kielsmeier - 02 Feb 2007 17:31 GMT
> java has become an important language in very short time. What are
> some of the things that have made it so popular?

One of the biggest reasons Java has taken off is its ease of portability.
When an application is written in a language like C or C++, it must be
compiled separately for each platform that will run the application.

Java has a virtual machine that takes the compiled Java bytecode and
translates it on the fly for the specific machine running the application.
That means you do not have to recompile the application for every
platform.

There are some downsides to this: Java is generally slower than a language
like C++ because of the extra time needed by the virtual machine. With how
fast computers are today, most people do not care about the performance
penalty in their applications (the penalty is usually pretty small now).
Alex Hunsley - 02 Feb 2007 23:58 GMT
>> java has become an important language in very short time. What are
>> some of the things that have made it so popular?
[quoted text clipped - 12 lines]
> fast computers are today, most people do not care about the performance
> penalty in their applications (the penalty is usually pretty small now).

Whoo! That's him got off without having to do his homework!
lex
raddog58c - 06 Feb 2007 21:45 GMT
> <amalikarunanay...@gmail.com> wrote in message
>
[quoted text clipped - 6 lines]
> When an application is written in a language like C or C++, it must be
> compiled separately for each platform that will run your application.

I think a big reason Java took off initially was the fact it's a C-
language derivative without pointers. Most of the Java programmers I
knew 10 years ago were disillusioned C/C++ people who found the memory
management tedious and error prone.  The fact they could instantiate
and code and never clean up after themselves freed them to concentrate
on the problems they really wanted to expend their energy solving.

The reason people would get into Java coding today probably has little
to do with this and is due more to the language's large base of highly
functional classes that plug-n-play easily, so programmers can slam
dunk applications a lot easier.  It's all about what's easy for us
(programmers), eh?

> Java has a virtual machine that is able to take JAVA code information and
> translate it to the specific machine running the application on the fly. It
> makes it so you do not have to re-compile the application for every
> platform.

Certainly true, though the same can be said for Perl and while it's
pretty popular, it's not as popular as Java. Partly that's due to Perl
being totally interpreted versus half-ways compiled (ie, byte code),
and let's face it, Perl isn't really a full-fledged programming
language.

The other thing that's perhaps a side effect from this paradigm is
that Java provides mostly least-common-denominator system services.
For instance, someone was asking me about checking for an already-
running instance of a program on a Windows workstation.  That's a
really easy thing to implement in any language that can talk directly
to the OS -- about 15 lines of code invoking EnumWindows.

You can't do that in Java, however, unless you go JNI.  That's not
necessarily a bad thing, however.  You lose some OS-locale-based fine-
tuned features, but on the flip side when you grab someone's class
libs off a web site you have almost no work to do to use them,
regardless of what OS you're using.  That's a pretty compelling
attraction for agile programming.

> There are some downsides to this; JAVA is generally slower than a language
> like C++ because of the extra time needed by the Virtual Machine. With how
> fast computers are today, most people do not care about the performance
> penalty in their applications (the penalty is usually pretty small now).

I would say "the penalty is usually pretty small now" is very context
sensitive.  Not to pick on you personally, but the general embracing
of the "memory is cheap" or "performance is good enough" is popular in
the Java community, and it's wrong to me.  Maybe that's because I use
everything from MASM to IBM 370 BASM to C, C++, Java, VB and Perl, not
sure...

But I would never, or at least rarely, use Java for desktop utilities
I create for myself because the startup time is dreadfully slow, and
the impact on other applications is large.  If you're running a system
with a GIG or less of RAM, firing up a memory-hungry JVM isn't
something you do hastily.  Run something like Process Explorer
(http://www.microsoft.com/technet/sysinternals/ProcessesAndThreads/ProcessExplorer.mspx)
and look at the responsiveness of the UI.  You
can tune Java apps all day and you simply can't get that without ultra
high-speed hardware.

I can start Java apps on my 3GHz Pentium and the drain on the system
is highly noticeable.  I'll have periods where the system is completely
unresponsive to mouse clicks, or the mouse pointer icon lumbers across
the screen.  Shut down those apps and the system returns to the
standard Windows response (which is still barely tolerable, but at
least it's better).  I could probably do some tuning to counteract
that, but that defeats the purpose of writing quick-n-dirty utilities
to rapidly perform some function.

I think of Java programs like driving a Winnebago.  For comfort and
ease of travel it's hard to beat, especially on long trips where you
want lots of room and comfort and not have to deal with meticulous
details like where to set your full coffee cup or worrying about where
to toss your trash -- most of these are taken care of for you by the
environment.

For quick trips around town, negotiating road space during rush hour,
or for parallel parking on crowded city streets it's overkill and
difficult to maneuver.

I always have felt the Java language is superlative, but the Java
runtime model is gluttonous, and that two things would greatly help
Java, IMO:

1) Optional override on storing data as unicode: why pay for something
if you don't need it?  Unicode is great if you need multinational
support, but in 27 years of programming I've maybe needed that in two
applications.  Converting data back and forth when I don't need it is
like paying for a cable TV subscription for your home when you are a
traveling salesman who's on the road 7 days a week.  Wasting
computational energy degrades responsiveness of programs running in
the JVM as well as programs running in the system outside of the
JVM.

2) Optional override on garbage collection: If I don't want garbage
collection, or I want to decide when it is going to run, then I want
to control it. Garbage collection makes things "safe," but not all of
us need that safety. I've written OS's from the ground up, so I'm
totally comfortable in my memory mgmt skills. When I'm trying to
streamline processing, spurious threads I didn't start are problematic
and get in the way.  I like having control when I program, and there
are times I don't want GC and it'd be nice to shut it off.
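
A rough way to put numbers on point 1, as a sketch rather than a measurement: a Java String of this era stores its text as 16-bit UTF-16 code units, so pure-ASCII text carries about twice the character payload of a single-byte encoding. The class and method names below are made up for illustration, and the figures count character data only, ignoring object headers and any JVM-specific string optimisations.

```java
import java.nio.charset.StandardCharsets;

final class EncodingFootprint {
    // Character payload if the text is held as UTF-16 code units (2 bytes each).
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    // Character payload if the same text is kept as single-byte US-ASCII.
    static int asciiBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        String line = "GET /index.html HTTP/1.0";
        System.out.println("UTF-16 payload: " + utf16Bytes(line) + " bytes");
        System.out.println("ASCII payload:  " + asciiBytes(line) + " bytes");
    }
}
```

Keeping such data in a byte[] is the usual workaround when the text is known to be ASCII, at the cost of losing the String API.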

These things degrade the Java footprint, IMO.  Many purist Java-only
programmers will say things like "memory is cheap" or "with the
current speed of CPUs" etc.  But that's a really bad philosophy and a
cop out. It's insensitive to optimization, basically pretending
response time and memory footprints are of little importance.
Memory's cheap if you already bought it and it's installed right now
-- it's neither cheap nor convenient if, for example, you just ran out
in the middle of some long running process, or when the system's so
lethargic that mouse clicks seem to be running across an RS-232 port
at 4800 BAUD.

Software engineers don't have to get bogged down with making every
nanosecond count, but we shouldn't ignore the fact that systems are
never fast enough, never responsive enough, never have enough memory,
never enough disk space, and that the faster and bigger the HW folks
make'm, no matter how fast and big, the faster we bog'm down with OS's
that require multiple DVDs to load and multiple GIGs of memory to
work.

Java is a beautiful language with a rich, expressive syntax and a vast
array of easy-to-use classes in the community.  Those are a couple of
big reasons its popularity has grown. The more SW engineers can do to
shrink the performance gap between Java and natively-compiled
languages like C/C++ and the more we take memory footprints and CPU
cycles seriously, the better off Java will be.

Performance is the biggest drawback to using Java, maybe its only big
drawback, and that's why I'm such a huge opponent of blanket
statements that ignore this side of the Java tradeoff.

Again, this is more of a spew in all directions in an attempt to get
all J-programmers thinking, than a direct response to your note.
Didn't want it to come across as a flame, cuz it's not.
Mark Thornton - 06 Feb 2007 22:41 GMT
> For instance, someone was asking me about checking for an already-
> running instance of a program on a Windows workstation.  That's a
> really easy thing to implement in any language that can talk directly
> to the OS -- about 15 lines of code invoking EnumWindows.
If we are talking about arbitrary applications then that only works for
applications that actually have a window.

> You can't do that in Java, however, unless you go JNI.
On the other hand if we are talking about testing for an instance of
your own application (written in Java) then that is possible without
resorting to JNI.
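
One common way to do this, sketched under assumptions: take an exclusive lock on a well-known file at startup and treat failure to get it as "another instance is running". The class name, the lock-file name, and the choice of the temp directory are all hypothetical, and the sketch uses the java.nio.file API, which is newer than this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class SingleInstanceGuard {
    private FileChannel channel;
    private FileLock lock;

    // Returns true if this guard now holds the lock, false if it is already held.
    boolean tryAcquire(Path lockFile) {
        try {
            channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            try {
                lock = channel.tryLock(); // null when another process holds it
            } catch (OverlappingFileLockException e) {
                lock = null;              // already held inside this same JVM
            }
            if (lock == null) {
                channel.close();
                channel = null;
                return false;
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Releases the lock (if held) and closes the channel.
    void release() {
        try {
            if (lock != null) { lock.release(); lock = null; }
            if (channel != null) { channel.close(); channel = null; }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical lock-file location; a real application would pick its own.
        Path lockFile = Paths.get(System.getProperty("java.io.tmpdir"), "myapp-example.lock");
        SingleInstanceGuard guard = new SingleInstanceGuard();
        if (!guard.tryAcquire(lockFile)) {
            System.out.println("Another instance appears to be running; exiting.");
            return;
        }
        try {
            System.out.println("Lock acquired; doing work.");
        } finally {
            guard.release();
        }
    }
}
```

File locking behaviour is platform dependent, so this only answers "is another copy of my own program running", not the arbitrary-application case discussed above.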

> I would say "the penalty is usually pretty small now" is very context
> sensitive.  Not to pick on you personally, but the general embracing
> of the "memory is cheap" or "performance is good enough" is popular in
> the Java community, and it's wrong to me.  Maybe that's because I use
> everything from MASM to IBM 370 BASM to C, C++, Java, VB and Perl, not
> sure...

Sometimes the penalty can be zero or even negative, particularly for
applications which run long enough to eliminate the startup effects of
JIT compiling.

> But I would never, or at least rarely, use Java for desktop utilities
> I create for myself because the startup time is dreadfully slow, and
> the impact on other applications is large.  If you're running a system
> with a GIG or less of RAM, firing up a memory-hungry JVM isn't
Rubbish. You can do quite a lot of useful work with the JVM 'using' only
8MB or less which is trivial in the context of 1GB memory. I've been
running a service, written in Java, which has no noticeable effect on
the responsiveness of my normal applications. It sits there all day
doing its stuff and I can easily forget that it is still running. My
machine is a 4 year old 3.06GHz Pentium with 1GB of RAM.

> Performance is the biggest drawback to using Java, maybe its only big
> drawback, and that's why I'm such a huge opponent against blanket
> statements that ignore this side of the Java tradeoff.
Look who is making blanket statements. I care a lot about performance,
but I don't have a problem in this respect with Java. Java isn't
perfect. For some applications it can be slower than say C++, but in
other cases it can be just as fast (or even faster).

Mark Thornton
raddog58c - 07 Feb 2007 00:03 GMT
On Feb 6, 4:41 pm, Mark Thornton <mark.p.thorn...@ntl-spam-world.com>
wrote:
> > For instance, someone was asking me about checking for an already-
> > running instance of a program on a Windows workstation.  That's a
[quoted text clipped - 3 lines]
> If we are talking about arbitrary applications then that only works for
> applications that actually have a window.

Sure enuff, but point being the tradeoff between getting to the OS
layer or stopping at the language layer. Java stops you at the
implementation, unless you go JNI -- that prevents some things, but
that prevention facilitates more portability.  It's a tradeoff.

> > You can't do that in Java, however, unless you go JNI.
>
> On the other hand if we are talking about testing for an instance of
> your own application (written in Java) then that is possible without
> resorting to JNI.

Absolutely.  The point in this case was again the tradeoffs.

> > I would say "the penalty is usually pretty small now" is very context
> > sensitive.  Not to pick on you personally, but the general embracing
[quoted text clipped - 6 lines]
> applications which run long enough to eliminate the startup effects of
> JIT compiling.

How can it be negative?  I'm not saying you're wrong, but how can any
byte-coded language outperform a binary language if they are doing the
same thing?  It can't, because you have to convert the byte code to
the native binary stream before you can execute it. So I'm thinking
you mean certain algorithms are more efficiently handled by the JVM?
Please elucidate -- I heard someone say Java memory management now
exceeds C and I thought it was an interesting notion and probably
related to some ingenious optimizations in memory mgt algorithms,
though I honestly don't know.

> Rubbish. You can do quite a lot of useful work with the JVM 'using' only
> 8MB or less which is trivial in the context of 1GB memory. I've been
> running a service, written in Java, which has no noticeable effect on
> the responsiveness of my normal applications. It sits there all day
> doing its stuff and I can easily forget that it is still running. My
> machine is a 4 year old 3.06GHz Pentium with 1GB of RAM.

No doubt, and Java's fine messaging implementation and rich set of
protocol support, eg, makes it a good vehicle for such things.  A
service I'm fine with -- a utility I need to fire up over and over,
not so fine with that. I wouldn't write that in Java.

At one time there was a Java compiler that let you go from Java
to .EXE.  I used it quite a bit, although the .EXEs it generated were
pretty fat for the functionality they implemented.  Then again Java's
not about creating .EXEs, so that didn't surprise me.

Languages have strengths and weaknesses.  When it comes to my tools I
want as close to subsecond response time as possible, so I'm looking
for .EXE based apps.  If I'm parsing huge chunks of random text, I'm
all for Perl.  If it's XML or a large messaging paradigm, Java's
great.

> Look who is making blanket statements. I care a lot about performance,
> but I don't have a problem in this respect with Java. Java isn't
> perfect. For some applications it can be slower than say C++, but in
> other cases it can be just as fast (or even faster).

What would be an example where it's faster?  I write a lot of Java
these days, as well as a lot of C++ and even some C and ASM.  I know a
decent bit about Java best practices, but I'm not in Java 7x24 -- so
if there's a way to make my Java scream, I'd like to know.  At best
I'm looking at avoiding fat objects with features I don't need, such
as synchronized collections like Vectors where a single thread is
using the object.
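
For what it's worth, the Vector point can be sketched like this. Every Vector method is synchronized while ArrayList's are not, so for single-threaded use ArrayList is the usual choice. The class and method names are invented for the example, and the timing printed by main is only a crude indication: a JIT may well remove uncontended locking, so no particular result is claimed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

final class ListChoice {
    // Stores 0..n-1 in the supplied list, then sums them back out.
    static long fillAndSum(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    // Crude wall-clock timing; not a substitute for a real benchmark harness.
    static void time(String label, List<Integer> list, int n) {
        long start = System.nanoTime();
        long sum = fillAndSum(list, n);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(label + ": sum=" + sum + ", " + micros + " us");
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        time("Vector (synchronized)", new Vector<Integer>(), n);
        time("ArrayList (unsynchronized)", new ArrayList<Integer>(), n);
    }
}
```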

How does one get Java to run faster than a compiled language?  It
would seem there has to be a catch or specific circumstances for that
to occur, because it's hard to fathom how that could be the case.

Thanks for the comments, by the way.
Chris Uppal - 07 Feb 2007 17:12 GMT
> > Sometimes the penalty can be zero or even negative, particularly for
> > applications which run long enough to eliminate the startup effects of
[quoted text clipped - 4 lines]
> same thing?  It can't, because you have to convert the byte code to
> the native binary stream before you can execute it.

At least in theory, the JVM's JITer has more information available to it than a
compiler producing a statically pre-compiled binary would have.  Some
theoretical examples:

Is the machine /actually/ a multiprocessor ?  If not then some synchronisation
primitives can be replaced by no-ops.

Is a (virtual) method /actually/ overridden by any class loaded at runtime ?
If not then optimisations like static linking or even inlining become possible.

Does the processor have an extended instruction set ?  If so then the JITer can
generate code which uses those instructions.  (A statically precompiled binary
could include both sets of code, of course, with dynamic switching between
them, but that is not often deemed worth the extra bother).

As far as I know, all of those possibilities are implemented (if only in
limited ways) in current Sun JVMs.
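
A small sketch of the kind of code those examples are about. The names are invented, and nothing here forces or proves any optimisation; it only shows what the runtime can observe. The call to area() is virtual in the source, but while Square is the only implementation loaded a JIT is free to treat it as a direct call, and the processor count is something only the running JVM knows.

```java
final class JitVisibleFacts {
    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        private final double side;

        Square(double side) {
            this.side = side;
        }

        @Override
        public double area() {
            return side * side;
        }
    }

    // Virtual call in the source; with one loaded implementation the JIT
    // may devirtualise or inline it (an optimisation, not a guarantee).
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    // Known only at run time, never to a compiler building a binary in advance.
    static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(1), new Square(2), new Square(3) };
        System.out.println("CPUs visible to the JVM: " + visibleCpus());
        System.out.println("Total area: " + totalArea(shapes));
    }
}
```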

> How does one get Java to run faster than a compiled language?

Simple: compare it with a compiled language with a bad optimiser ;-)

FWIW, I think it is /highly/ application dependent, and there is no simple set
of rules you can follow to make Java run as fast as possible.  My impression is
that the optimiser in the server JVM, from 1.5 (and presumably later) generates
code which is comparable with GCC -O3 or MS's C++ compiler with all obvious
optimisations turned on -- however that is a useless observation unless the
code in the two languages is trying to do the same thing (e.g. 2-D arrays have
different layouts in C and Java, or a calculation might create many
intermediate objects in carefully-written Java whereas the "same" code in
well-crafted C++ might not).
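
To make the array-layout point concrete, here is a sketch with made-up names. A Java double[][] is an array of references to separate row arrays, whereas a C two-dimensional array is one contiguous row-major block; the closest Java equivalent is a single flat array indexed by hand.

```java
final class ArrayLayout {
    // Row-major index into a flat array, which is how C lays out a[rows][cols].
    static int flatIndex(int row, int col, int cols) {
        return row * cols + col;
    }

    // Java-style 2-D array: each row is its own heap object.
    static double[][] nested(int rows, int cols) {
        double[][] a = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                a[r][c] = r * cols + c;
            }
        }
        return a;
    }

    // C-style layout emulated with one contiguous array.
    static double[] flat(int rows, int cols) {
        double[] a = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                a[flatIndex(r, c, cols)] = r * cols + c;
            }
        }
        return a;
    }

    static double sum(double[][] a) {
        double total = 0;
        for (double[] row : a) {
            for (double v : row) {
                total += v;
            }
        }
        return total;
    }

    static double sum(double[] a) {
        double total = 0;
        for (double v : a) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] n = nested(3, 4);
        double[] f = flat(3, 4);
        System.out.println("rows are separate objects: " + (n[0] != n[1]));
        System.out.println("nested sum = " + sum(n) + ", flat sum = " + sum(f));
    }
}
```

A benchmark that uses the nested form in Java and the contiguous form in C is therefore not comparing like with like.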

   -- chris
raddog58c - 07 Feb 2007 19:19 GMT
On Feb 7, 11:14 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
THIS.org> wrote:
> > > Sometimes the penalty can be zero or even negative, particularly for
> > > applications which run long enough to eliminate the startup effects of
[quoted text clipped - 22 lines]
> As far as I know, all of those possibilities are implemented (if only in
> limited ways) in current Sun JVMs.

These are good... thanks.  It would depend on the nature of the
application, as some of the optimizations, unless significant,
wouldn't make up the difference in the time it took to compile the
byte code into machine code.

> > How does one get Java to run faster than a compiled language?
>
> Simple: compare it with a compiled language with a bad optimiser ;-)

That'd be one of several ways... touche!

> FWIW, I think it is /highly/ application dependent, and there is no simple set
> of rules you can follow to make Java run as fast as possible.  My impression is
[quoted text clipped - 5 lines]
> intermediate objects in carefully-written Java whereas the "same" code in
> well-crafted C++ might not).

This does presume that we're comparing the post-compiled byte code
against the precompiled code in the runtime binary (.EXE, .COM, etc).
The fact the conversion is done at run time and would have to be done
every time the code is run (unless it's cached) puts it at a
disadvantage out of the gate. The late binding to environment could
help close the gap, but that's not guaranteed because the .EXE can be
precompiled for the target deployment environment and if so the race
is over.

I'm nonetheless impressed with the computer scientists building
optimizations into the JVM -- they've a lot of clever techniques up
their respective sleeves.

>     -- chris
Mark Thornton - 07 Feb 2007 20:07 GMT
> On Feb 7, 11:14 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
> THIS.org> wrote:
> disadvantage out of the gate. The late binding to environment could
> help close the gap, but that's not guaranteed because the .EXE can be
> precompiled for the target deployment environment and if so the race
> is over.

The single/multi processor state can be changed after an application has
been installed. A JVM will adjust accordingly, but what happens to your EXE
that was selected/compiled for the single processor that existed at
install time? There used to be an issue with maths coprocessors (and may
be again if AMD's ideas surface in a product). I think it is still
possible for processor upgrades to add SSE3 or similar capability.

Mark Thornton
Chris Uppal - 08 Feb 2007 13:33 GMT
> A JVM will adjust accordingly, but what happens to your EXE
> that was selected/compiled for the single processor that existed at
> install time?

Which raises the question of how much extra testing is needed to account for
the fact that the JITer will adapt to the host processor and thus be executing
different programs on different machines ?

Personally, I think the answer is "damn little" -- except in special cases.
The JVM people have shown themselves to be good at producing implementations
which behave the same despite the adaptive behaviour (except considerations of
speed, naturally ;-) so I'd put my testing budget into looking for the mistakes
/we/ make in preference to the ones that Sun's VM engineers make.

The one qualification I'd make to that is that the testing machines shouldn't
be such that they are likely to /mask/ problems -- so they should be
multiprocessor boxes, and with no more "grunt" than the worst-case target
machine.  (Though, I suppose a case could be made that at least some testing
should be done on a machine with as near as possible the same grunt as the
fastest/biggest target machine during the release's anticipated lifetime.)

   -- chris
senior - 12 Feb 2007 13:58 GMT
Java's biggest advantage is in networking and distributed
computation.

Please also google the advantages of Java to see why it is popular;
after that, any question is welcome.
raddog58c - 10 Feb 2007 19:42 GMT
> The single/multi processor state can be changed after an application has
> been installed. A JVM will adjust accordingly, but what happens to your EXE
> that was selected/compiled for the single processor that existed at
> install time?

Well, I'm recompiling it for the new target environment while they're
swapping the gear on the production box.

The entire argument around auto-configuration of runtime code is
valid, but it assumes you'll be changing the underlying platform over
the lifetime of the code.  While that can and does happen in some
environments, sometimes frequently, there are many envs in which
that's not the case, or change is infrequent enough that it's trivial
to regenerate the runtime for the appropriate target if you need the
fastest speed you can get.

And just as the JVM can autoadapt, a programmer could build different
versions.  Obviously it's not a side effect of the environment when
you're spinning it yourself, but just to be fair regarding all
available solutions.... startup logic could perform the same set of
checks to find out what hardware is installed on the host, and the
startup overlord can manipulate the runtime in some manner (rename the
binaries, change startup link pointers, etc) to facilitate the best-of-
breed for that environment accordingly.
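
The "startup overlord" idea can be sketched roughly as follows. It is written in Java only to keep one language in the thread; the post itself is about native binaries, and all names here are hypothetical. The program probes the host once at startup and picks one of several pre-built variants, which is the hand-rolled version of what a JIT does automatically. It also uses the streams API, which is newer than this thread.

```java
import java.util.Arrays;
import java.util.function.ToLongFunction;

final class StartupDispatch {
    // Variant intended for single-processor hosts.
    static long sumSequential(int[] data) {
        long total = 0;
        for (int v : data) {
            total += v;
        }
        return total;
    }

    // Variant intended for multi-processor hosts.
    static long sumParallel(int[] data) {
        return Arrays.stream(data).parallel().asLongStream().sum();
    }

    // The startup check: choose a variant from what the probe reported.
    static ToLongFunction<int[]> choose(int cpus) {
        if (cpus > 1) {
            return StartupDispatch::sumParallel;
        }
        return StartupDispatch::sumSequential;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        ToLongFunction<int[]> sum = choose(cpus);
        int[] data = new int[1_000];
        Arrays.fill(data, 2);
        System.out.println("cpus=" + cpus + ", sum=" + sum.applyAsLong(data));
    }
}
```

Each extra hardware property doubles the number of variants to build and test, which is the practical limit on this approach.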

The difference, obviously, is that with Java the programmer doesn't
have to concern her or himself with these mundane details -- all java
programs will inherently obtain this effect based on the JVM's
capacity to provide it.  The skilled engineer can easily build a
library of platform checking and invoke or autoinstall the right code
when the changed hardware is detected.  If that dynamicism were needed
based on the problem space, it's a reasonably easy feature to build.
"Back in the day" when controllers were really dumb, we used to write
streaming tapes with interrecord gaps that were the minimum size
possible and still provided ample time to perform inter-record
processing (programming DMA controllers, generating CRCs, etc) on the
slowest processors on the market at the time.  It was always
interesting, as we had to run a sequence of instructions at startup to
see how fast they'd execute -- the faster they ran, the longer we
needed to configure our wait/delay loops in the inter-record
processing routines.  That stuff was all based on the main clock,
since "smart" controllers weren't available to most devices at the
time, so drivers ran on the host's CPU.

Anyways, I'm not poopoo'ing the dynamic feature of Java -- it's very
cool and has its place to be sure.  By the same token, HW engineers
may eventually standardize CPUs and aux processors, like FP
processors, to render the differences insignificant.  And it's like a
lot of things provided by Java's dynamicism: they're awesome when you
need'm, pointless and wasteful when you don't.

More often than not I don't require the dynamicism, so it alone is not
a big selling point for me. Your mileage may vary, and that's cool,
it's just for me Java's dynamics are not why I choose it -- as I
mentioned, for me the overhead is a turn off.  Heck, I'm not even
attracted by the fact the programmer is freed from the need to manage
her or his memory space -- I don't personally find memory mgmt
terribly challenging work.

What sells me on Java is the breadth of elegant frameworks, the
general (not always) ease of adding new functionality by adopting
classes into your app, and the completeness of many classes out and
about.

For example, we use Spring framework at the office, and it's a really
fabulous way to construct a distributed network of components as
services, and it does certain things to greatly reduce the "noise" in
code such as using "injection" to clean up constructors.  Pretty
compelling stuff, and really a much bigger selling point, IMO, than
running under the auspices of a JVM.
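
For readers who have not met "injection", here is a minimal plain-Java sketch of the idea, deliberately without any Spring API. The interface and class names are invented, and the post does not say which injection style their code uses; this shows the constructor form. The component never constructs its collaborator, it is simply handed one, which is the wiring a framework would do from configuration.

```java
final class InjectionSketch {
    // The dependency, expressed as an interface so implementations can be swapped.
    interface MessageSender {
        String send(String to, String body);
    }

    // One possible implementation; a test or another deployment could use a different one.
    static final class ConsoleSender implements MessageSender {
        @Override
        public String send(String to, String body) {
            String line = "to " + to + ": " + body;
            System.out.println(line);
            return line;
        }
    }

    // The component only declares what it needs; it does not look anything up.
    static final class Notifier {
        private final MessageSender sender;

        Notifier(MessageSender sender) {
            this.sender = sender;
        }

        String notifyUser(String user) {
            return sender.send(user, "build finished");
        }
    }

    public static void main(String[] args) {
        // Wiring done by hand here; a container would normally do this step.
        Notifier notifier = new Notifier(new ConsoleSender());
        notifier.notifyUser("raddog58c");
    }
}
```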
Lew - 11 Feb 2007 06:18 GMT
> Anyways, I'm not poopoo'ing the dynamic feature ...

"Pooh-poohing". "Pooh-pooh" is to express derision.  "Poopoo" is a euphemism
for animal excrement.

- Lew
nukleus - 11 Feb 2007 12:29 GMT
>> Anyways, I'm not poopoo'ing the dynamic feature ...
>
>"Pooh-poohing". "Pooh-pooh" is to express derision.  "Poopoo" is a euphemism
>for animal excrement.

What a gem.

I wish I could have you sitting on my bookshelf.
If only I could push a button and say:

Lew, tell me, what is THIS thingy here?

And, sure enough, there is ALWAYS an answer.

Have you thought of Virtual Lew project?

Could be quite a thing...

Do you want me to quick prototype that thing
to give ya an idea of what can be done?

>- Lew
Lew - 11 Feb 2007 18:13 GMT
raddog58c wrote:
>>> Anyways, I'm not poopoo'ing the dynamic feature ...

Lew wrote:
>> "Pooh-poohing". "Pooh-pooh" is to express derision.  "Poopoo" is a euphemism
>> for animal excrement.

> What a gem.
>
[quoted text clipped - 11 lines]
> Do you want me to quick prototype that thing
> to give ya an idea of what can be done?

Great idea.

- Lew
nukleus - 12 Feb 2007 15:38 GMT
>raddog58c wrote:
>>>> Anyways, I'm not poopoo'ing the dynamic feature ...
[quoted text clipped - 21 lines]
>
>Great idea.

Well. I didn't expect youre gonna buy it.
But...

Oki, doki.
Just warn everybody here
that some things may...

Well, just relax if my monkey gets too wild.
You might see zome posts of yerr royal highness.

Lets narrow it down.
Which subject would you pick to reflect the
"Best of Lew"?

Your turn now.

>- Lew
raddog58c - 11 Feb 2007 19:11 GMT
> > Anyways, I'm not poopoo'ing the dynamic feature ...
>
> "Pooh-poohing". "Pooh-pooh" is to express derision.  "Poopoo" is a euphemism
> for animal excrement.
>
> - Lew

<<"Pooh-poohing". "Pooh-pooh" is to express derision.  "Poopoo" is a
euphemism
for animal excrement.

- Lew >>

Thank you, and as the radicaldog that I am, I restate that I was not
poopooing Java.  ;-)
Chris Uppal - 11 Feb 2007 15:55 GMT
> And just as the JVM can autoadapt, a programmer could build different
> versions.  Obviously it's not a side effect of the environment when
[quoted text clipped - 4 lines]
> binaries, change startup link pointers, etc) to facilitate the best-of-
> breed for that environment accordingly.

That approach runs into the problem of combinatorial explosion -- which is why
it is only used in limited ways and in rather extreme cases.  The thing is that
a JIT has more information available to it than any possible static analysis.
That is a /fundamental/ advantage, and cannot be clawed back (though it can be
wasted); just as having to do extra work at runtime is a fundamental
/disadvantage/ which can only be compensated for, but never eliminated.

BTW, I'm not advocating one approach over the other here, just discussing what
the approach taken by current JVMs /is/.

   -- chris
raddog58c - 11 Feb 2007 19:53 GMT
On Feb 11, 9:55 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
THIS.org> wrote:
> > And just as the JVM can autoadapt, a programmer could build different
> > versions.  Obviously it's not a side effect of the environment when
[quoted text clipped - 16 lines]
>
>     -- chris

Excellent points.
Arne Vajhøj - 09 Feb 2007 02:55 GMT
> On Feb 7, 11:14 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
> THIS.org> wrote:
[quoted text clipped - 9 lines]
> wouldn't make up the difference in the time it took to compile the
> byte code into machine code.

Yes - unless significant.

> This does presume that we're comparing the post-compiled byte code
> against the precompiled code in the runtime binary (.EXE, .COM, etc).
> The fact the conversion is done at run time and would have to be done
> every time the code is run (unless it's cached) puts it at a
> disadvantage out of the gate. The late binding to environment could
> help close the gap,

Or it could cover the gap 200%.

You are basically proving that Java is not efficient by assuming so.

Meaning you proved nothing.

Arne
raddog58c - 11 Feb 2007 00:09 GMT
> Or it could cover the gap 200%.
>
[quoted text clipped - 3 lines]
>
> Arne

I wasn't trying to prove anything.  Common sense says if you have to
convert from one format to another before you begin executing, you
have an extra step and obviously all things being equal you are not as
efficient, period, end of sentence.

Converted code that's more efficient could make up for the
conversion - it would depend on the problem space, run duration, and
how well/badly each program was written.  The converted code would
need to be more efficient to have a chance to make up for the extra
step. If it were equal to or less efficient, you will not make up the
gap.

That's not a proof -- it's an observation of reality, right?

Well written code in a language like C optimally compiled for the
native environment is going to be tough to beat unless you write in
native assembler language.  I have actually had to write in native
assembler on more than one occasion in real-time systems where
nanoseconds mattered.  That's atypical, but these situations do
exist.  Anyone suggesting an interpreted language is the way to go in
these environments either has no understanding of the problem space,
or they've got a lot of explaining to do to make that assertion stick.

At any rate, the bottom line is you can't execute 100 instructions
faster than 50 instructions if they're running on average the same
number of clock cycles -- until someone invents a CPU with an ALU that
executes intermediate code directly, cycles have to be used by the
interpreter to transform to the native binary.

I used to program in Lisp years ago, and it had similar issues. Don't hold
me to it, but if I remember right there was a movement by Lisp
aficionados to create a CPU that did exactly that: executed Lisp
directly.  These machines obviously never gained a lot of popularity.

There's a beauty in interpreted languages, but instruction-level
efficiency is a trade off you make for the functionality and late-
binding paradigm that interpretation provides. I give the JVM
architects a lot of credit, as under the right circumstances they
glean a lot of efficiency out of Java byte code -- but on balance the
Java code I've worked with in batch, GUI, and web server applications
has not been impressive from a speed standpoint.  Functionality wise
it's great, however, so that's the emphasis upon which one should
focus with respect to interp. languages, IMHO.
Lew - 11 Feb 2007 06:22 GMT
> Well written code in a language like C optimally compiled for the
> native environment is going to be tough to beat unless you write in
[quoted text clipped - 4 lines]
> these environments either has no understanding of the problem space,
> or they've got a lot of explaining to do to make that assertion stick.

Others have pointed out that the JIT compiler can beat compile-time
optimizations in some cases by virtue of having a different view of the
situation. Escape analysis, for example, is a runtime phenomenon.

Because JVM optimizations are global and dynamic, whereas compile-time
optimizations are more local and static, the JVM may actually achieve
significantly better performance because it follows different analysis paths.
This could, and according to what I've read, does achieve far better
performance than "C optimally compiled" code.

Myths are supported by misguided intuition. Reality often moves in surprising
ways.

- Lew
raddog58c - 11 Feb 2007 19:14 GMT
> > Well written code in a language like C optimally compiled for the
> > native environment is going to be tough to beat unless you write in
[quoted text clipped - 19 lines]
>
> - Lew

Fair enough, but the key words are "may" and "could" -- often they
will not, particularly in situations where there really is no
alternative improvement.  In that case the additional overhead to
store, manage and reference a knowledge base "may" and "could" further
deplete resources and degrade the overall system's throughput.

It's really 100% context sensitive.
Arne Vajhøj - 11 Feb 2007 20:23 GMT
>> Because JVM optimizations are global and dynamic, whereas compile-time
>> optimizations are more local and static, the JVM may actually achieve
>> significantly better performance because it follows different analysis paths.
>> This could, and according to what I've read, does achieve far better
>> performance than "C optimally compiled" code.

> Fair enough, but the key words are "may" and "could" -- often they
> will not,

We are impressed by your argumentation technique.

You are basically arguing that "Java is slow because it is slow".

Arne
Chris Uppal - 11 Feb 2007 21:38 GMT
> > > Because JVM optimizations are global and dynamic, whereas compile-time
> > > optimizations are more local and static, the JVM may actually achieve
[quoted text clipped - 8 lines]
>
> You are basically arguing that "Java is slow because it is slow".

I don't think he is, you know.

It is legitimate to ask, in response to a claim[*] that <such and such> a
technique /can/ have an advantage, how often, and by how much, that advantage
manifests in practice.

Scepticism != blind prejudice.

   -- chris

[*] or even proof.
Arne Vajhøj - 11 Feb 2007 22:25 GMT
>> We are impressed by your argumentation technique.
>>
>> You are basically arguing that "Java is slow because it is slow".
>
> I don't think he is, you know.

He is arguing that Java performance is less than C/C++
performance based on an assumption that the JIT compilation
runtime overhead is bigger than the JIT over AOT gain.

To me that is basing the conclusion on an assumption that
is equivalent to the conclusion.

Arne
Christian - 11 Feb 2007 23:21 GMT
Arne Vajhøj schrieb:
>>> We are impressed by your argumentation technique.
>>>
[quoted text clipped - 10 lines]
>
> Arne

Then what makes the main difference in speed? Is it rather nice
features like checking for ArrayOutOfBound that make java a bit slower?

I once implemented a cryptographic hash function .. and I couldn't come
closer to the speed of a c++ implementation than a factor of about 0.5.

Since it's purely deterministic I would assume a hash function can be very
well optimized at compile time in case of c++ and JIT and AOT should not
matter much to the program?

- Christian
Arne Vajhøj - 11 Feb 2007 23:38 GMT
> Then what makes the main difference in speed? Is it rather such nice
> features like checking for ArrayOutOfBound that java is a bit slower?

The point is that Java is not always a bit slower. Sometimes it is a bit
slower, sometimes it is a lot slower, sometimes it is faster.

> I once implemented a cryptograhic hashfunction .. and I couldn't come
> closer to the speed of a c++ implementation than a factor of about 0.5.
>
> since its purely deerministic I would assume a hashfunction can be very
> well optmized at compile time in case of c++ and JIT and AOT should not
> matter much to the program?

Difficult to say without seeing the code.

0.5 sounds awfully high, but it could be.

It also depends on the Java version being used and
whether run in client or server VM.

Arne
John W. Kennedy - 12 Feb 2007 03:46 GMT
> Arne Vajhøj schrieb:
>>>> We are impressed by your argumentation technique.
[quoted text clipped - 19 lines]
> well optmized at compile time in case of c++ and JIT and AOT should not
> matter much to the program?

And, on the other hand, the authors of "Core Java" report that a Java
implementation of the sieve of Eratosthenes using the Java Bitset class
regularly outperforms an equivalent C++ program using the C++ Bitset
template, even when they wrote a new Bitset template.
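
For readers who have not seen it, the kind of program being compared looks roughly like the sketch below. This is not the code from "Core Java"; the class name SieveSketch and the method countPrimes are made up here for illustration, and only java.util.BitSet is the real library class. It is meant for small limits and says nothing about how fast either language runs it.

```java
import java.util.BitSet;

class SieveSketch {
    // Counts the primes in [2, limit] with a sieve of Eratosthenes.
    static int countPrimes(int limit) {
        if (limit < 2) {
            return 0;
        }
        // A set bit at index n means "n is known to be composite".
        BitSet composite = new BitSet(limit + 1);
        for (int i = 2; (long) i * i <= limit; i++) {
            if (!composite.get(i)) {
                // Cross off multiples of the prime i, starting at i * i.
                for (int j = i * i; j <= limit; j += i) {
                    composite.set(j);
                }
            }
        }
        // There are (limit - 1) candidates in 2..limit; the rest are composite.
        return (limit - 1) - composite.cardinality();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100));
    }
}
```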


John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
  -- Charles Williams.  "Taliessin through Logres: Prelude"

raddog58c - 12 Feb 2007 04:09 GMT
> > Arne Vajhøj schrieb:
> >>>> We are impressed by your argumentation technique.
[quoted text clipped - 30 lines]
> Nourished the land on a fallacy of rational virtue."
>    -- Charles Williams.  "Taliessin through Logres: Prelude"

That's actually very interesting... why do you think that's the case?
I'm guessing the C++ Bitset template is grossly inefficient. Never
used it so I don't know. It's intriguing -- I plan to run some
benchmarks this week to compare ASM, C, C++ and Java -- time will
dictate how extensive I go, and I'll keep the tests simple and as
equivalent as possible.  I'll post the results to this thread and we
can see what it looks like.

Thanks for sharing that.
John W. Kennedy - 12 Feb 2007 19:20 GMT
>> And, on the other hand, the authors of "Core Java" report that a Java
>> implementation of the sieve of Eratosthenes using the Java Bitset class
>> regularly outperforms an equivalent C++ program using the C++ Bitset
>> template, even when they wrote a new Bitset template.

> That's actually very interesting... why do you think that's the case?
> I'm guessing the C++ Bitset template is grossly inefficient.

That's what they thought, so they wrote their own. It was better, but
Java still outperformed it.


John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
  -- Charles Williams.  "Taliessin through Logres: Prelude"

raddog58c - 13 Feb 2007 12:51 GMT
> > That's actually very interesting... why do you think that's the case?
> > I'm guessing the C++ Bitset template is grossly inefficient.
[quoted text clipped - 3 lines]
>
> --

So do you suppose this was a case where late binding made the
difference, or do you think Java's implementation of whatever is used
by the aforementioned algorithm is significantly better?

I ask because I'm curious what other techniques might be used by the
JVM beyond instruction set coercion (sp?).  Does the JVM recognize the
algorithm or some facet of it and take shortcuts the C++
implementation doesn't, or does this algorithm use vast amounts of
heap storage?

One thing that's different between C++ and Java is memory.  I believe
the JVM grabs everything it will ever need at startup, but I'm not
100% sure this would be the case in a C++ .EXE.  There might be some
memory dynamics in the form of quasi "lazy init" in C++ (i.e., a
non-preallocated heap) that cause C++ to invoke the native OS's getmem
API where the JVM does that upfront.

Again, don't know the algorithm, but if there's no instruction set
advantages and no algorithmic pruning taking place, then my guess
would be something along the lines of memory management.  I do think
the JVM manages memory better than most operating systems -- it would
have to, because the cost of keeping it clean via garbage collection
would otherwise be overly expensive.
Mark Thornton - 13 Feb 2007 14:07 GMT
> One thing that's different between C++ and Java is memory.  I believe
> the JVM grabs everything it will ever need at startup

It reserves the address space for the maximum heap size but only
allocates memory for the minimum heap size. The reservation means it can
be sure that the heap will be contiguous which gives performance
advantages to the implementation. One downside to this is that memory
reports by OS utilities can be incredibly confusing (i.e. useless).

Mark Thornton
raddog58c - 13 Feb 2007 15:43 GMT
On Feb 13, 8:07 am, Mark Thornton <mark.p.thorn...@ntl-spam-world.com>
wrote:
> > One thing that's different between C++ and Java is memory.  I believe
> > the JVM grabs everything it will ever need at startup
[quoted text clipped - 6 lines]
>
> Mark Thornton

Not really useless.  Whatever the JVM has set aside is not available
to processes which run under the auspices of the OS but not the JVM.
Some JVM boosts are misleading because they turbo charge their
applications but impact others running in the same storage space.
That's a point that's missed by applications programmers in some
performance comparisons -- it's not missed by systems programmers,
however, because the OS is managing all active processes, the JVM
being nothing more than one of them.

Now if your machine runs purely for the sake of the JVM, like a
webserver for instance, then I agree that the JVM's use of memory is
misleading, because what it's done is establish the best-fit operating
environment for itself, and that's what you want on something like a
webserver.

Conversely, on something like your workstation where you'll
(presumably) be using a mix of apps each running in their own process
space, resource hungry processes and threads hurt system performance
because they greedily suck up everything around them.

That's not bad -- it's how it works, and one needs to be consciously
aware of the effects so they don't get burned by them.
Mark Thornton - 13 Feb 2007 15:55 GMT
> Not really useless.  Whatever the JVM has set aside is not available
> to processes which run under the auspices of the OS but not the JVM.

A reservation has no effect on other processes. It simply reserves a
range of addresses within the JVM's process. It makes no difference to
the amount of memory available to other processes. Note that what has
been reserved is ADDRESS SPACE, which is not the same thing as memory.
When the heap needs to expand, additional memory will be mapped into
that reserved space.

Mark Thornton
raddog58c - 13 Feb 2007 16:28 GMT
On Feb 13, 9:55 am, Mark Thornton <mark.p.thorn...@ntl-spam-world.com>
wrote:
> > Not really useless.  Whatever the JVM has set aside is not available
> > to processes which run under the auspices of the OS but not the JVM.
[quoted text clipped - 7 lines]
>
> Mark Thornton

I used to do capacity planning and performance monitoring of our
webservers.  We used IBM's Websphere App Server and HTTP Server.  One
thing I noticed was CPU usage would fluctuate greatly, but memory
usage was level.  It seemed to me the JVM grabbed all the memory it
was configured to start with at the time it started and never
relinquished it, meaning it was not available to any other processes on
the server.

Is there an option to have the JVM startup with a tiny heap and
dynamically expand and shrink it if needed?  I don't know one way or
the other to be honest, but if there is we weren't using it in our
environment. The JVM carved its space and never moved in one direction
or another over its lifetime.
Chris Uppal - 13 Feb 2007 18:04 GMT
> I used to do capacity planning and performance monitoring of our
> webservers.  We used IBM's Websphere App Server and HTTP Server.  One
[quoted text clipped - 3 lines]
> reliquished it, meaning it was not available to any other processes on
> the server.

It's quite possible that IBM's server-class JVM implementations use memory in a
very different pattern from how Sun's desktop-class JVM implementations do.
(Especially when you remember that both Sun and IBM are in the business of
selling big iron ;-)

It makes (in my opinion) a lot of sense for a JVM running a server to grab
actual memory at startup.  That would make no sense at all for a JVM used for
running desktop applications where (a) the demand is much less predictable, and
(b) there is unlikely to be someone around with the skill and time to tune each
application separately.

That said, I do think Sun's desktop implementations are less flexible in this
respect than they should be.

   -- chris
Mark Thornton - 13 Feb 2007 19:23 GMT
> Is there an option to have the JVM startup with a tiny heap and
> dynamically expand and shrink it if needed?  I don't know one way or

Yes. Although this depends on the JVM in question. For Sun's JVM the
option -Xms is used to specify the minimum heap size. For example -Xms4m
would give a 4MB minimum heap size. Similarly -Xmx specifies the maximum
heap size. The minimum permitted heap size is 1MB. Within the min/max
limits the heap will grow and shrink as required, although it doesn't
always shrink as quickly as might be desirable.
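
One way to watch this from inside a program is to ask the runtime for its current and maximum heap sizes. The sketch below is only an illustration: the class name HeapReport is invented here, while Runtime.totalMemory() and Runtime.maxMemory() are standard library calls. The numbers printed depend on the JVM and on the -Xms/-Xmx values used.

```java
class HeapReport {
    // Returns {current heap size, maximum heap size} in bytes,
    // as reported by the running JVM.
    static long[] heapSizes() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.totalMemory(), rt.maxMemory() };
    }

    public static void main(String[] args) {
        long[] sizes = heapSizes();
        long mb = 1024L * 1024L;
        System.out.println("current heap: " + sizes[0] / mb + " MB");
        System.out.println("maximum heap: " + sizes[1] / mb + " MB");
    }
}
```

Running it as, say, "java -Xms4m -Xmx64m HeapReport" and again with both options set to the same value should show the difference between a heap that can grow and one that is fixed from startup.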

It may be that your web server system configured the JVM with a high
minimum value for the heap (possibly even equal to the maximum value).
It is quite common to do this with server type applications --- e.g. you
can specify a minimum value for SQL Server to grab.

Mark Thornton
Arne Vajhøj - 14 Feb 2007 01:46 GMT
> I used to do capacity planning and performance monitoring of our
> webservers.  We used IBM's Websphere App Server and HTTP Server.  One
[quoted text clipped - 9 lines]
> environment. The JVM carved its space and never moved in one direction
> or another over its lifetime.

Yes.

    -Xms<size>        Set initial Java heap size
    -Xmx<size>        Set maximum Java heap size

But if the machine is dedicated to running only WAS, they have
likely set both to about 3/4 of the total memory.

Arne
raddog58c - 15 Feb 2007 12:03 GMT
> Yes.
>
[quoted text clipped - 5 lines]
>
> Arne

Thank you for the parms, and yes, I'm certain you're correct as
running WAS is the reason for those servers' existence.
Arne Vajhøj - 14 Feb 2007 01:44 GMT
> So do you suppose this was a case where late binding made the
> difference, or do you think Java's implementation of whatever is used
> by the aforementioned algorithm is significantly better?

My guess is that it is a case where the C++ compiler was not
able to effectively optimize a certain language construct.

Sometimes weird effects pop up.

Some people have found that:

x ^= true;

is faster than:

x = !x;

in Java.

WTF
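
To be clear about what is being compared: the two statements always produce the same value, so any speed difference would come from how a particular JVM compiles them, not from the language. A small sketch (the class and method names are made up for illustration, and no timing claim is made here):

```java
class ToggleSketch {
    // Flips a boolean with compound XOR assignment.
    static boolean toggleWithXor(boolean x) {
        x ^= true;
        return x;
    }

    // Flips a boolean with logical negation.
    static boolean toggleWithNot(boolean x) {
        x = !x;
        return x;
    }

    public static void main(String[] args) {
        System.out.println(toggleWithXor(true) == toggleWithNot(true));
        System.out.println(toggleWithXor(false) == toggleWithNot(false));
    }
}
```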

> One thing that's different between C++ and Java is memory.  I believe
> the JVM grabs everything it will ever need at startup,

Not true. It starts up with a certain value and expands as needed
up to the maximum allowed.

Arne
Chris Uppal - 12 Feb 2007 16:50 GMT
> I once implemented a cryptograhic hashfunction .. and I couldn't come
> closer to the speed of a c++ implementation than a factor of about 0.5.
>
> since its purely deerministic I would assume a hashfunction can be very
> well optmized at compile time in case of c++ and JIT and AOT should not
> matter much to the program?

That's what I would have expected too.

Assuming that your code was transforming binary data (byte[] arrays) into more
binary data, and that your code wasn't dependent on the speed, or lack of it,
of utility code (such as, perhaps, java.math.BigInteger), then I don't really
see why there should be much difference.

<wild guess>
Did your code make much use of 2-dimensional arrays (or higher) ?  That's the
only thing I can think of where similar-looking code in C and Java would
actually be doing something different.
</wild guess>

Bounds checking might make some difference, but I can't imagine it making a 2x
difference.  (The impact of checking depends on the code, and on how good a job
the JIT and/or the processor's pipeline can do).
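
To illustrate the 2-dimensional array guess: in Java an int[][] is an array of references to separate row arrays, whereas a C int a[rows][cols] is one contiguous block. A rough sketch of the two layouts in Java follows; the class and method names are invented for this example, and whether the flat form is actually faster depends on the JVM and the code.

```java
class ArrayLayoutSketch {
    // Sums a Java 2-D array: every grid[r] is its own array object,
    // so each row access follows a reference and has its own bounds check.
    static long sumNested(int[][] grid) {
        long total = 0;
        for (int[] row : grid) {
            for (int value : row) {
                total += value;
            }
        }
        return total;
    }

    // Copies a rectangular grid into one row-major array,
    // which is how C would lay out int a[rows][cols].
    static int[] flatten(int[][] grid, int cols) {
        int[] flat = new int[grid.length * cols];
        for (int r = 0; r < grid.length; r++) {
            System.arraycopy(grid[r], 0, flat, r * cols, cols);
        }
        return flat;
    }

    // Sums the flat array using manual index arithmetic.
    static long sumFlat(int[] flat, int rows, int cols) {
        long total = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                total += flat[r * cols + c];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] grid = { {1, 2, 3}, {4, 5, 6} };
        int[] flat = flatten(grid, 3);
        System.out.println(sumNested(grid) == sumFlat(flat, 2, 3));
    }
}
```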

   -- chris
Arne Vajhøj - 13 Feb 2007 02:26 GMT
>> I once implemented a cryptograhic hashfunction .. and I couldn't come
>> closer to the speed of a c++ implementation than a factor of about 0.5.
[quoted text clipped - 4 lines]
>
> That's what I would have expected too.

> <wild guess>
> Did your code make much use of 2-dimensional arrays (or higher) ?  That's the
> only thing I can think of where similar-looking code in C and Java would
> actually be doing something different.
> </wild guess>

I have seen huge differences for Java just between running with
-client and -server.

Arne
raddog58c - 13 Feb 2007 12:54 GMT
> I have seen huge differences for Java just between running with
> -client and -server.
>
> Arne

Which has seemed to be better, and do you understand what is making
the difference?

I do wish there were better built-in JVM hooks for extracting what's
going on inside.... aren't "they" (JVM peeps) standardizing on some
such things?  It would sure help, because "unit" tuning is fine if
you're wearing blinders, but if you're not careful you can make one
thing run like a bat out of hades and in turn impact everything else
that lives in the same space.

Tuning and better JVM diags/stats is an area of great opportunity,
IMHO.
Mark Jeffcoat - 13 Feb 2007 14:28 GMT
>> I have seen huge differences for Java just between running with
>> -client and -server.
[quoted text clipped - 13 lines]
> Tuning and better JVM diags/stats is an area of great opportunity,
> IMHO.

You may be looking for the JVMTI--the "JVM Tool Interface".

It's not something I've looked into, since the simplest
of profiling has so far sufficed for me, but it offers
an extremely detailed look at what's going on under the
hood.

http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html


Mark Jeffcoat
Austin, TX

Arne Vajhøj - 14 Feb 2007 00:43 GMT
>> I have seen huge differences for Java just between running with
>> -client and -server.
>
> Which has seemed to be better, and do you understand what is making
> the difference?

-server is optimizing better than -client at the cost of
more time spent JIT compiling.
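
If it is unclear which VM a test actually ran under, the standard system property java.vm.name can be printed. A minimal sketch (the class name VmInfoSketch is made up; on Sun's HotSpot the reported name typically mentions "Client VM" or "Server VM", but the exact text is VM-specific):

```java
class VmInfoSketch {
    // Returns the name the running virtual machine reports for itself.
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(vmName());
    }
}
```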

Arne
raddog58c - 12 Feb 2007 03:38 GMT
> >> We are impressed by your argumentation technique.
>
[quoted text clipped - 5 lines]
> performance based on an assumption that the JIT compilation
> runtime overhead is bigger than the JIT over AOT gain.

No, I'm assuming it takes longer to translate a thing and run it, than
it does to skip translation and just run it. I think that's a fair
assumption.

BTW, this is not a C/C++ vs Java competition.  I'd write something in
native assembler taking advantage of every facet of the hardware if
that's what I needed to do.  The fact is I code in Java, C, C++, Perl
and a smidgeon of other things (COBOL, Shell script, and visual basic
to name a few) these days.  I like them all for certain things, and
they each have drawbacks for other things.

Just wanted to make sure I was clear on where I stand.

> To me that is basing the conclusion on an assumption that
> is equivalent to the conclusion.
>
> Arne

If you want to be 100% accurate, what I said was if you have to
translate code to run it, then eliminating the translation step would
cause the code to run faster.

Java has to be translated before you run it.

I believe these two statements are true.  You can ding me and say you
translate Java once and then it can be cached or whatever, and that's
fine, but at load time it has to be translated.  Byte code != native
instruction set.

An assertion was made that late binding can take advantages of the
environment to supply more efficient instructions. That's a theory.
What if the non-translated program is already optimized for the
environment?  Does someone want to suggest a JVM can translate a
program and then perform some magic to make the ALU run faster? Of
course not. Late binding only helps in circumstances where there's an
opportunity to improve by substituting a more efficient set of
instructions.  As long as we're comparing apples to oranges then sure,
we can say there's an opportunity for a difference.  If we're
comparing apples to apples, adding a translation step is going to be
slower.

Let me put it another way.

Java class abc is invoked for the first time -- we fire up and load
the JVM, abc gets translated to native code and executed -- Java class
abc is invoked again, only this time it's sitting in cache and is
ready to go so it's executed -- which instance of abc's execution ran
faster?  I'm claiming the 2nd -- does someone want to argue the 1st
execution is faster?  That firing up the JVM, translating the program,
then executing it is faster than just executing it, or just reading it
off disk and executing it?

This is what I'm saying.  If these were 2 distinct programs, the 2nd
should win every time (ignoring interrupts, system activity, etc)
because we're comparing apples to apples.
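
The first-run versus later-run effect can be observed with a crude timing loop like the sketch below. Everything in it (the class WarmupSketch, the work() method, the loop sizes) is invented for illustration, and it is not a rigorous benchmark: the timings vary by machine and JVM, so no expected numbers are given.

```java
class WarmupSketch {
    // Some arbitrary CPU-bound work; the result is returned so the
    // JIT cannot simply discard the loop.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    // Runs work(n) several times in the same JVM and records how long
    // each run took, in nanoseconds. Early runs include interpretation
    // and JIT compilation; later runs usually do not.
    static long[] timeRuns(int runs, int n) {
        long[] nanos = new long[runs];
        for (int k = 0; k < runs; k++) {
            long start = System.nanoTime();
            long result = work(n);
            nanos[k] = System.nanoTime() - start;
            if (result < 0) {
                throw new IllegalStateException("unexpected negative sum");
            }
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRuns(10, 5000000);
        for (int k = 0; k < nanos.length; k++) {
            System.out.println("run " + k + ": " + nanos[k] + " ns");
        }
    }
}
```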

FWIW, some of the late-binding advantage can be simulated with
reasonably simple code in any language. For instance an installation
script could check installed hardware and deploy the best from several
versions, or I could build a run-time hardware checker/loader to pull
code from a different DLL based on environments. I have created code
to do these kinds of things. It's not out of the box functionality,
but it's not very difficult either.
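
As a rough idea of what such a run-time checker/loader might look like in Java rather than with DLLs, here is a sketch that picks one of two implementations based on the number of processors. The interface Summer, the two classes and the choose() method are all hypothetical names for this example; the stream-based version assumes Java 8 or later.

```java
import java.util.Arrays;

class RuntimeSelectSketch {
    // Hypothetical operation that has more than one implementation.
    interface Summer {
        long sum(int[] data);
    }

    // Plain single-threaded loop.
    static final class SerialSummer implements Summer {
        public long sum(int[] data) {
            long total = 0;
            for (int value : data) {
                total += value;
            }
            return total;
        }
    }

    // Lets the library spread the work over several cores.
    static final class ParallelSummer implements Summer {
        public long sum(int[] data) {
            return Arrays.stream(data).parallel().asLongStream().sum();
        }
    }

    // Chooses an implementation once, based on what the machine offers.
    static Summer choose(int processors) {
        return processors > 1 ? new ParallelSummer() : new SerialSummer();
    }

    public static void main(String[] args) {
        Summer summer = choose(Runtime.getRuntime().availableProcessors());
        System.out.println(summer.getClass().getSimpleName()
                + ": " + summer.sum(new int[] {1, 2, 3, 4}));
    }
}
```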

Also, a JVM can pay attention to usage, provide caching, adapt at run
time, etc., but such a service requires space and processing cycles to
supply.  That means *potentially* it could make a single processing
unit it serves run faster or maybe not, but the service's computations
come at the expense of the operating environment as a whole.  Most
comparisons focus on a single program's execution, but the resources
consumed by the JVM to look for runtime optimizations are resources
unavailable to other processes in the operating environment.  How much
I don't know, but computational knowledge requires memory to store and
processor cycles to access.  I do know large Java apps running on my
workstation will "blackout" (eg become unresponsive to mouse clicks),
and I don't experience the same from my C, C++ or Perl scripts -- I
presumed garbage collection was running, but maybe it's the JVM
looking for ways to run its apps and classes faster. 8-)

One other thing that I believe hurts Java's runtime environment is the
built-in unicode support. I have no use for unicode in my
applications, but Java stores everything as unicode, right? So that
means twice as much memory, and the need to convert from ASCII or
EBCDIC into and out of unicode, over and over again, and for nothing.
Is there a really good reason for mandatory unicode support?  This
hurts Java's runtime model, IMO, and it's pointless if you don't need
it.

Anyway, this is a really interesting discussion to me.  I think the
fact there's a discussion at all is a very big compliment to the
software engineers constructing the JVMs.  They're obviously good.
Mark Thornton - 12 Feb 2007 13:07 GMT
> reasonably simple code in any language. For instance an installation
> script could check installed hardware and deploy the best from several
> versions, or I could build a run-time hardware checker/loader to pull
> code from a different DLL based on environments. I have created code
> to do these kinds of things. It's not out of the box functionality,
> but it's not very difficult either.
How difficult this is depends on how many variations you need to test.
For example SSE 1,2,3, AMD; number of processors (1 or >1). So 4 or 5
different levels of accelerated math, plus the number of processors, you
could be looking at 10 different DLLs for the same function. This gets
very tedious. The usual response is simply to insist on say SSE2.

> One other thing that I believe hurts Java's runtime environment is the
> built-in unicode support. I have no use for unicode in my
> applications, but Java stores everything as unicode, right? So that
> means twice as much memory, and the need to convert from ASCII or
> EBCDIC into and out of unicode, over and over again, and for nothing.
> Is there are really good reason for mandatory unicode support?

Not having to duplicate all the methods which take String parameters.
What encoding would you assume for 'narrow' character strings?

Mark Thornton
raddog58c - 13 Feb 2007 11:41 GMT
On Feb 12, 7:07 am, Mark Thornton <mark.p.thorn...@ntl-spam-world.com>
wrote:

> > reasonably simple code in any language. For instance an installation
> > script could check installed hardware and deploy the best from several
[quoted text clipped - 20 lines]
>
> Mark Thornton

I'm thinking I wouldn't have to worry because the JVM would handle it,
right?  This would be an ideal run-time derived feature for the JVM to
provide -- NOP unnecessary translations based on the mode of
operation.

If I needed to know, a System.Getyadda call should obtain it.

If I was building a rather large application for in-house use only, I
wouldn't really care because I'd always be using the same format.
Today I don't spend an inordinate amount of time converting outside of
persisting data. I'm just saying it should improve performance if you
can eliminate unnecessary transformations.
Mark Thornton - 13 Feb 2007 12:01 GMT
>>Not having to duplicate all the methods which take String parameters.
>>What encoding would you assume for 'narrow' character strings?
[quoted text clipped - 5 lines]
> provide -- NOP unnecessary translations based on the mode of
> operation.

Using Unicode whether encoded as UTF-16 or UTF-8 eliminates a whole
world of pain the moment you see a character outside your current
character set. If you never see anything outside traditional ASCII then
you may not appreciate this. I'd be very happy to see all the
traditional character sets disappear (CP-437, etc), leaving only the
Unicode encodings. I've had data sent to me without any declaration of
the character set in use and had to guess on the basis of the words
contained. One case I never did figure out --- it had probably been
mangled by other software that didn't understand the character set.

A simple example here with traditional software is what happens to our
currency symbol (£, pounds sterling) if you mix up code pages.
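
A small sketch of the kind of mix-up meant here. It uses UTF-8 and ISO-8859-1 rather than the old DOS code pages, only because those two charsets are guaranteed to exist on every Java platform; the class and method names are made up, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

class PoundSignSketch {
    // Encodes the text as UTF-8, then (wrongly) decodes the bytes as
    // ISO-8859-1, which is what happens when sender and receiver
    // disagree about the character set.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00A3" is the pound sign; the misread text gains a stray
        // extra character in front of it.
        System.out.println(misread("\u00A3100"));
    }
}
```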

Earlier you mentioned the doubling of space caused by Unicode (assuming
UTF-16). This is only valid if most of your memory was taken up by text.
The only applications where this is likely to be true (word processors
and the like), ought to be capable of handling a wider range of
characters than ASCII. Even writing in English, I want a generous range
of mathematical symbols available (I am a mathematician).
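
The size difference under discussion is easy to check for a given string. The sketch below only measures encoded byte counts; it does not measure how much heap a String object really occupies, and the class name TextSizeSketch is invented for this example (StandardCharsets assumes Java 7 or later).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TextSizeSketch {
    // Number of bytes the text needs in the given encoding.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "hello";
        System.out.println("US-ASCII: " + encodedLength(text, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + encodedLength(text, StandardCharsets.UTF_8));
        // UTF_16BE is used so that no byte-order mark is added.
        System.out.println("UTF-16:   " + encodedLength(text, StandardCharsets.UTF_16BE));
    }
}
```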

Mark Thornton
raddog58c - 13 Feb 2007 13:00 GMT
> Using Unicode whether encoded as UTF-16 or UTF-8 eliminates a whole
> world of pain the moment you see a character outside your current
[quoted text clipped - 17 lines]
>
> Mark Thornton

That's completely valid....

...if you need it.  It's not valuable if you don't.

I pay a lot of money for cable and HBO television which would be a
waste if I didn't own a TV.

Also, while it is a pain to convert when you have to, it's just as
much of a pain to convert when you don't feel you need to --
getChars() is a pain because I don't need anything but chars 99% of the
time in the particular code I'm writing.

Your mileage may vary, and that's where having an option suits both
needs.
Oliver Wong - 14 Feb 2007 21:01 GMT
[post re-ordered to group similar-topic paragraphs together]

[Mark wrote:]
> > Using Unicode whether encoded as UTF-16 or UTF-8 eliminates a whole
> > world of pain the moment you see a character outside your current
> > character set. If you never see anything outside traditional ASCII then
> > you may not appreciate this.
[...]
> > Even writing in English, I want a generous range
> > of mathematical symbols available (I am a mathematician).
[quoted text clipped - 5 lines]
> I pay a lot of money for cable and HBO television which would be a
> waste if I didn't own a TV.

[...]

> Your mileage may vary, and that's where having an option suits both
> needs.

   Not being able to input non-ASCII unicode characters in the textfield of
an application seems as archaic to me as not being able to enter "0" in a
numeric field, because early (e.g. bronze age) arithmetic systems did not
know about the concept of zero as a number.

   It's hard to believe that today, there are still mp3 players that are
unable to handle ID3 tags for songs with non-English characters in them.

> Also, while it is a pain to convert when you have to, it's just as
> much of a pain to convert when you don't feel you need to --
> getChars() is a pain because I don't need anything by chars 99% of the
> time in the particular code I'm writing.

   I don't understand your complaint: If you don't need getChars(), then
don't invoke it. What's the problem?

   - Oliver
raddog58c - 15 Feb 2007 01:01 GMT
>     I don't understand your complaint: If you don't need getChars(), then
> don't invoke it. What's the problem?
>
>     - Oliver

The data is stored in UNICODE whether you require it or not.  I'm not
writing multinational code at this juncture.  In 25+ years of
programming the number of times I've needed multinational character
sets can be counted on one hand with fingers to spare.

You might find it archaic, but I find it wasteful.  It's a waste
converting into and out of a format you never use.

Why don't you convert your data into a Russian character set. Since
you're never communicating in Russian, when you need English, swap
back.  What's the big deal?

The big deal is why do that?  Nobody would do that if there wasn't a
reason.

That's what I'm saying. It's conversion to a format that I'm not
personally using. Some people need it; some don't; yet we all pay for
it.
Mark Thornton - 15 Feb 2007 09:03 GMT
> Why don't you convert your data into Russian characterset. Since
> you're never communicating in Russian, when you need English, swap
> back.  What's the big deal?

Some of my data consists of European place names, each in the relevant
language. That means I need all the characters in every European
language in the same data set.

I often holiday in a part of Italy where many of the place names are
given in two languages.

Some years ago I was at a party in Brussels where at least 6 languages
were being spoken, sometimes two languages in a single sentence.

The idea that you can Balkanize the data into nice separate compartments
with their own character set just doesn't work (not least in the Balkans
hee hee).

Mark Thornton
raddog58c - 15 Feb 2007 12:05 GMT
On Feb 15, 3:03 am, Mark Thornton <mark.p.thorn...@ntl-spam-world.com>
wrote:
> > Why don't you convert your data into Russian characterset. Since
> > you're never communicating in Russian, when you need English, swap
[quoted text clipped - 15 lines]
>
> Mark Thornton

Now for you it's an entirely different story, yes?  If I had these
requirements my feelings about UNICODE would be more in line with
yours.  Since I don't, the conversion is a wasted expense.
Mark Thornton - 15 Feb 2007 13:46 GMT
> Now for you it's an entirely different story, yes?  If I had these
> requirements my feelings about UNICODE would be more in line with
> yours.  Since I don't, the conversion is a wasted expense.

Java's compulsory use of Unicode means that any third party tools I use
will also work with data like mine. In systems where use of Unicode is
optional (or non-existent) it is common to find tools that would be nice
if only they made provision for characters beyond ASCII. It also
simplifies internationalization even where the original developer didn't
pay too much attention to these requirements.

Character set conversion can only be avoided if you can be sure of
always working in a single specified character set. Once you have to do
a conversion of some sort, conversion to/from Unicode isn't much (if
any) more expensive than conversion between simple single byte character
sets. Even in the US I expect you are likely to see data in CP-437,
CP-850, CP-1252, and ISO-8859-1 as well as UTF-8, and UTF-16.

Finally just how much difference does the extra overhead of Unicode
actually make? In most substantial applications the overhead will be
negligible.

Mark Thornton
Lew - 15 Feb 2007 15:18 GMT
raddog58c wrote:
>> Now for you it's an entirely different story, yes?  If I had these
>> requirements my feelings about UNICODE would be more in line with
>> yours.  Since I don't, the conversion is a wasted expense.

> Java's compulsory use of Unicode means that any third party tools I use
> will also work with data like mine. In systems where use of unicode is
[quoted text clipped - 13 lines]
> actually make? In most substantial applications the overhead will be
> negligible.

Java is what it is. It has its reasons to be that way. Not all decisions are
optimal from all points of view. Some programmers don't need everything a
language or an API offers. The language or API still has to offer it.

This is no different from any language. Considering the overall population
that uses it, a language will always make compromises to satisfy the greatest
portion of that population.

If you don't like Java then there are alternatives. For what it does, and
considering Mark's points about the general usefulness of requiring Unicode and
its nearly complete lack of impact on code efficiency, Java is extremely well
suited.

If you wrote your own language without Unicode support, or with the
"Balkanized" version of it, you'd probably find pretty quickly that the "all
Unicode" approach offers significant advantages.

In the meantime, when we use Java we are stuck with all its warts as well as
its advantages. Focusing on corner issues like "it's always Unicode" or "there
are no closures" does not diminish the actual, real-world usefulness of the
language. (And some of these ideas, if sufficiently universally beneficial,
wind up in the language eventually anyway.)

- Lew
John W. Kennedy - 16 Feb 2007 04:53 GMT
> Character set conversion can only be avoided if you can be sure of
> always working in a single specified character set. Once you have to do
> a conversion of some sort, conversion to/from Unicode isn't much (if
> any) more expensive than conversion between simple single byte character
> sets. Even in the US I expect you are likely to see data in CP-437,
> CP-850, CP-1252, and ISO-8859-1 as well as UTF-8, and UTF-16.

Don't forget IBM-037 (US EBCDIC) and, thanks to the Euro, ISO-8859-15.

Signature

John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
  -- Charles Williams.  "Taliessin through Logres: Prelude"

raddog58c - 16 Feb 2007 19:11 GMT
> > Character set conversion can only be avoided if you can be sure of
> > always working in a single specified character set. Once you have to do
[quoted text clipped - 6 lines]
>
> --

I recently dealt with IBM037 aka CP037 in a STRUTS app from hell for
IBM's OnDemand product. I was unfamiliar with the CP support in the
String class prior to this endeavor, and now that I've used it I'm
less enthused.  It's simple (more or less), but obscure and very
infrequently used in this environment -- I had problems using it
(turned out to be a missing charset.jar on my workstation) and when I
tried to find someone familiar with codepage support I came up empty.

Is this commonly used by others?  It was difficult finding
understandable documentation on the WWW.  I had assumed I was going to
need to "add" support for it or something, but I couldn't find a clear
explanation of how to do it.

The XLAT assembler instruction (translate) would have been an order of
magnitude easier, but I couldn't use it in this context.  I could have
implemented it more easily via xlatChar = xlatTable[unxlatedChar].
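
A rough sketch of that table-lookup idea in Java, for comparison. The class
and method names are made up for illustration, and the table is built from
ISO-8859-1 only because every JVM is required to have it; the assumption is
that a CP037 table could be built the same way on a JVM where that charset
is installed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class XlatTable {
    // Build a 256-entry lookup table: table[b] is the character that the
    // byte value b decodes to in the given single-byte charset.
    static char[] build(Charset singleByteCharset) {
        char[] table = new char[256];
        for (int b = 0; b < 256; b++) {
            table[b] = new String(new byte[] {(byte) b}, singleByteCharset).charAt(0);
        }
        return table;
    }

    // The XLAT-style step: xlatChar = xlatTable[unxlatedChar].
    static String translate(byte[] input, char[] table) {
        char[] out = new char[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = table[input[i] & 0xFF]; // mask because Java bytes are signed
        }
        return new String(out);
    }

    public static void main(String[] args) {
        char[] table = build(StandardCharsets.ISO_8859_1);
        byte[] raw = {0x48, 0x69, (byte) 0xE9};
        System.out.println(translate(raw, table));
    }
}
```

This is essentially what new String(bytes, charset) does for a single-byte
code page, so the hand-rolled table mostly saves the lookup of the charset
rather than the translation itself.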

In any event, after lots of perusing and debugging and looking at
other people's workstations, I found that the missing charset.jar was
my problem.  The Sun java documentation said CP037 was supported as
part of the Extended Encoding Set, so the
UnsupportedEncodingExceptions really threw me.
Mark Thornton - 16 Feb 2007 19:33 GMT
> In any event, after lots of perusing and debugging and looking at
> other people's workstations, I found that the missing charset.jar was

Outside the US installing a comprehensive set of character sets and
locales is (I think) the default. Because some crazy people ;-) whinge
about the extra space entailed when all they need is ASCII, the default
US install leaves all of this out!

Mark Thornton
John W. Kennedy - 17 Feb 2007 01:58 GMT
> I recently dealt with IBM037 aka CP037 in a STRUTS app from hell for
> IBM's OnDemand product. I was unfamiliar with the CP support in the
[quoted text clipped - 3 lines]
> (turned out to be a missing charset.jar on my workstation) and when I
> tried to find someone familiar with codepage support I came up empty.

The Java standard only mandates US-ASCII (ISO646-US), ISO-8859-1
(ISO-LATIN-1), UTF-8, UTF-16BE, UTF-16LE, and UTF-16. The choice of what
else to supply is the responsibility of the Java implementation. Sun
Java for Windows supplies something like 150 in all, not counting
aliases. If you were not using Sun Java for Windows (or, probably, Sun
Java for Solaris), IBM037 might not have been included (although it
would obviously be included in any implementation of Java for MVS).

> The XLAT assembler instruction (translate) would have been an order of
> magnitude easier, but I couldn't use it in this context.  I could have
> implemented it more easily via xlatChar = xlatTable[unxlatedChar].

I suspect x86 Java does use XLAT, where possible.

I really don't see what's so hard about:

    Charset cs037 = Charset.forName("ibm037");
               ...
    String st = new String(bytearray, cs037);
               ...
    byte[] newbytes = st.getBytes(cs037);

(You can even leave off creating the Charset variable, but that would
mean looking the thing up at run time over and over again, which is
obviously wasteful, not to mention that you have to code to handle an
UnsupportedEncodingException, even if it's a SNOC.)

    Reader input = new InputStreamReader(new FileInputStream(filename), "ibm037");

is even simpler, where it applies.
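
For anyone who wants to try the round trip, here is a minimal self-contained
sketch. The class and helper names are made up for illustration. It asks for
IBM037 and falls back to ISO-8859-1 when the JVM does not have the extended
charsets installed (the missing charset.jar case); that fallback is purely an
assumption of the sketch, not something Java does for you.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodepageRoundTrip {
    // Look the charset up once. Fall back to ISO-8859-1, which every JVM
    // must provide, if the requested one is not installed (illustration only).
    static Charset pick(String name) {
        return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.ISO_8859_1;
    }

    // Characters -> bytes in the given code page.
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    // Bytes in the given code page -> characters.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        Charset cs = pick("IBM037");
        byte[] bytes = encode("HELLO 123", cs);
        System.out.println(cs.name() + ": " + bytes.length + " bytes, round trip = "
                + decode(bytes, cs));
    }
}
```

Checking Charset.isSupported up front gives a clear answer about whether the
code page is installed, instead of an exception deep inside the application.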

Signature

John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
  -- Charles Williams.  "Taliessin through Logres: Prelude"

Oliver Wong - 15 Feb 2007 15:35 GMT
>>     I don't understand your complaint: If you don't need getChars(), then
>> don't invoke it. What's the problem?
>
> The data is stored in UNICODE whether you require it or not.

   Well, Unicode is not a storage encoding system, or anything like that.
Unicode is primarily a mapping from characters (in the linguistic conceptual
sense, not in the C/C++ data type sense) to numbers. And you can't directly
store numbers in computers. You can store bitstreams, and thus you need an
extra step to encode from numbers to bitstreams. There are many such
encodings: ASCII, UTF-8, UTF-16, etc. some of them being lossy (e.g. ASCII).
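
   A small sketch of that distinction, in case it helps: one character, one
number, several different byte sequences depending on the encoding. The class
name and the hex() helper are invented for the example.

```java
import java.nio.charset.StandardCharsets;

class CodePointVsEncoding {
    // Render a byte array as space-separated hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00E9"; // e with an acute accent
        // The Unicode part: character -> number.
        System.out.println("code point U+" + String.format("%04X", s.codePointAt(0)));
        // The encoding part: number -> bytes, which differs per encoding.
        System.out.println("UTF-8:      " + hex(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println("UTF-16BE:   " + hex(s.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println("ISO-8859-1: " + hex(s.getBytes(StandardCharsets.ISO_8859_1)));
        // ASCII cannot represent this character, so Java substitutes '?' (lossy).
        System.out.println("US-ASCII:   " + hex(s.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

   The UTF-8 line shows two bytes (C3 A9), the ISO-8859-1 line one byte (E9),
and the ASCII line shows 3F, a question mark, because the character was lost.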

>  I'm not
> writing multinational code at this juncture.  In 25+ years of
> programming the number of times I've needed multinational character
> sets can be counted on one hand with fingers to spare.

   Well, I don't know what kind of software you write, so I can't comment
much on that. But consider how many people have asked the developers of
WinAmp (a once popular mp3 player) to support Unicode characters, so that
WinAmp could properly display the names of my English, French, Russian,
Japanese and Korean songs. They refused to do so, stating that 90% of the
Internet is English (a figure I'm sure they just made up). There are
several problems with this argument.

   First of all, internet usage in Asia is huge. Gold farming (which
essentially comes down to playing video games online for pay) is a 1 billion
dollar business in Korea alone
(http://arstechnica.com/news.ars/post/20061227-8503.html), and playing video
games online is a tiny segment of the Internet usage pie chart compared to
web browsing, e-mail or file sharing, for example. According to
http://www.internetworldstats.com/stats2.htm, North America accounts for
only 20% of Internet usage, and while Internet usage there is growing at a
rate of 100+% (i.e. doubling) over 7 years, Internet usage in the rest of
the world is growing at a rate of 200+% (i.e. tripling) over 7 years. This
last diagram really says it all: http://www.internetworldstats.com/stats.htm

   Second of all, just because one is an English-only speaker doesn't mean
one wouldn't benefit from the ability to display characters outside of ASCII
but within Unicode. Another poster presented the example of being able to
display mathematical symbols. I'll present an additional example of my mp3s
again.

   One of the ID3 tags for my mp3s contains what I believe to be Russian
characters. I'm not sure, because I don't actually speak Russian. The artist
name can be viewed at http://en.wikipedia.org/wiki/T%C3%8B%D0%AFRA and it's
very easy for an English speaker to recognize: It's a T, an E with two dots
on top, a backwards R, a forwards R, and an A. And the pronunciation "Terra"
comes intuitively. But try to load an ID3 tag with this text via an
ASCII-only mp3 player, and you'll only see gibberish.

   See, I don't even speak Russian, and yet I benefit from my software
being able to display Russian characters. That's why Unicode is more than
just "supporting other countries' languages". It's about being able to
represent text that you would normally find all around you in real life on
your computer.

> You might find it archaic, but I find it wasteful.  It's a waste
> converting into and out of a format you never use.

   What formats do you think one is converting to and from? There are bits
on the harddrive or RAM, and you need to somehow semantically treat these
bits as if they represented text. From what I understand, in C, you actually
manipulate these bits almost directly, and so an algorithm (e.g. testing
whether a character is numeric) designed to work with ASCII will not work
with EBCDIC and vice versa. In Java, things are a bit more high level: You
*don't* work directly with bits. Instead, you work with characters.
Theoretically, how these characters are represented in the JVM shouldn't
matter to you (in practice, due to backwards compatibility reasons, it has
"leaked out" that the internal representation is UTF-16-like). They might
internally be stored as UTF-16, UTF-8, or some crazy undocumented internal
format. It doesn't matter, because you shouldn't be manipulating the bits
that represent those characters, you should be dealing with the characters
directly. Any algorithm (e.g. testing whether a character is numeric) will
work regardless of the encoding, because the actual encoding is (supposed to
be) abstracted away.
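
   To illustrate, here is a small sketch with made-up names. The same text
arrives as UTF-8 bytes and as UTF-16 bytes; the character-level test gives the
same answer for both, while a C-style byte-level test that assumes ASCII does
not.

```java
import java.nio.charset.StandardCharsets;

class NumericCheck {
    // Character-level test: it never looks at bytes, so the encoding the
    // text originally arrived in is irrelevant.
    static boolean allDigits(String s) {
        if (s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) return false;
        }
        return true;
    }

    // Byte-level test in the C style: silently assumes one byte per
    // character and ASCII digit values.
    static boolean allAsciiDigitBytes(byte[] bytes) {
        if (bytes.length == 0) return false;
        for (byte b : bytes) {
            if (b < '0' || b > '9') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8 = {0x32, 0x30, 0x30, 0x37};               // "2007" in UTF-8
        byte[] utf16 = {0, 0x32, 0, 0x30, 0, 0x30, 0, 0x37};  // "2007" in UTF-16BE

        String a = new String(utf8, StandardCharsets.UTF_8);
        String b = new String(utf16, StandardCharsets.UTF_16BE);

        System.out.println("same text:        " + a.equals(b));
        System.out.println("chars, UTF-8:     " + allDigits(a));
        System.out.println("chars, UTF-16BE:  " + allDigits(b));
        System.out.println("bytes, UTF-8:     " + allAsciiDigitBytes(utf8));
        System.out.println("bytes, UTF-16BE:  " + allAsciiDigitBytes(utf16));
    }
}
```

   Only the last line prints false: the zero bytes in the UTF-16 data are not
ASCII digits, even though the text they encode is the same.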

   Now if you have a String of characters in memory, and you want to store
it on disk somehow, there are many encodings to do this, just like if you
wanted to store a binary tree on disk somehow, there are many encodings to
do this. *This* is where any "converting" might occur, though the term
"converting" is misleading: "encoding" would be a better term. You can
encode the text as ASCII, UTF-8, or some other format. And if you want to
read the bitstream from disk and convert it back to text, a decoding stage
occurs.
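
   A minimal sketch of those two stages follows. The class and method names
are invented, and an in-memory byte array stands in for the file on disk so
the example runs anywhere; with a real file you would wrap a FileOutputStream
and a FileInputStream instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDecodeStage {
    // Encoding stage: characters go in, bytes in the chosen encoding come out.
    static byte[] save(String text, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decoding stage: bytes go in, characters come out.
    static String load(byte[] stored, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(stored), cs)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String artist = "T\u00CB\u042FRA"; // T, E with diaeresis, Cyrillic Ya, R, A

        byte[] utf8 = save(artist, StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes: " + utf8.length
                + ", intact: " + load(utf8, StandardCharsets.UTF_8).equals(artist));

        // ASCII has no way to represent two of the characters, so they are lost.
        byte[] ascii = save(artist, StandardCharsets.US_ASCII);
        System.out.println("ASCII gives: " + load(ascii, StandardCharsets.US_ASCII));
    }
}
```

   The String in the middle is the same in both cases; only the bytes on the
way out and on the way in depend on the encoding you pick.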

   In C, there's no similar stage, because once again, there's no
abstracting the encoding away from the text. If you want to replicate C's
behaviour in Java, rather than reading in text, read in bytes. Then, you can
manipulate the bytes in any way you like, and if you think these bytes
represent text, you'll have to guess at the encoding (ASCII? EBCDIC? UT