Java Forum / General / February 2007
Help me!! Why java is so popular
amalikarunanayake@gmail.com - 02 Feb 2007 16:33 GMT java has become an important language in very short time. What are some of the things that have made it so popular?
Eric Sosman - 02 Feb 2007 17:11 GMT amalikarunanayake@gmail.com wrote On 02/02/07 11:33,:
> java has become an important language in very short time. What are > some of the things that have made it so popular? One point in Java's favor is that it's a pretty good teaching language. It protects the still-inept student from many kinds of beginners' mistakes, but doesn't coddle him to the point of cocooning. You can test this claim for yourself by taking a Java-based class and doing the homewo--
Oh. Wait; I get it.
Eric.Sosman@sun.com
Alex Hunsley - 02 Feb 2007 17:19 GMT > java has become an important language in very short time. What are > some of the things that have made it so popular? Java is important because lazy/cheating students can take classes in this language and not have to look far for someone to do their homework for them.
Jack Kielsmeier - 02 Feb 2007 17:31 GMT > java has become an important language in very short time. What are > some of the things that have made it so popular? One of the biggest reasons JAVA has taken off is its ease of portability. When an application is written in a language like C or C++, it must be compiled separately for each platform that will run your application.
Java has a virtual machine that is able to take JAVA code information and translate it to the specific machine running the application on the fly. It makes it so you do not have to re-compile the application for every platform.
There are some downsides to this; JAVA is generally slower than a language like C++ because of the extra time needed by the Virtual Machine. With how fast computers are today, most people do not care about the performance penalty in their applications (the penalty is usually pretty small now).
Alex Hunsley - 02 Feb 2007 23:58 GMT >> java has become an important language in very short time. What are >> some of the things that have made it so popular? [quoted text clipped - 12 lines] > fast computers are today, most people do not care about the performance > penalty in their applications (the penalty is usually pretty small now). Whoo! That's him got off without having to do his homework! lex
raddog58c - 06 Feb 2007 21:45 GMT > <amalikarunanay...@gmail.com> wrote in message > [quoted text clipped - 6 lines] > When an application is written in a language like C or C++, it must be > compiled separately for each platform that will run your application. I think a big reason Java took off initially was the fact it's a C-language derivative without pointers. Most of the Java programmers I knew 10 years ago were disillusioned C/C++ people who found the memory management tedious and error-prone. The fact they could instantiate and code and never clean up after themselves freed them to concentrate on the problems they really wanted to expend their energy solving.
The reason people would get into Java coding today probably has little to do with this and is due more to the language's large base of highly functional classes that plug-n-play easily, so programmers can slam dunk applications a lot easier. It's all about what's easy for us (programmers), eh?
> Java has a virtual machine that is able to take JAVA code information and > translate it to the specific machine running the application on the fly. It > makes it so you do not have to re-compile the application for every > platform. Certainly true, though the same can be said for Perl and while it's pretty popular, it's not as popular as Java. Partly that's due to Perl being totally interpreted versus half-ways compiled (ie, byte code), and let's face it, Perl isn't really a full-fledged programming language.
The other thing that's perhaps a side effect from this paradigm is that Java provides mostly least-common-denominator system services. For instance, someone was asking me about checking for an already-running instance of a program on a Windows workstation. That's a really easy thing to implement in any language that can talk directly to the OS -- about 15 lines of code invoking EnumWindows.
You can't do that in Java, however, unless you go JNI. That's not necessarily a bad thing, however. You lose some OS-locale-based fine-tuned features, but on the flip side when you grab someone's class libs off a web site you have almost no work to do to use them, regardless of what OS you're using. That's a pretty compelling attraction for agile programming.
> There are some downsides to this; JAVA is generally slower than a language > like C++ because of the extra time needed by the Virtual Machine. With how > fast computers are today, most people do not care about the performance > penalty in their applications (the penalty is usually pretty small now). I would say "the penalty is usually pretty small now" is very context sensitive. Not to pick on you personally, but the general embracing of the "memory is cheap" or "performance is good enough" is popular in the Java community, and it's wrong to me. Maybe that's because I use everything from MASM to IBM 370 BASM to C, C++, Java, VB and Perl, not sure...
But I would never, or at least rarely, use Java for desktop utilities I create for myself because the startup time is dreadfully slow, and the impact on other applications is large. If you're running a system with a GIG or less of RAM, firing up a memory-hungry JVM isn't something you do hastily. Run something like Process Explorer (http://www.microsoft.com/technet/sysinternals/ProcessesAndThreads/ProcessExplorer.mspx) and look at the responsiveness of the UI. You can tune Java apps all day and you simply can't get that without ultra high-speed hardware.
I can start Java apps on my 3GHz Pentium and the drain on the system is highly noticeable. I'll have periods where the system is completely unresponsive to mouse clicks, or the mouse pointer icon lumbers across the screen. Shut down those apps and the system returns to the standard Windows response (which is still barely tolerable, but at least it's better). I could probably do some tuning to counteract that, but that defeats the purpose of writing quick-n-dirty utilities to rapidly perform some function.
I think of Java programs like driving a Winnebago. For comfort and ease of travel it's hard to beat, especially on long trips where you want lots of room and comfort and not have to deal with meticulous details like where to set your full coffee cup or worrying about where to toss your trash -- most of these are taken care of for you by the environment.
For quick trips around town, negotiating road space during rush hour, or for parallel parking on crowded city streets it's overkill and difficult to maneuver.
I always have felt the Java language is superlative, but the Java runtime model is gluttonous, and that two things would greatly help Java, IMO:
1) Optional override on storing data as unicode: why pay for something if you don't need it? Unicode is great if you need multinational support, but in 27 years of programming I've maybe needed that in two applications. Converting data back and forth when I don't need it is like paying for a cable TV subscription for your home when you are a traveling salesman who's on the road 7 days a week. Wasting computational energy degrades responsiveness of programs running in the JVM as well as programs running in the system outside of the JVM.
2) Optional override on garbage collection: If I don't want garbage collection, or I want to decide when it is going to run, then I want to control it. Garbage collection makes things "safe," but not all of us need that safety. I've written OS's from the ground up, so I'm totally comfortable in my memory mgmt skills. When I'm trying to streamline processing, spurious threads I didn't start are problematic and get in the way. I like having control when I program, and there are times I don't want GC and it'd be nice to shut it off.
These things degrade the Java footprint, IMO. Many purist Java-only programmers will say things like "memory is cheap" or "with the current speed of CPUs" etc. But that's a really bad philosophy and a cop out. It's insensitive to optimization, basically pretending response time and memory footprints are of little importance. Memory's cheap if you already bought it and it's installed right now -- it's neither cheap nor convenient if, for example, you just ran out in the middle of some long running process, or when the system's so lethargic that mouse clicks seem to be running across an RS-232 port at 4800 BAUD.
Software engineers don't have to get bogged down with making every nanosecond count, but we shouldn't ignore the fact that systems are never fast enough, never responsive enough, never have enough memory, never enough disk space, and that the faster and bigger the HW folks make'm, no matter how fast and big, the faster we bog'm down with OS's that require multiple DVDs to load and multiple GIGs of memory to work.
Java is a beautiful language with a rich, expressive syntax and a vast array of easy-to-use classes in the community. Those are a couple of big reasons its popularity has grown. The more SW engineers can do to shrink the performance gap between Java and natively-compiled languages like C/C++ and the more we take memory footprints and CPU cycles seriously, the better off Java will be.
Performance is the biggest drawback to using Java, maybe its only big drawback, and that's why I'm such a huge opponent against blanket statements that ignore this side of the Java tradeoff.
Again, this is more of a spew in all directions in an attempt to get all J-programmers thinking, than a direct response to your note. Didn't want it to come across as a flame, cuz it's not.
Mark Thornton - 06 Feb 2007 22:41 GMT > For instance, someone was asking me about checking for an already- > running instance of a program on a Windows workstation. That's a > really easy thing to implement in any language that can talk directly > to the OS -- about 15 lines of code invoking EnumWindows. If we are talking about arbitrary applications then that only works for applications that actually have a window.
> You can't do that in Java, however, unless you go JNI. On the other hand if we are talking about testing for an instance of your own application (written in Java) then that is possible without resorting to JNI.
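One common pure-Java way to do the own-application check is to hold an exclusive lock on an agreed-upon file. What follows is only a minimal sketch of that idea, not necessarily what the poster had in mind: the class name `SingleInstance` and the lock file name `myapp.lock` are made up for illustration, and file locking behaviour is platform-dependent, so treat the result as a hint rather than a guarantee.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Hypothetical helper: detects a second copy of *this* application
// by holding an exclusive lock on an agreed-upon file.
final class SingleInstance {
    // Static fields keep the file and its lock open for the life of the JVM.
    private static RandomAccessFile lockHolder;
    private static FileLock lock;

    /** Returns true if this JVM holds the lock, false if it could not be obtained. */
    static synchronized boolean tryAcquire(File lockFile) {
        if (lock != null) {
            return true; // this JVM already owns the lock
        }
        RandomAccessFile raf = null;
        try {
            raf = new RandomAccessFile(lockFile, "rw");
            // tryLock() returns null when another process holds the lock.
            FileLock candidate = raf.getChannel().tryLock();
            if (candidate != null) {
                lockHolder = raf;
                lock = candidate;
                return true;
            }
        } catch (IOException e) {
            // Could not open or lock the file; fall through and report failure.
        }
        if (raf != null) {
            try {
                raf.close();
            } catch (IOException ignored) {
                // Nothing useful to do if closing fails.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumption: every copy of the application agrees on this path.
        File lockFile = new File(System.getProperty("java.io.tmpdir"), "myapp.lock");
        if (tryAcquire(lockFile)) {
            System.out.println("No other instance detected; starting up.");
        } else {
            System.out.println("Another instance appears to be running; exiting.");
        }
    }
}
```

The operating system releases the lock when the process exits, so a crashed instance does not block the next start. Binding a fixed local port with `java.net.ServerSocket` is another approach people use for the same purpose.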
> I would say "the penalty is usually pretty small now" is very context > sensitive. Not to pick on you personally, but the general embracing > of the "memory is cheap" or "performance is good enough" is popular in > the Java community, and it's wrong to me. Maybe that's because I use > everything from MASM to IBM 370 BASM to C, C++, Java, VB and Perl, not > sure... Sometimes the penalty can be zero or even negative, particularly for applications which run long enough to eliminate the startup effects of JIT compiling.
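The startup effect mentioned here is easy to observe informally. The sketch below times the same method over several rounds; on a typical JIT-compiling JVM the first rounds tend to be slower than the later ones, though the actual numbers depend entirely on the JVM, flags, and hardware. The class name `WarmUp` and the workload are invented for illustration, and this is not a rigorous benchmark.

```java
// Hypothetical micro-experiment: time the same method over several rounds.
final class WarmUp {
    /** Deterministic busy work: the sum of i*i for 0 <= i < n. */
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long checksum = 0; // printed at the end so the work cannot be discarded
        for (int round = 1; round <= 10; round++) {
            long start = System.nanoTime();
            for (int rep = 0; rep < 200; rep++) {
                checksum += sumOfSquares(100_000);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + micros + " us");
        }
        System.out.println("checksum " + checksum);
    }
}
```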
> But I would never, or at least rarely, use Java for desktop utilities > I create for myself because the startup time is dreadfully slow, and > the impact on other applications is large. If you're running a system > with a GIG or less of RAM, firing up a memory-hungry JVM isn't Rubbish. You can do quite a lot of useful work with the JVM 'using' only 8MB or less which is trivial in the context of 1GB memory. I've been running a service, written in Java, which has no noticeable effect on the responsiveness of my normal applications. It sits there all day doing its stuff and I can easily forget that it is still running. My machine is a 4 year old 3.06GHz Pentium with 1GB of RAM.
> Performance is the biggest drawback to using Java, maybe its only big > drawback, and that's why I'm such a huge opponent against blanket > statements that ignore this side of the Java tradeoff. Look who is making blanket statements. I care a lot about performance, but I don't have a problem in this respect with Java. Java isn't perfect. For some applications it can be slower than say C++, but in other cases it can be just as fast (or even faster).
Mark Thornton
raddog58c - 07 Feb 2007 00:03 GMT On Feb 6, 4:41 pm, Mark Thornton <mark.p.thorn...@ntl-spam-world.com> wrote:
> > For instance, someone was asking me about checking for an already- > > running instance of a program on a Windows workstation. That's a [quoted text clipped - 3 lines] > If we are talking about arbitrary applications then that only works for > applications that actually have a window. Sure enuff, but point being the tradeoff between getting to the OS layer or stopping at the language layer. Java stops you at the implementation, unless you go JNI -- that prevents some things, but that prevention facilitates more portability. It's a tradeoff.
> > You can't do that in Java, however, unless you go JNI. > > On the other hand if we are talking about testing for an instance of > your own application (written in Java) then that is possible without > resorting to JNI. Absolutely. The point in this case was again the tradeoffs.
> > I would say "the penalty is usually pretty small now" is very context > > sensitive. Not to pick on you personally, but the general embracing [quoted text clipped - 6 lines] > applications which run long enough to eliminate the startup effects of > JIT compiling. How can it be negative? I'm not saying you're wrong, but how can any byte-coded language outperform a binary language if they are doing the same thing? It can't, because you have to convert the byte code to the native binary stream before you can execute it. So I'm thinking you mean certain algorithms are more efficiently handled by the JVM? Please elucidate -- I heard someone say Java memory management now exceeds C and I thought it was an interesting notion and probably related to some ingenious optimizations in memory mgt algorithms, though I honestly don't know.
> Rubbish. You can do quite a lot of useful work with the JVM 'using' only > 8MB or less which is trivial in the context of 1GB memory. I've been > running a service, written in Java, which has no noticeable effect on > the responsiveness of my normal applications. It sits there all day > doing its stuff and I can easily forget that it is still running. My > machine is a 4 year old 3.06GHz Pentium with 1GB of RAM. No doubt, and Java's fine messaging implementation and rich set of protocol support, eg, makes it a good vehicle for such things. A service I'm fine with -- a utility I need to fire up over and over, not so fine with that. I wouldn't write that in Java.
At one time there was a Java compiler that let you go from Java to .EXE. I used it quite a bit, although the .EXEs it generated were pretty fat for the functionality they implemented. Then again Java's not about creating .EXEs, so that didn't surprise me.
Languages have strengths and weaknesses. When it comes to my tools I want as close to subsecond response time as possible, so I'm looking for .EXE based apps. If I'm parsing huge chunks of random text, I'm all for Perl. If it's XML or a large messaging paradigm, Java's great.
> Look who is making blanket statements. I care a lot about performance, > but I don't have a problem in this respect with Java. Java isn't > perfect. For some applications it can be slower than say C++, but in > other cases it can be just as fast (or even faster). What would be an example where it's faster? I write a lot of Java these days, as well as a lot of C++ and even some C and ASM. I know a decent bit about Java best practices, but I'm not in Java 7x24 -- so if there's a way to make my Java scream, I'd like to know. At best I'm looking at avoiding fat objects with features I don't need, such as synchronized collections like Vectors where a single thread is using the object.
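The Vector point can be shown in a few lines. `java.util.Vector` synchronizes its methods, while `java.util.ArrayList` does not, so for an object used by a single thread the usual advice is to start with `ArrayList` and add synchronization only where a list is actually shared. The sketch below is illustrative only; the class and method names are hypothetical, and how much time the change saves depends on the JVM.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example of choosing a list type by how it will be used.
final class ListChoice {
    /** Single-threaded use: ArrayList does no locking on each call. */
    static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }

    /** If the list really is shared between threads, wrap it explicitly. */
    static List<Integer> sharedSquares(int n) {
        return Collections.synchronizedList(squares(n));
    }

    public static void main(String[] args) {
        System.out.println(squares(5));
        System.out.println(sharedSquares(5).size());
    }
}
```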
How does one get Java to run faster than a compiled language? It would seem there has to be a catch or specific circumstances for that to occur, because it's hard to fathom how that could be the case.
Thanks for the comments, by the way.
Chris Uppal - 07 Feb 2007 17:12 GMT > > Sometimes the penalty can be zero or even negative, particularly for > > applications which run long enough to eliminate the startup effects of [quoted text clipped - 4 lines] > same thing? It can't, because you have to convert the byte code to > the native binary stream before you can execute it. At least in theory, the JVM's JITer has more information available to it than a compiler producing a statically pre-compiled binary would have. Some theoretical examples:
Is the machine /actually/ a multiprocessor ? If not then some synchronisation primitives can be replaced by no-ops.
Is a (virtual) method /actually/ overridden by any class loaded at runtime ? If not then optimisations like static linking or even inlining become possible.
Does the processor have an extended instruction set ? If so then the JITer can generate code which uses those instructions. (A statically precompiled binary could include both sets of code, of course, with dynamic switching between them, but that is not often deemed worth the extra bother).
As far as I know, all of those possibilities are implemented (if only in limited ways) in current Sun JVMs.
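To make the first two examples concrete, here is a small sketch of the kind of runtime facts involved. It only shows the shape of the code: the processor count comes from the standard `Runtime.availableProcessors()` call, and `Shape` and `Square` are hypothetical types chosen so that the interface has exactly one loaded implementation. Whether a given JVM actually devirtualizes or inlines the `area()` call is an implementation detail that this code cannot demonstrate by itself.

```java
// Hypothetical illustration of information that exists only at run time.
final class JitInputs {
    interface Shape {
        double area();
    }

    // The only Shape implementation loaded in this program.
    static final class Square implements Shape {
        private final double side;

        Square(double side) {
            this.side = side;
        }

        public double area() {
            return side * side;
        }
    }

    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area(); // virtual call, but only one possible target so far
        }
        return sum;
    }

    public static void main(String[] args) {
        // Known to the JVM at run time, not to a compiler run on another machine.
        System.out.println("processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("total area: " + totalArea(new Shape[] { new Square(2), new Square(3) }));
    }
}
```

If a second `Shape` implementation were loaded later, a JVM that had specialised the call site would have to undo that assumption, which is part of why this kind of optimisation belongs to a runtime compiler.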
> How does one get Java to run faster than a compiled language? Simple: compare it with a compiled language with a bad optimiser ;-)
FWIW, I think it is /highly/ application dependent, and there is no simple set of rules you can follow to make Java run as fast as possible. My impression is that the optimiser in the server JVM, from 1.5 (and presumably later) generates code which is comparable with GCC -O3 or MS's C++ compiler with all obvious optimisations turned on -- however that is a useless observation unless the code in the two languages is trying to do the same thing (e.g. 2-D arrays have different layouts in C and Java, or a calculation might create many intermediate objects in carefully-written Java whereas the "same" code in well-crafted C++ might not).
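The 2-D array remark can be illustrated briefly. A Java `double[][]` is an array of references to separately allocated row arrays, whereas a C `double[rows][cols]` is one contiguous block. A fair comparison might therefore store the Java grid in a single `double[]` with row-major indexing, as sketched below. `FlatGrid` is a hypothetical class written for this illustration, not something from the thread.

```java
// Hypothetical grid stored the way C lays out a 2-D array: one block, row after row.
final class FlatGrid {
    private final int cols;
    private final double[] cells;

    FlatGrid(int rows, int cols) {
        this.cols = cols;
        this.cells = new double[rows * cols];
    }

    double get(int row, int col) {
        return cells[row * cols + col]; // row-major index
    }

    void set(int row, int col, double value) {
        cells[row * cols + col] = value;
    }

    double sum() {
        double total = 0;
        for (double cell : cells) {
            total += cell;
        }
        return total;
    }

    /** The idiomatic Java form: each row is its own object and rows may differ in length. */
    static double sumNested(double[][] grid) {
        double total = 0;
        for (double[] row : grid) {
            for (double cell : row) {
                total += cell;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        FlatGrid grid = new FlatGrid(2, 3);
        grid.set(1, 2, 5.0);
        System.out.println(grid.sum());
        System.out.println(sumNested(new double[][] { { 1, 2 }, { 3 } }));
    }
}
```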
-- chris
raddog58c - 07 Feb 2007 19:19 GMT On Feb 7, 11:14 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE-THIS.org> wrote:
> > > Sometimes the penalty can be zero or even negative, particularly for > > > applications which run long enough to eliminate the startup effects of [quoted text clipped - 22 lines] > As far as I know, all of those possibilities are implemented (if only in > limited ways) in current Sun JVMs. These are good... thanks. It would depend on the nature of the application, as some of the optimizations, unless significant, wouldn't make up the difference in the time it took to compile the byte code into machine code.
> > How does one get Java to run faster than a compiled language? > > Simple: compare it with a compiled language with a bad optimiser ;-) That'd be one of several ways... touche!
> FWIW, I think it is /highly/ application dependent, and there is no simple set > of rules you can follow to make Java run as fast as possible. My impression is [quoted text clipped - 5 lines] > intermediate objects in carefully-written Java whereas the "same" code in > well-crafted C++ might not). This does presume that we're comparing the post-compiled byte code against the precompiled code in the runtime binary (.EXE, .COM, etc). The fact the conversion is done at run time and would have to be done every time the code is run (unless it's cached) puts it at a disadvantage out of the gate. The late binding to environment could help close the gap, but that's not guaranteed because the .EXE can be precompiled for the target deployment environment and if so the race is over.
I'm nonetheless impressed with the computer scientists building optimizations into the JVM -- they've a lot of clever techniques up their respective sleeves.
> -- chris
Mark Thornton - 07 Feb 2007 20:07 GMT > On Feb 7, 11:14 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE- > THIS.org> wrote: > disadvantage out of the gate. The late binding to environment could > help close the gap, but that's not guaranteed because the .EXE can be > precompiled for the target deployment environment and if so the race > is over. The single/multi processor state can be changed after an application has been installed. A JVM will adjust accordingly, but what happens to your EXE that was selected/compiled for the single processor that existed at install time? There used to be an issue with maths coprocessors (and may be again if AMD's ideas surface in a product). I think it is still possible for processor upgrades to add SSE3 or similar capability.
Mark Thornton
Chris Uppal - 08 Feb 2007 13:33 GMT > A JVM will adjust accordingly, but happens to your EXE > that was selected/compiled for the single processor that existed at > install time? Which raises the question of how much extra testing is needed to account for the fact that the JITer will adapt to the host processor and thus be executing different programs on different machines ?
Personally, I think the answer is "damn little" -- except in special cases. The JVM people have shown themselves to be good at producing implementations which behave the same despite the adaptive behaviour (except considerations of speed, naturally ;-) so I'd put my testing budget into looking for the mistakes /we/ make in preference to the ones that Sun's VM engineers make.
The one qualification I'd make to that is that the testing machines shouldn't be such that they are likely to /mask/ problems -- so they should be multiprocessor boxes, and with no more "grunt" than the worst-case target machine. (Though, I suppose a case could be made that at least some testing should be done on a machine with as near as possible the same grunt as the fastest/biggest target machine during the release's anticipated lifetime.)
-- chris
senior - 12 Feb 2007 13:58 GMT The biggest advantage of Java is in networking and distributed computation.
Please google the advantages of Java to see why it is popular; after that, any questions are welcome.
raddog58c - 10 Feb 2007 19:42 GMT > The single/multi processor state can be changed after an application has > been installed. A JVM will adjust accordingly, but happens to your EXE > that was selected/compiled for the single processor that existed at > install time? Well, I'm recompiling it for the new target environment while they're swapping the gear on the production box.
The entire argument around auto-configuration of runtime code is valid, but it assumes you'll be changing the underlying platform over the lifetime of the code. While that can and does happen in some environments, sometimes frequently, there are many envs in which that's not the case, or change is infrequent enough that it's trivial to regenerate the runtime for the appropriate target if you need the fastest speed you can get.
And just as the JVM can autoadapt, a programmer could build different versions. Obviously it's not a side effect of the environment when you're spinning it yourself, but just to be fair regarding all available solutions.... startup logic could perform the same set of checks to find out what hardware is installed on the host, and the startup overlord can manipulate the runtime in some manner (rename the binaries, change startup link pointers, etc) to facilitate the best-of-breed for that environment accordingly.
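The selection logic described above might look roughly like the following, written in Java only to keep the thread's examples in one language. It is a sketch under stated assumptions: the binary naming scheme (`myapp-<arch>-smp.exe` and `myapp-<arch>-up.exe`) and the class `StartupSelector` are invented, and a real launcher would check more than architecture and processor count.

```java
// Hypothetical launcher logic: pick a prebuilt binary that matches the host.
final class StartupSelector {
    /** Maps host facts to the name of a prebuilt variant (naming scheme is made up). */
    static String chooseBinary(String arch, int processors) {
        String flavor = processors > 1 ? "smp" : "up"; // multiprocessor vs uniprocessor build
        return "myapp-" + arch + "-" + flavor + ".exe";
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("would launch: " + chooseBinary(arch, processors));
    }
}
```

With only two properties there are already two builds per architecture; each further property checked (instruction set extensions, cache sizes, and so on) multiplies the number of variants to build and test.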
The difference, obviously, is that with Java the programmer doesn't have to concern her or himself with these mundane details -- all java programs will inherently obtain this effect based on the JVM's capacity to provide it. The skilled engineer can easily build a library of platform checking and invoke or autoinstall the right code when the changed hardware is detected. If that dynamicism were needed based on the problem space, it's a reasonably easy feature to build. "Back in the day" when controllers were really dumb, we used to write streaming tapes with interrecord gaps that were the minimum size possible and still provided ample time to perform inter-record processing (programming DMA controllers, generating CRCs, etc) on the slowest processors on the market at the time. It was always interesting, as we had to run a sequence of instructions at startup to see how fast they'd execute -- the faster they ran, the longer we needed to configure our wait/delay loops in the inter-record processing routines. That stuff was all based on the main clock, since "smart" controllers weren't available to most devices at the time, so drivers ran on the host's CPU.
Anyways, I'm not poopoo'ing the dynamic feature of Java -- it's very cool and has its place to be sure. By the same token, HW engineers may eventually standardize CPUs and aux processors, like FP processors, to render the differences insignificant. And it's like a lot of things provided by Java's dynamicism: they're awesome when you need'm, pointless and wasteful when you don't.
More often than not I don't require the dynamicism, so it alone is not a big selling point for me. Your mileage may vary, and that's cool, it's just for me Java's dynamics are not why I choose it -- as I mentioned, for me the overhead is a turn off. Heck, I'm not even attracted by the fact the programmer is freed from the need to manage her or his memory space -- I don't personally find memory mgmt terribly challenging work.
What sells me on Java is the breadth of elegant frameworks, the general (not always) ease of adding new functionality by adopting classes into your app, and the completeness of many classes out and about.
For example, we use Spring framework at the office, and it's a really fabulous way to construct a distributed network of components as services, and it does certain things to greatly reduce the "noise" in code such as using "injection" to clean up constructors. Pretty compelling stuff, and really a much bigger selling point, IMO, than running under the auspices of a JVM.
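For readers who have not seen the "injection" style, here is a plain-Java sketch of the idea with no Spring classes involved. The service receives its collaborator through the constructor instead of building it, and `main` does by hand the wiring that a container such as Spring would do from configuration. All names (`MessageSender`, `OrderService`, `InjectionSketch`) are hypothetical.

```java
// Hypothetical example of constructor injection, wired by hand.
final class InjectionSketch {
    interface MessageSender {
        void send(String message);
    }

    static final class OrderService {
        private final MessageSender sender;

        // The dependency is handed in; the constructor does no lookups or setup.
        OrderService(MessageSender sender) {
            this.sender = sender;
        }

        void placeOrder(String item) {
            sender.send("order placed: " + item);
        }
    }

    public static void main(String[] args) {
        // Manual wiring; a container would normally assemble these objects.
        MessageSender console = message -> System.out.println(message);
        new OrderService(console).placeOrder("book");
    }
}
```

Because `OrderService` depends only on the interface, a test or a different deployment can pass in another `MessageSender` without touching the service code.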
Lew - 11 Feb 2007 06:18 GMT > Anyways, I'm not poopoo'ing the dynamic feature ... "Pooh-poohing". "Pooh-pooh" is to express derision. "Poopoo" is a euphemism for animal excrement.
- Lew
nukleus - 11 Feb 2007 12:29 GMT >> Anyways, I'm not poopoo'ing the dynamic feature ... > >"Pooh-poohing". "Pooh-pooh" is to express derision. "Poopoo" is a euphemism >for animal excrement. What a gem.
I wish I could have you sitting on my bookshelf. If only I could push a button and say:
Lew, tell me, what is THIS thingy here?
And, sure enough, there is ALWAYS an answer.
Have you thought of Virtual Lew project?
Could be quite a thing...
Do you want me to quick prototype that thing to give ya an idea of what can be done?
>- Lew
Lew - 11 Feb 2007 18:13 GMT raddog58c wrote:
>>> Anyways, I'm not poopoo'ing the dynamic feature ... Lew wrote:
>> "Pooh-poohing". "Pooh-pooh" is to express derision. "Poopoo" is a euphemism >> for animal excrement.
> What a gem. > [quoted text clipped - 11 lines] > Do you want me to quick prototype that thing > to give ya an idea of what can be done? Great idea.
- Lew
nukleus - 12 Feb 2007 15:38 GMT >raddog58c wrote: >>>> Anyways, I'm not poopoo'ing the dynamic feature ... [quoted text clipped - 21 lines] > >Great idea. Well. I didn't expect youre gonna buy it. But...
Oki, doki. Just warn everybody here that some things may...
Well, just relax if my monkey gets too wild. You might see zome posts of yerr royal highness.
Lets narrow it down. Which subject would you pick to reflect the "Best of Lew"?
Your turn now.
>- Lew
raddog58c - 11 Feb 2007 19:11 GMT > > Anyways, I'm not poopoo'ing the dynamic feature ... > > "Pooh-poohing". "Pooh-pooh" is to express derision. "Poopoo" is a euphemism > for animal excrement. > > - Lew
Thank you, and as the radicaldog that I am, I restate that I was not poopooing Java. ;-)
Chris Uppal - 11 Feb 2007 15:55 GMT > And just as the JVM can autoadapt, a programmer could build different > versions. Obviously it's not a side effect of the environment when [quoted text clipped - 4 lines] > binaries, change startup link pointers, etc) to facilitate the best-of- > breed for that environment accordingly. That approach runs into the problem of combinatorial explosion -- which is why it is only used in limited ways and in rather extreme cases. The thing is that a JIT has more information available to it than any possible static analysis. That is a /fundamental/ advantage, and cannot be clawed back (though it can be wasted); just as having to do extra work at runtime is a fundamental /disadvantage/ which can only be compensated for, but never eliminated.
BTW, I'm not advocating one approach over the other here, just discussing what the approach taken by current JVMs /is/.
-- chris
raddog58c - 11 Feb 2007 19:53 GMT On Feb 11, 9:55 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE-THIS.org> wrote:
> > And just as the JVM can autoadapt, a programmer could build different > > versions. Obviously it's not a side effect of the environment when [quoted text clipped - 16 lines] > > -- chris Excellent points.
Arne Vajhøj - 09 Feb 2007 02:55 GMT > On Feb 7, 11:14 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE- > THIS.org> wrote: [quoted text clipped - 9 lines] > wouldn't make up the difference in the time it took to compile the > byte code into machine code. Yes - unless significant.
> This does presume that we're comparing the post-compiled byte code > against the precompiled code in the runtime binary (.EXE, .COM, etc). > The fact the conversion is done at run time and would have to be done > every time the code is run (unless it's cached) puts it at a > disadvantage out of the gate. The late binding to environment could > help close the gap, Or it could cover the gap 200%.
You are basically proving that Java is not efficient by assuming so.
Meaning you proved nothing.
Arne
raddog58c - 11 Feb 2007 00:09 GMT > Or it could cover the gap 200%. > [quoted text clipped - 3 lines] > > Arne I wasn't trying to prove anything. Common sense says if you have to convert from one format to another before you begin executing, you have an extra step and obviously all things being equal you are not as efficient, period, end of sentence.
Converted code that's more efficient could make up for the conversion - it would depend on the problem space, run duration, and how well/badly each program was written. The converted code would need to be more efficient to have a chance to make up for the extra step. If it were equal to or less efficient, you will not make up the gap.
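The break-even argument in this paragraph is simple arithmetic: a one-time conversion cost C is repaid only if each iteration of the converted code is cheaper by some amount d, and then only after about C / d iterations. The sketch below encodes just that; the class name `BreakEven` and the cost figures in `main` are made-up illustration values, not measurements.

```java
// Hypothetical calculator for the "make up for the conversion" argument.
final class BreakEven {
    /**
     * Returns how many iterations are needed before the converted code has
     * repaid its one-time conversion cost, or -1 if it never does.
     * All costs are in the same arbitrary unit.
     */
    static long iterationsToBreakEven(double conversionCost,
                                      double staticCostPerIteration,
                                      double convertedCostPerIteration) {
        double savingPerIteration = staticCostPerIteration - convertedCostPerIteration;
        if (savingPerIteration <= 0) {
            return -1; // equal or slower per iteration: the gap is never made up
        }
        return (long) Math.ceil(conversionCost / savingPerIteration);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: conversion costs 1000 units, each iteration saves 2.
        System.out.println(iterationsToBreakEven(1000, 10, 8));
        // Same per-iteration cost: no break-even point exists.
        System.out.println(iterationsToBreakEven(1000, 10, 10));
    }
}
```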
That's not a proof -- it's an observation of reality, right?
Well written code in a language like C optimally compiled for the native environment is going to be tough to beat unless you write in native assembler language. I have actually had to write in native assembler on more than one occasion in real-time systems where nanoseconds mattered. That's atypical, but these situations do exist. Anyone suggesting an interpreted language is the way to go in these environments either has no understanding of the problem space, or they've got a lot of explaining to do to make that assertion stick.
At any rate, the bottom line is you can't execute 100 instructions faster than 50 instructions if they're running on average the same number of clock cycles -- until someone invents a CPU with an ALU that executes intermediate code directly, cycles have to be used by the interpreter to transform to the native binary.
I used to program in Lisp years ago; it had similar issues. Don't hold me to it, but if I remember right there was a movement by Lisp aficionados to create a CPU that did exactly that: executed Lisp directly. These machines obviously never gained a lot of popularity.
There's a beauty in interpreted languages, but instruction-level efficiency is a trade off you make for the functionality and late-binding paradigm that interpretation provides. I give the JVM architects a lot of credit, as under the right circumstances they glean a lot of efficiency out of Java byte code -- but on balance the Java code I've worked with in batch, GUI, and web server applications has not been impressive from a speed standpoint. Functionality wise it's great, however, so that's the emphasis upon which one should focus with respect to interp. languages, IMHO.
Lew - 11 Feb 2007 06:22 GMT > Well written code in a language like C optimally compiled for the > native environment is going to be tough to beat unless you write in [quoted text clipped - 4 lines] > these environments either has no understanding of the problem space, > or they've got a lot of explaining to do to make that assertion stick. Others have pointed out that the JIT compiler can beat compile-time optimizations in some cases by virtue of having a different view of the situation. Escape analysis, for example, is a runtime phenomenon.
Because JVM optimizations are global and dynamic, whereas compile-time optimizations are more local and static, the JVM may actually achieve significantly better performance because it follows different analysis paths. This could, and according to what I've read, does achieve far better performance than "C optimally compiled" code.
Myths are supported by misguided intuition. Reality often moves in surprising ways.
- Lew
raddog58c - 11 Feb 2007 19:14 GMT > > Well written code in a language like C optimally compiled for the > > native environment is going to be tough to beat unless you write in [quoted text clipped - 19 lines] > > - Lew Fair enough, but the key words are "may" and "could" -- often they will not, particularly in situations where there really is no alternative improvement. In that case the additional overhead to store, manage and reference a knowledge base "may" and "could" further deplete resources and degrade the overall system's throughput.
It's really 100% context sensitive.
Arne Vajhøj - 11 Feb 2007 20:23 GMT >> Because JVM optimizations are global and dynamic, whereas compile-time >> optimizations are more local and static, the JVM may actually achieve >> significantly better performance because it follows different analysis paths. >> This could, and according to what I've read, does achieve far better >> performance than "C optimally compiled" code.
> Fair enough, but the key words are "may" and "could" -- often they > will not, We are impressed by your argumentation technique.
You are basically arguing that "Java is slow because it is slow".
Arne
Chris Uppal - 11 Feb 2007 21:38 GMT > > > Because JVM optimizations are global and dynamic, whereas compile-time > > > optimizations are more local and static, the JVM may actually achieve [quoted text clipped - 8 lines] > > You are basically arguing that "Java is slow because it is slow". I don't think he is, you know.
It is legitimate to ask, in response to a claim[*] that <such and such> a technique /can/ have an advantage, how often, and by how much, that advantage manifests in practice.
Scepticism != blind prejudice.
-- chris
[*] or even proof.
Arne Vajhøj - 11 Feb 2007 22:25 GMT >> We are impressed by your argumentation technique. >> >> You are basically arguing that "Java is slow because it is slow". > > I don't think he is, you know. He is arguing that Java performance is less than C/C++ performance based on an assumption that the JIT compilation runtime overhead is bigger than the JIT over AOT gain.
To me that is basing the conclusion on an assumption that is equivalent to the conclusion.
Arne
Christian - 11 Feb 2007 23:21 GMT Arne Vajhøj schrieb:
>>> We are impressed by your argumentation technique. >>> [quoted text clipped - 10 lines] > > Arne Then what makes the main difference in speed? Is it rather such nice features like checking for ArrayOutOfBound that java is a bit slower?
I once implemented a cryptographic hash function .. and I couldn't come closer to the speed of a C++ implementation than a factor of about 0.5.
Since it's purely deterministic, I would assume a hash function can be very well optimized at compile time in the case of C++, and JIT versus AOT should not matter much to the program?
- Christian
Arne Vajhøj - 11 Feb 2007 23:38 GMT > Then what makes the main difference in speed? Is it rather such nice > features like checking for ArrayOutOfBound that java is a bit slower? The point is that Java is not always a bit slower. Sometimes it is a bit slower, sometimes it is a lot slower, sometimes it is faster.
> I once implemented a cryptographic hashfunction .. and I couldn't come > closer to the speed of a c++ implementation than a factor of about 0.5. > > since its purely deterministic I would assume a hashfunction can be very > well optimized at compile time in case of c++ and JIT and AOT should not > matter much to the program? Difficult to say without seeing the code.
0.5 sounds awfully high, but it could be.
It also depends on the Java version being used and whether run in client or server VM.
Arne
John W. Kennedy - 12 Feb 2007 03:46 GMT > Arne Vajhøj schrieb: >>>> We are impressed by your argumentation technique. [quoted text clipped - 19 lines] > well optimized at compile time in case of c++ and JIT and AOT should not > matter much to the program? And, on the other hand, the authors of "Core Java" report that a Java implementation of the sieve of Eratosthenes using the Java Bitset class regularly outperforms an equivalent C++ program using the C++ Bitset template, even when they wrote a new Bitset template.
 Signature John W. Kennedy "The blind rulers of Logres Nourished the land on a fallacy of rational virtue." -- Charles Williams. "Taliessin through Logres: Prelude"
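For readers who want to try the sieve comparison themselves, here is a minimal sketch of a sieve of Eratosthenes built on java.util.BitSet. It is not the "Core Java" benchmark code, which is not reproduced in this thread; the class and method names are made up for illustration, and it only demonstrates the technique, not any timing result.

```java
import java.util.BitSet;

class SieveSketch {
    // Returns a BitSet whose set bits are exactly the primes <= limit.
    // Assumes limit is small enough that limit + 1 does not overflow an int.
    static BitSet primesUpTo(int limit) {
        if (limit < 2) {
            return new BitSet();            // no primes below 2
        }
        BitSet composite = new BitSet(limit + 1);
        for (int i = 2; (long) i * i <= limit; i++) {
            if (!composite.get(i)) {
                // Cross off every multiple of i, starting at i * i.
                for (int j = i * i; j <= limit; j += i) {
                    composite.set(j);
                }
            }
        }
        BitSet primes = new BitSet(limit + 1);
        primes.set(2, limit + 1);           // mark 2..limit as candidates
        primes.andNot(composite);           // drop everything crossed off
        return primes;
    }

    public static void main(String[] args) {
        BitSet primes = primesUpTo(100);
        System.out.println(primes.cardinality() + " primes: " + primes);
    }
}
```

To compare against C++ you would need an equivalent program on the other side and enough iterations for the JIT to warm up.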
raddog58c - 12 Feb 2007 04:09 GMT > > Arne Vajhøj schrieb: > >>>> We are impressed by your argumentation technique. [quoted text clipped - 30 lines] > Nourished the land on a fallacy of rational virtue." > -- Charles Williams. "Taliessin through Logres: Prelude" That's actually very interesting... why do you think that's the case? I'm guessing the C++ Bitset template is grossly inefficient. Never used it so I don't know. It's intriguing -- I plan to run some benchmarks this week to compare ASM, C, C++ and Java -- time will dictate how extensive I go, and I'll keep the tests simple and as equivalent as possible. I'll post the results to this thread and we can see what it looks like.
Thanks for sharing that.
John W. Kennedy - 12 Feb 2007 19:20 GMT >> And, on the other hand, the authors of "Core Java" report that a Java >> implementation of the sieve of Eratosthenes using the Java Bitset class >> regularly outperforms an equivalent C++ program using the C++ Bitset >> template, even when they wrote a new Bitset template.
> That's actually very interesting... why do you think that's the case? > I'm guessing the C++ Bitset template is grossly inefficient. That's what they thought, so they wrote their own. It was better, but Java still outperformed it.
 Signature John W. Kennedy "The blind rulers of Logres Nourished the land on a fallacy of rational virtue." -- Charles Williams. "Taliessin through Logres: Prelude"
raddog58c - 13 Feb 2007 12:51 GMT > > That's actually very interesting... why do you think that's the case? > > I'm guessing the C++ Bitset template is grossly inefficient. [quoted text clipped - 3 lines] > > -- So do you suppose this was a case where late binding made the difference, or do you think Java's implementation of whatever is used by the aforementioned algorithm is significantly better?
I ask because I'm curious what other techniques might be used by the JVM beyond instruction set coercion (sp?). Does the JVM recognize the algorithm or some facet of it and take shortcuts the C++ implementation doesn't, or does this algorithm use vast amounts of heap storage?
One thing that's different between C++ and Java is memory. I believe the JVM grabs everything it will ever need at startup, but I'm not 100% sure this would be the case in a C++ .EXE. There might be some memory dynamics in the form of quasi "lazy init" in C++ (i.e., non-preallocated heap) that causes C++ to invoke the native OS's getmem API where the JVM does that upfront.
Again, don't know the algorithm, but if there's no instruction set advantages and no algorithmic pruning taking place, then my guess would be something along the lines of memory management. I do think the JVM manages memory better than most operating systems -- it would have to because the cost of keeping it clean via garbage collection would otherwise be overly expensive.
Mark Thornton - 13 Feb 2007 14:07 GMT > One thing that's different between C++ and Java is memory. I believe > the JVM grabs everything it will ever need at startup It reserves the address space for the maximum heap size but only allocates memory for the minimum heap size. The reservation means it can be sure that the heap will be contiguous which gives performance advantages to the implementation. One downside to this is that memory reports by OS utilities can be incredibly confusing (i.e. useless).
Mark Thornton
raddog58c - 13 Feb 2007 15:43 GMT On Feb 13, 8:07 am, Mark Thornton <mark.p.thorn...@ntl-spam-world.com> wrote:
> > One thing that's different between C++ and Java is memory. I believe > > the JVM grabs everything it will ever need at startup [quoted text clipped - 6 lines] > > Mark Thornton Not really useless. Whatever the JVM has set aside is not available to processes which run under the auspices of the OS but not the JVM. Some JVM boosts are misleading because they turbo charge their applications but impact others running in the same storage space. That's a point that's missed by applications programmers in some performance comparisons -- it's not missed by systems programmers, however, because the OS is managing all active processes, the JVM being nothing more than one of them.
Now if your machine runs purely for the sake of the JVM, like a webserver for instance, then I agree that the JVM's use of memory is misleading, because what it's done is establish the best-fit operating environment for itself, and that's what you want on something like a webserver.
Conversely, on something like your workstation where you'll (presumably) be using a mix of apps each running in their own process space, resource hungry processes and threads hurt system performance because they greedily suck up everything around them.
That's not bad -- it's how it works, and one needs to be consciously aware of the effects so they don't get burned by them.
Mark Thornton - 13 Feb 2007 15:55 GMT > Not really useless. Whatever the JVM has set aside is not available > to processes which run under the auspices of the OS but not the JVM. A reservation has no effect on other processes. It simply reserves a range of addresses within the JVM's process. It makes no difference to the amount of memory available to other processes. Note that what has been reserved is ADDRESS SPACE, which is not the same thing as memory. When the heap needs to expand, additional memory will be mapped into that reserved space.
Mark Thornton
raddog58c - 13 Feb 2007 16:28 GMT On Feb 13, 9:55 am, Mark Thornton <mark.p.thorn...@ntl-spam-world.com> wrote:
> > Not really useless. Whatever the JVM has set aside is not available > > to processes which run under the auspices of the OS but not the JVM. [quoted text clipped - 7 lines] > > Mark Thornton I used to do capacity planning and performance monitoring of our webservers. We used IBM's Websphere App Server and HTTP Server. One thing I noticed was CPU usage would fluctuate greatly, but memory usage was level. It seemed to me the JVM grabbed all the memory it was configured to start with at the time it started and never relinquished it, meaning it was not available to any other processes on the server.
Is there an option to have the JVM startup with a tiny heap and dynamically expand and shrink it if needed? I don't know one way or the other to be honest, but if there is we weren't using it in our environment. The JVM carved its space and never moved in one direction or another over its lifetime.
Chris Uppal - 13 Feb 2007 18:04 GMT > I used to do capacity planning and performance monitoring of our > webservers. We used IBM's Websphere App Server and HTTP Server. One [quoted text clipped - 3 lines] > relinquished it, meaning it was not available to any other processes on > the server. It's quite possible that IBM's server-class JVM implementations use memory in a very different pattern from how Sun's desktop-class JVM implementations do. (Especially when you remember that both Sun and IBM are in the business of selling big iron ;-)
It makes (in my opinion) a lot of sense for a JVM running a server to grab actual memory at startup. That would make no sense at all for a JVM used for running desktop applications where (a) the demand is much less predictable, and (b) there is unlikely to be someone around with the skill and time to tune each application separately.
That said, I do think Sun's desktop implementations are less flexible in this respect than they should be.
-- chris
Mark Thornton - 13 Feb 2007 19:23 GMT > Is there an option to have the JVM startup with a tiny heap and > dynamically expand and shrink it if needed? I don't know one way or Yes. Although this depends on the JVM in question. For Sun's JVM the option -Xms is used to specify the minimum heap size. For example -Xms4m would give a 4MB minimum heap size. Similarly -Xmx specifies the maximum heap size. The minimum permitted heap size is 1MB. Within the min/max limits the heap will grow and shrink as required, although it doesn't always shrink as quickly as might be desirable.
It may be that your web server system configured the JVM with a high minimum value for the heap (possibly even equal to the maximum value). It is quite common to do this with server type applications --- e.g. you can specify a minimum value for SQL Server to grab.
Mark Thornton
Arne Vajhøj - 14 Feb 2007 01:46 GMT > I used to do capacity planning and performance monitoring of our > webservers. We used IBM's Websphere App Server and HTTP Server. One [quoted text clipped - 9 lines] > environment. The JVM carved its space and never moved in one direction > or another over its lifetime. Yes.
-Xms<size>    Set initial Java heap size
-Xmx<size>    Set maximum Java heap size
But if the machine is dedicated to only run WAS they may likely have set both to about 3/4 of the total memory.
Arne
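The effect of -Xms and -Xmx can be observed from inside a program. The following is a minimal sketch using java.lang.Runtime; the class name HeapReport is made up for illustration, and the exact numbers depend on the JVM and its version.

```java
class HeapReport {
    // Summarises the heap as the running JVM reports it.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // maxMemory:   upper limit the heap may grow to (driven by -Xmx)
        // totalMemory: heap currently committed (starts near -Xms)
        // freeMemory:  unused part of the committed heap
        return "max=" + (rt.maxMemory() / mb) + "MB"
             + " committed=" + (rt.totalMemory() / mb) + "MB"
             + " free=" + (rt.freeMemory() / mb) + "MB";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as, say, "java -Xms4m -Xmx64m HeapReport" and again with both options set to the same value should show the difference between a heap that can grow and one that is fixed at startup.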
raddog58c - 15 Feb 2007 12:03 GMT > Yes. > [quoted text clipped - 5 lines] > > Arne Thank you for the parms, and yes, I'm certain you're correct as running WAS is the reason for those servers' existence.
Arne Vajhøj - 14 Feb 2007 01:44 GMT > So do you suppose this was a case where late binding made the > difference, or do you think Java's implementation of whatever is used > by the aforementioned algorithm is significantly better? My guess is that it is a case where the C++ compiler was not able to effectively optimize a certain language construct.
Sometimes weird effects pop up.
Some people have found that:
x ^= true;
is faster than:
x = !x;
in Java.
WTF
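For reference, the two forms are interchangeable in what they compute; only the speed is in question. A minimal sketch that checks the equivalence (class and method names are made up for illustration, and it makes no claim about which form is faster on any given JVM):

```java
class ToggleSketch {
    // Flip a boolean with the compound XOR assignment.
    static boolean toggleXor(boolean x) {
        x ^= true;
        return x;
    }

    // Flip a boolean with logical negation.
    static boolean toggleNot(boolean x) {
        x = !x;
        return x;
    }

    public static void main(String[] args) {
        for (boolean b : new boolean[] { false, true }) {
            System.out.println(b + " -> " + toggleXor(b) + " / " + toggleNot(b));
        }
    }
}
```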
> One thing that's different between C++ and Java is memory. I believe > the JVM grabs everything it will ever need at startup, Not true. It starts up with a certain value and expands as needed up to the maximum allowed.
Arne
Chris Uppal - 12 Feb 2007 16:50 GMT > I once implemented a cryptographic hashfunction .. and I couldn't come > closer to the speed of a c++ implementation than a factor of about 0.5. > > since its purely deterministic I would assume a hashfunction can be very > well optimized at compile time in case of c++ and JIT and AOT should not > matter much to the program? That's what I would have expected too.
Assuming that your code was transforming binary data (byte[] arrays) into more binary data, and that your code wasn't dependent on the speed, or lack of it, of utility code (such as, perhaps, java.math.BigInteger), then I don't really see why there should be much difference.
<wild guess> Did your code make much use of 2-dimensional arrays (or higher) ? That's the only thing I can think of where similar-looking code in C and Java would actually be doing something different. </wild guess>
Bounds checking might make some difference, but I can't imagine it making a 2x difference. (The impact of checking depends on the code, and on how good a job the JIT and/or the processor's pipeline can do).
-- chris
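To illustrate the point about 2-dimensional arrays: a Java int[][] is an array of references to separate row arrays, so grid[r][c] involves an extra dereference and a second bounds check, whereas a C int[R][C] is one contiguous block. A common workaround is to flatten the data into a single int[] and compute the index by hand. The sketch below shows both layouts; the names are made up for illustration and flatten() assumes a rectangular grid.

```java
class GridSketch {
    // Sum over an array of row arrays (each row is a separate object).
    static long sumNested(int[][] grid) {
        long total = 0;
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                total += grid[r][c];
            }
        }
        return total;
    }

    // Sum over one contiguous array, indexed as row * cols + col,
    // which mirrors how a C two-dimensional array is laid out.
    static long sumFlat(int[] flat, int rows, int cols) {
        long total = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                total += flat[r * cols + c];
            }
        }
        return total;
    }

    // Copies a rectangular grid into a single row-major array.
    static int[] flatten(int[][] grid) {
        int rows = grid.length;
        int cols = rows == 0 ? 0 : grid[0].length;
        int[] flat = new int[rows * cols];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(grid[r], 0, flat, r * cols, cols);
        }
        return flat;
    }

    public static void main(String[] args) {
        int[][] grid = { { 1, 2, 3 }, { 4, 5, 6 } };
        System.out.println(sumNested(grid) + " " + sumFlat(flatten(grid), 2, 3));
    }
}
```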
Arne Vajhøj - 13 Feb 2007 02:26 GMT >> I once implemented a cryptographic hashfunction .. and I couldn't come >> closer to the speed of a c++ implementation than a factor of about 0.5. [quoted text clipped - 4 lines] > > That's what I would have expected too.
> <wild guess> > Did your code make much use of 2-dimensional arrays (or higher) ? That's the > only thing I can think of where similar-looking code in C and Java would > actually be doing something different. > </wild guess> I have seen huge differences for Java just between running with -client and -server.
Arne
raddog58c - 13 Feb 2007 12:54 GMT > I have seen huge differences for Java just between running with > -client and -server. > > Arne Which has seemed to be better, and do you understand what is making the difference?
I do wish there were better built-in JVM hooks for extracting what's going on inside.... aren't "they" (JVM peeps) standardizing on some such things? It would sure help, because "unit" tuning is fine if you're wearing blinders, but if you're not careful you can make one thing run like a bat out of hades and in turn impact everything else that lives in the same space.
Tuning and better JVM diags/stats is an area of great opportunity, IMHO.
Mark Jeffcoat - 13 Feb 2007 14:28 GMT >> I have seen huge differences for Java just between running with >> -client and -server. [quoted text clipped - 13 lines] > Tuning and better JVM diags/stats is an area of great opportunity, > IMHO. You may be looking for the JVMTI--the "JVM Tool Interface".
It's not something I've looked into, since the simplest of profiling has so far sufficed for me, but it offers an extremely detailed look at what's going on under the hood.
http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html
 Signature Mark Jeffcoat Austin, TX
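JVMTI is a native (C-level) interface. For a quick look from inside Java code, the java.lang.management package (available since Java 5) exposes some of the same kind of information. This is a different, much coarser API than JVMTI, shown here only as a minimal sketch; the class name is made up for illustration, and which figures are available depends on the JVM.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class JvmStatsSketch {
    // Builds a short report of JIT and garbage-collector activity so far.
    static String report() {
        StringBuilder sb = new StringBuilder();

        // The compilation bean is null if the JVM has no JIT compiler.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            sb.append("JIT ").append(jit.getName()).append(": ")
              .append(jit.getTotalCompilationTime()).append(" ms compiling\n");
        } else {
            sb.append("JIT compilation time not reported by this JVM\n");
        }

        // One bean per collector; counts and times are cumulative.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("GC ").append(gc.getName()).append(": ")
              .append(gc.getCollectionCount()).append(" collections, ")
              .append(gc.getCollectionTime()).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```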
Arne Vajhøj - 14 Feb 2007 00:43 GMT >> I have seen huge differences for Java just between running with >> -client and -server. > > Which has seemed to be better, and do you understand what is making > the difference? -server is optimizing better than -client at the cost of more time spent JIT compiling.
Arne
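When comparing the two modes it helps to confirm which VM is actually running. A minimal sketch using standard system properties (the class name is made up; the exact strings reported vary by vendor and version):

```java
class VmInfo {
    // Name of the running VM, e.g. a string mentioning "Client" or "Server".
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(vmName() + " " + System.getProperty("java.vm.version"));
    }
}
```

Run it once with -client and once with -server (where the JVM supports both) to see what each option selects.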
raddog58c - 12 Feb 2007 03:38 GMT > >> We are impressed by your argumentation technique. > [quoted text clipped - 5 lines] > performance based on an assumption that the JIT compilation > runtime overhead is bigger than the JIT over AOT gain. No, I'm assuming it takes longer to translate a thing and run it, than it does to skip translation and just run it. I think that's a fair assumption.
BTW, this is not a C/C++ vs Java competition. I'd write something in native assembler taking advantage of every facet of the hardware if that's what I needed to do. The fact is I code in Java, C, C++, Perl and a smidgeon of other things (COBOL, Shell script, and visual basic to name a few) these days. I like them all for certain things, and they each have drawbacks for other things.
Just wanted to make sure I was clear on where I stand.
> To me that is basing the conclusion on an assumption that > is equivalent to the conclusion. > > Arne If you want to be 100% accurate, what I said was if you have to translate code to run it, then eliminating the translation step would cause the code to run faster.
Java has to be translated before you run it.
I believe these two statements are true. You can ding me and say you translate Java once and then it can be cached or whatever, and that's fine, but at load time it has to be translated. Byte code != native instruction set.
An assertion was made that late binding can take advantage of the environment to supply more efficient instructions. That's a theory. What if the non-translated program is already optimized for the environment? Does someone want to suggest a JVM can translate a program and then perform some magic to make the ALU run faster? Of course not. Late binding only helps in circumstances where there's an opportunity to improve by substituting a more efficient set of instructions. As long as we're comparing apples to oranges then sure, we can say there's an opportunity for a difference. If we're comparing apples to apples, adding a translation step is going to be slower.
Let me put it another way.
Java class abc is invoked for the first time -- we fire up and load the JVM, abc gets translated to native code and executed -- Java class abc is invoked again, only this time it's sitting in cache and is ready to go so it's executed -- which instance of abc's execution ran faster? I'm claiming the 2nd -- does someone want to argue the 1st execution is faster? That firing up the JVM, translating the program, then executing it is faster than just executing it, or just reading it off disk and executing it?
This is what I'm saying. If these were 2 distinct programs, the 2nd should win every time (ignoring interrupts, system activity, etc) because we're comparing apples to apples.
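The first-run versus later-run difference can be measured directly. Below is a minimal sketch that times the same method over several rounds with System.nanoTime(); the names and the workload are made up for illustration, and the timings depend entirely on the machine and JVM, so no particular output is claimed.

```java
class WarmupSketch {
    // Written to on every round so the JIT cannot discard the work.
    static volatile long sink;

    // An arbitrary CPU-bound loop to give the JIT something to compile.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    // Elapsed wall-clock nanoseconds for one call to work(n).
    static long timeNanos(int n) {
        long start = System.nanoTime();
        sink = work(n);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            System.out.println("round " + round + ": " + timeNanos(5000000) + " ns");
        }
    }
}
```

Early rounds include class loading, interpretation and JIT compilation; later rounds run whatever native code the JIT produced.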
FWIW, some of the late-binding advantage can be simulated with reasonably simple code in any language. For instance an installation script could check installed hardware and deploy the best from several versions, or I could build a run-time hardware checker/loader to pull code from a different DLL based on environments. I have created code to do these kinds of things. It's not out of the box functionality, but it's not very difficult either.
Also, a JVM can pay attention to usage, provide caching, adapt at run time, etc., but such a service requires space and processing cycles to supply. That means *potentially* it could make a single processing unit it serves run faster or maybe not, but the service's computations come at the expense of the operating environment as a whole. Most comparisons focus on a single program's execution, but the resources consumed by the JVM to look for runtime optimizations are resources unavailable to other processes in the operating environment. How much I don't know, but computational knowledge requires memory to store and processor cycles to access. I do know large Java apps running on my workstation will "blackout" (eg become unresponsive to mouse clicks), and I don't experience the same from my C, C++ or Perl scripts -- I presumed garbage collection was running, but maybe it's the JVM looking for ways to run its apps and classes faster. 8-)
One other thing that I believe hurts Java's runtime environment is the built-in unicode support. I have no use for unicode in my applications, but Java stores everything as unicode, right? So that means twice as much memory, and the need to convert from ASCII or EBCDIC into and out of unicode, over and over again, and for nothing. Is there a really good reason for mandatory unicode support? This hurts Java's runtime model, IMO, and it's pointless if you don't need it.
Anyway, this is a really interesting discussion to me. I think the fact there's a discussion at all is a very big compliment to the software engineers constructing the JVMs. They're obviously good.
Mark Thornton - 12 Feb 2007 13:07 GMT > reasonably simple code in any language. For instance an installation > script could check installed hardware and deploy the best from several > versions, or I could build a run-time hardware checker/loader to pull > code from a different DLL based on environments. I have created code > to do these kinds of things. It's not out of the box functionality, > but it's not very difficult either. How difficult this is depends on how many variations you need to test. For example SSE 1,2,3, AMD; number of processors (1 or >1). So 4 or 5 different levels of accelerated math, plus the number of processors, you could be looking at 10 different DLLs for the same function. This gets very tedious. The usual response is simply to insist on say SSE2.
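The hand-rolled "check the environment, then pick an implementation" approach discussed above can be sketched in Java as well. The example below keys only on the processor count (1 or >1), one of the variations mentioned; the interface and class names are hypothetical, and a real loader choosing between native DLLs would need a separate variant per combination, which is where the tedium comes from.

```java
import java.util.Arrays;

class KernelChooser {
    // The operation that has more than one implementation.
    interface Summer {
        long sum(int[] data);
    }

    // Picks an implementation once, based on what the machine offers.
    static Summer choose(int processors) {
        if (processors > 1) {
            // Multi-processor variant: split the work across cores.
            return data -> Arrays.stream(data).parallel().asLongStream().sum();
        }
        // Single-processor variant: plain sequential loop.
        return data -> {
            long total = 0;
            for (int v : data) {
                total += v;
            }
            return total;
        };
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        Summer summer = choose(cpus);
        System.out.println(cpus + " cpu(s), sum = " + summer.sum(new int[] { 1, 2, 3, 4 }));
    }
}
```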
> One other thing that I believe hurts Java's runtime environment is the > built-in unicode support. I have no use for unicode in my > applications, but Java stores everything as unicode, right? So that > means twice as much memory, and the need to convert from ASCII or > EBCDIC into and out of unicode, over and over again, and for nothing. > Is there a really good reason for mandatory unicode support? Not having to duplicate all the methods which take String parameters. What encoding would you assume for 'narrow' character strings?
Mark Thornton
raddog58c - 13 Feb 2007 11:41 GMT On Feb 12, 7:07 am, Mark Thornton <mark.p.thorn...@ntl-spam-world.com> wrote:
> > reasonably simple code in any language. For instance an installation > > script could check installed hardware and deploy the best from several [quoted text clipped - 20 lines] > > Mark Thornton I'm thinking I wouldn't have to worry because the JVM would handle, right? This would be an ideal run-time derived feature for the JVM to provide -- NOP unnecessary translations based on the mode of operation.
If I needed to know, a System.Getyadda call should obtain it.
If I was building a rather large application for in-house use only, I wouldn't really care because I'd always be using the same format. Today I don't spend an inordinate amount of time converting outside of persisting data. I'm just saying it should improve performance if you can eliminate unnecessary transformations.
Mark Thornton - 13 Feb 2007 12:01 GMT >>Not having to duplicate all the methods which take String parameters. >>What encoding would you assume for 'narrow' character strings? [quoted text clipped - 5 lines] > provide -- NOP unnecessary translations based on the mode of > operation. Using Unicode whether encoded as UTF-16 or UTF-8 eliminates a whole world of pain the moment you see a character outside your current character set. If you never see anything outside traditional ASCII then you may not appreciate this. I'd be very happy to see all the traditional character sets disappear (CP-437, etc), leaving only the Unicode encodings. I've had data sent to me without any declaration of the character set in use and had to guess on the basis of the words contained. One case I never did figure out --- it had probably been mangled by other software that didn't understand the character set.
A simple example here with traditional software is what happens to our currency symbol (£, pounds sterling) if you mix up code pages.
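The same kind of mangling is easy to reproduce in Java. The sketch below encodes the pound sign in one encoding and decodes the bytes as another. It uses UTF-8 and ISO-8859-1 rather than the DOS code pages because those two charsets are guaranteed to exist in every JVM; that substitution is an assumption of this example. The class name is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class PoundSketch {
    // Encodes text as UTF-8, then wrongly decodes the bytes as ISO-8859-1.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String price = "\u00A3 5";          // pound sign, written as an escape
        String mangled = misread(price);
        // The single pound character has become two characters.
        System.out.println(price.length() + " chars -> " + mangled.length() + " chars");
    }
}
```

The pound sign is the two bytes C2 A3 in UTF-8, and ISO-8859-1 reads each byte as its own character, so the reader sees a stray accented capital A in front of the currency symbol.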
Earlier you mentioned the doubling of space caused by Unicode (assuming UTF-16). This is only valid if most of your memory was taken up by text. The only applications where this is likely to be true (word processors and the like), ought to be capable of handling a wider range of characters than ASCII. Even writing in English, I want a generous range of mathematical symbols available (I am a mathematician).
Mark Thornton
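The "doubling" can be put in numbers by encoding the same text three ways. This is a minimal sketch with made-up names; it measures encoded byte counts only, not the JVM's internal object overhead, and later JVMs can store Latin-1-only strings more compactly than two bytes per character.

```java
import java.nio.charset.StandardCharsets;

class TextSizeSketch {
    // One byte per character; anything outside ASCII is replaced with '?'.
    static int asciiBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII).length;
    }

    // Two bytes per char (no byte-order mark), matching Java's char type.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // One byte for ASCII characters, more for everything else.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String text = "hello";
        System.out.println("ASCII=" + asciiBytes(text)
                + " UTF-16=" + utf16Bytes(text)
                + " UTF-8=" + utf8Bytes(text));
    }
}
```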
raddog58c - 13 Feb 2007 13:00 GMT > Using Unicode whether encoded as UTF-16 or UTF-8 eliminates a whole > world of pain the moment you see a character outside your current [quoted text clipped - 17 lines] > > Mark Thornton That's completely valid....
...if you need it. It's not valuable if you don't.
I pay a lot of money for cable and HBO television which would be a waste if I didn't own a TV.
Also, while it is a pain to convert when you have to, it's just as much of a pain to convert when you don't feel you need to -- getChars() is a pain because I don't need anything but chars 99% of the time in the particular code I'm writing.
Your mileage may vary, and that's where having an option suits both needs.
Oliver Wong - 14 Feb 2007 21:01 GMT [post re-ordered to group similar-topic paragraphs together]
[Mark wrote:]
> > Using Unicode whether encoded as UTF-16 or UTF-8 eliminates a whole > > world of pain the moment you see a character outside your current > > character set. If you never see anything outside traditional ASCII then > > you may not appreciate this. [...]
> > Even writing in English, I want a generous range > > of mathematical symbols available (I am a mathematician). [quoted text clipped - 5 lines] > I pay a lot of money for cable and HBO television which would be a > waste if I didn't own a TV. [...]
> Your mileage may vary, and that's where having an option suits both > needs. Not being able to input non-ASCII unicode characters in the textfield of an application seems as archaic to me as not being able to enter "0" in a numeric field, because early (e.g. bronze age) arithmetic systems did not know about the concept of zero as a number.
It's hard to believe that today, there are still mp3 players that are unable to handle ID3 tags for songs with non-English characters in them.
> Also, while it is a pain to convert when you have to, it's just as > much of a pain to convert when you don't feel you need to -- > getChars() is a pain because I don't need anything but chars 99% of the > time in the particular code I'm writing. I don't understand your complaint: If you don't need getChars(), then don't invoke it. What's the problem?
- Oliver
raddog58c - 15 Feb 2007 01:01 GMT > I don't understand your complaint: If you don't need getChars(), then > don't invoke it. What's the problem? > > - Oliver The data is stored in UNICODE whether you require it or not. I'm not writing multinational code at this juncture. In 25+ years of programming the number of times I've needed multinational character sets can be counted on one hand with fingers to spare.
You might find it archaic, but I find it wasteful. It's a waste converting into and out of a format you never use.
Why don't you convert your data into Russian characterset. Since you're never communicating in Russian, when you need English, swap back. What's the big deal?
The big deal is why do that? Nobody would do that if there wasn't a reason.
That's what I'm saying. It's conversion to a format that I'm not personally using. Some people need it; some don't; yet we all pay for it.
Mark Thornton - 15 Feb 2007 09:03 GMT > Why don't you convert your data into Russian characterset. Since > you're never communicating in Russian, when you need English, swap > back. What's the big deal? Some of my data consists of European place names, each in the relevant language. That means I need all the characters in every European language in the same data set.
I often holiday in a part of Italy where many of the place names are given in two languages.
Some years ago I was at a party in Brussels where at least 6 languages were being spoken, sometimes two languages in a single sentence.
The idea that you can Balkanize the data into nice separate compartments with their own character set just doesn't work (not least in the Balkans hee hee).
Mark Thornton
raddog58c - 15 Feb 2007 12:05 GMT On Feb 15, 3:03 am, Mark Thornton <mark.p.thorn...@ntl-spam-world.com> wrote:
> > Why don't you convert your data into Russian characterset. Since > > you're never communicating in Russian, when you need English, swap [quoted text clipped - 15 lines] > > Mark Thornton Now for you it's an entirely different story, yes? If I had these requirements my feelings about UNICODE would be more in line with yours. Since I don't the conversion is a wasted expense.
Mark Thornton - 15 Feb 2007 13:46 GMT > Now for you it's an entirely different story, yes? If I had these > requirements my feelings about UNICODE would be more in line with > yours. Since I don't the conversion is a wasted expense. Java's compulsory use of Unicode means that any third party tools I use will also work with data like mine. In systems where use of unicode is optional (or non-existent) it is common to find tools that would be nice if only they made provision for characters beyond ASCII. It also simplifies internationalization even where the original developer didn't pay too much attention to these requirements.
Character set conversion can only be avoided if you can be sure of always working in a single specified character set. Once you have to do a conversion of some sort, conversion to/from Unicode isn't much (if any) more expensive than conversion between simple single byte character sets. Even in the US I expect you are likely to see data in CP-437, CP-850, CP-1252, and ISO-8859-1 as well as UTF-8, and UTF-16.
Finally just how much difference does the extra overhead of Unicode actually make? In most substantial applications the overhead will be negligible.
Mark Thornton
Lew - 15 Feb 2007 15:18 GMT raddog58c wrote:
>> Now for you it's an entirely different story, yes? If I had these >> requirements my feelings about UNICODE would be more in line with >> yours. Since I don't the conversion is a wasted expense.
> Java's compulsory use of Unicode means that any third party tools I use > will also work with data like mine. In systems where use of unicode is [quoted text clipped - 13 lines] > actually make? In most substantial applications the overhead will be > negligible. Java is what it is. It has its reasons to be that way. Not all decisions are optimal from all points of view. Some programmers don't need everything a language or an API offers. The language or API still has to offer it.
This is no different from any language. Considering the overall population that uses it, a language will always make compromises to satisfy the greatest portion of that population.
If you don't like Java then there are alternatives. For what it does, and considering Mark's points about the general usefulness of requiring Unicode and its nearly complete lack of impact on code efficiency, Java is extremely well suited.
If you wrote your own language without Unicode support, or with the "Balkanized" version of it, you'd probably find pretty quickly that the "all Unicode" approach offers significant advantages.
In the meantime, when we use Java we are stuck with all its warts as well as its advantages. Focusing on corner issues like "it's always Unicode" or "there are no closures" does not diminish the actual, real-world usefulness of the language. (And some of these ideas, if sufficiently universally beneficial, wind up in the language eventually anyway.)
- Lew
John W. Kennedy - 16 Feb 2007 04:53 GMT > Character set conversion can only be avoided if you can be sure of > always working in a single specified character set. Once you have to do > a conversion of some sort, conversion to/from Unicode isn't much (if > any) more expensive than conversion between simple single byte character > sets. Even in the US I expect you are likely to see data in CP-437, > CP-850, CP-1252, and ISO-8859-1 as well as UTF-8, and UTF-16. Don't forget IBM-037 (US EBCDIC) and, thanks to the Euro, ISO-8859-15.
 Signature John W. Kennedy "The blind rulers of Logres Nourished the land on a fallacy of rational virtue." -- Charles Williams. "Taliessin through Logres: Prelude"
raddog58c - 16 Feb 2007 19:11 GMT > > Character set conversion can only be avoided if you can be sure of > > always working in a single specified character set. Once you have to do [quoted text clipped - 6 lines] > > -- I recently dealt with IBM037 aka CP037 in a STRUTS app from hell for IBM's OnDemand product. I was unfamiliar with the CP support in the String class prior to this endeavor, and now that I've used it I'm less enthused. It's simple (more or less), but obscure and very infrequently used in this environment -- I had problems using it (turned out to be a missing charset.jar on my workstation) and when I tried to find someone familiar with codepage support I came up empty.
Is this commonly used by others? It was difficult finding understandable documentation on the WWW. I had assumed I was going to need to "add" support for it or something, but I couldn't find a clear explanation of how to do it.
The XLAT assembler instruction (translate) would have been an order of magnitude easier, but I couldn't use it in this context. I could have implemented it more easily via xlatChar = xlatTable[unxlatedChar].
In any event, after lots of perusing and debugging and looking at other people's workstations, I found that the missing charset.jar was my problem. The Sun java documentation said CP037 was supported as part of the Extended Encoding Set, so the UnsupportedEncodingExceptions really threw me.
Mark Thornton - 16 Feb 2007 19:33 GMT > In any event, after lots of perusing and debugging and looking at > other people's workstations, I found that the missing charset.jar was Outside the US installing a comprehensive set of character sets and locales is (I think) the default. Because some crazy people ;-) whinge about the extra space entailed when all they need is ASCII, the default US install leaves all of this out!
Mark Thornton
John W. Kennedy - 17 Feb 2007 01:58 GMT > I recently dealt with IBM037 aka CP037 in a STRUTS app from hell for > IBM's OnDemand product. I was unfamiliar with the CP support in the [quoted text clipped - 3 lines] > (turned out to be a missing charset.jar on my workstation) and when I > tried to find someone familiar with codepage support I came up empty. The Java standard only mandates US-ASCII (ISO646-US), ISO-8859-1 (ISO-LATIN-1), UTF-8, UTF-16BE, UTF-16LE, and UTF-16. The choice of what else to supply is the responsibility of the Java implementation. Sun Java for Windows supplies something like 150 in all, not counting aliases. If you were not using Sun Java for Windows (or, probably, Sun Java for Solaris), IBM037 might not have been included (although it would obviously be included in any implementation of Java for MVS).
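For anyone hitting the same missing-charset surprise, a minimal sketch of how to ask the JVM what it actually has installed. The class name CharsetCheck and the helper isAvailable are made up for illustration; Charset.isSupported and Charset.availableCharsets are the standard java.nio.charset calls. Whether IBM037 shows up depends entirely on the installation.

```java
import java.nio.charset.Charset;

class CharsetCheck {
    // The six charsets named in the post as mandatory for every Java implementation.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // True if this JVM can look up the named charset; false if it is not
    // installed or the name is not even a legal charset name.
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) { // includes IllegalCharsetNameException
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : REQUIRED) {
            System.out.println(name + " -> " + isAvailable(name));
        }
        // Optional charset: the answer varies from one installation to another.
        System.out.println("IBM037 -> " + isAvailable("IBM037"));
        System.out.println("Installed charsets: " + Charset.availableCharsets().size());
    }
}
```

Running a check like this first would have pointed at the installation rather than at the code.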
> The XLAT assembler instruction (translate) would have been an order of > magnitude easier, but I couldn't use it in this context. I could have > implemented it more easily via xlatChar = xlatTable[unxlatedChar]. I suspect x86 Java does use XLAT, where possible.
I really don't see what's so hard about:
Charset cs037 = Charset.forName("ibm037");
...
String st = new String(bytearray, cs037);
...
byte[] newbytes = st.getBytes(cs037);
(You can even leave off creating the Charset variable, but that would mean looking the thing up at run time over and over again, which is obviously wasteful, not to mention that you have to code to handle an UnsupportedEncodingException, even if it's a SNOC.)
Reader input = new InputStreamReader(new FileInputStream(filename), "ibm037");
is even simpler, where it applies.
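A self-contained variant of the decode/encode snippet, as a rough sketch only. The class CodepageRoundTrip and its two helpers are invented names; the sample bytes are "HELLO" in EBCDIC code page 037. Since IBM037 is an optional charset, the sketch checks for it first instead of assuming it is installed.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class CodepageRoundTrip {
    // Decode raw bytes with the named charset, or return null if this JVM
    // does not have that charset installed.
    static String decode(byte[] raw, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return new String(raw, Charset.forName(charsetName));
    }

    // bytes -> characters -> bytes, using the same charset both ways.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        String text = new String(raw, cs);
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        // "HELLO" encoded in EBCDIC code page 037.
        byte[] ebcdic = {(byte) 0xC8, (byte) 0xC5, (byte) 0xD3, (byte) 0xD3, (byte) 0xD6};
        String text = decode(ebcdic, "IBM037");
        if (text == null) {
            System.out.println("IBM037 is not installed on this JVM");
            return;
        }
        Charset cs037 = Charset.forName("IBM037"); // look it up once and reuse it
        System.out.println(text);
        System.out.println(Arrays.equals(ebcdic, roundTrip(ebcdic, cs037)));
    }
}
```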
 Signature John W. Kennedy "The blind rulers of Logres Nourished the land on a fallacy of rational virtue." -- Charles Williams. "Taliessin through Logres: Prelude"
Oliver Wong - 15 Feb 2007 15:35 GMT >> I don't understand your complaint: If you don't need getChars(), then >> don't invoke it. What's the problem? > > The data is stored in UNICODE whether you require it or not. Well, Unicode is not a storage encoding system, or anything like that. Unicode is primarily a mapping from characters (in the linguistic conceptual sense, not in the C/C++ data type sense) to numbers. And you can't directly store numbers in computers. You can store bitstreams, and thus you need an extra step to encode from numbers to bitstreams. There are many such encodings: ASCII, UTF-8, UTF-16, etc. some of them being lossy (e.g. ASCII).
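To make the "number first, bitstream second" distinction concrete, here is a small sketch that takes one character, prints its Unicode number, and then encodes it several ways. EncodingDemo and hex are illustrative names, and StandardCharsets is a convenience from later Java versions (Charset.forName does the same job on older ones).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encode text in the given charset and show the resulting bytes as hex.
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // e with acute accent, Unicode number U+00E9
        System.out.println("code point: U+" + String.format("%04X", eAcute.codePointAt(0)));
        System.out.println("UTF-8:      " + hex(eAcute, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE:   " + hex(eAcute, StandardCharsets.UTF_16BE));
        System.out.println("ISO-8859-1: " + hex(eAcute, StandardCharsets.ISO_8859_1));
        // ASCII cannot represent this character, so the encoder substitutes '?'.
        System.out.println("US-ASCII:   " + hex(eAcute, StandardCharsets.US_ASCII));
    }
}
```

The same number, U+00E9, comes out as C3 A9 in UTF-8, 00 E9 in UTF-16BE, E9 in ISO-8859-1, and the substitute byte 3F ('?') in US-ASCII, which is the lossy case.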
> I'm not > writing multinational code at this juncture. In 25+ years of > programming the number of times I've needed multinational character > sets can be counted on one hand with fingers to spare. Well, I don't know what kind of software you write, so I can't comment much on that. But consider how many people have asked the developers of WinAmp (a once popular mp3 player) to support Unicode characters, so that WinAmp could properly display the names of my English, French, Russian, Japanese and Korean songs. They refused to do so, stating that 90% of the Internet is English (a figure I'm sure they just made up). There are several problems with this argument.
First of all, internet usage in Asia is huge. Gold farming (which essentially comes down to playing video games online for pay) is a 1 billion dollar business in Korea alone (http://arstechnica.com/news.ars/post/20061227-8503.html), and playing video games online is a tiny segment of the internet usage pie chart, compared to web browsing, e-mail or file sharing, for example. According to http://www.internetworldstats.com/stats2.htm, North America accounts for only 20% of the internet usage, and while Internet usage there is growing at a rate of 100+% (i.e. doubling) over 7 years, Internet usage in the rest of the world is growing at a rate of 200+% (i.e. tripling) over 7 years. This last diagram really says it all: http://www.internetworldstats.com/stats.htm
Second of all, just because one is an English-only speaker doesn't mean one wouldn't benefit from the ability to display characters outside of ASCII but within Unicode. Another poster presented the example of being able to display mathematical symbols. I'll present an additional example of my mp3s again.
One of the ID3 tags for my mp3s contains what I believe to be Russian characters. I'm not sure, because I don't actually speak Russian. The artist name can be viewed at http://en.wikipedia.org/wiki/T%C3%8B%D0%AFRA and it's very easy for an English speaker to recognize: It's a T, an E with two dots on top, a backwards R, a forwards R, and an A. And the pronunciation "Terra" comes intuitively. But try to load an ID3 tag with this text via an ASCII-only mp3 player, and you'll only see gibberish.
See, I don't even speak Russian, and yet I benefit from my software being able to display Russian characters. That's why Unicode is more than just "supporting other countries' languages". It's about being able to represent text that you would normally find all around you in real life on your computer.
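The artist name in that Wikipedia URL is itself a handy test case, because the %XX escapes are just the UTF-8 bytes of the name. The sketch below decodes them and then simulates one plausible way an encoding-unaware program ends up showing gibberish: reading those same bytes as one-byte-per-character ISO-8859-1. TagDisplay, decodeUtf8 and misread are hypothetical names, and this is only a guess at the failure mode, not a description of what any particular mp3 player does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;

class TagDisplay {
    // Percent-decode a URL fragment, interpreting the escaped bytes as UTF-8.
    static String decodeUtf8(String escaped) {
        try {
            return URLDecoder.decode(escaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Cannot happen in practice: UTF-8 is mandatory on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Simulate a program that wrongly assumes one byte per character:
    // take the UTF-8 bytes of the text and decode them as ISO-8859-1.
    static String misread(String text) {
        byte[] stored = text.getBytes(Charset.forName("UTF-8"));
        return new String(stored, Charset.forName("ISO-8859-1"));
    }

    public static void main(String[] args) {
        String name = decodeUtf8("T%C3%8B%D0%AFRA");
        System.out.println(name + " has " + name.length() + " characters");
        String garbled = misread(name);
        System.out.println(garbled + " has " + garbled.length() + " characters");
    }
}
```

The correctly decoded name has five characters; the misread version has seven, because each of the two non-ASCII letters occupies two bytes in UTF-8.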
> You might find it archaic, but I find it wasteful. It's a waste > converting into and out of a format you never use. What formats do you think one is converting to and from? There are bits on the hard drive or RAM, and you need to somehow semantically treat these bits as if they represented text. From what I understand, in C, you actually manipulate these bits almost directly, and so an algorithm (e.g. testing whether a character is numeric) designed to work with ASCII will not work with EBCDIC and vice versa. In Java, things are a bit more high level: You *don't* work directly with bits. Instead, you work with characters. Theoretically, how these characters are represented in the JVM shouldn't matter to you (in practice, due to backwards compatibility reasons, it has "leaked out" that the internal representation is UTF-16-like). They might internally be stored as UTF-16, UTF-8, or some crazy undocumented internal format. It doesn't matter, because you shouldn't be manipulating the bits that represent those characters, you should be dealing with the characters directly. Any algorithm (e.g. testing whether a character is numeric) will work regardless of the encoding, because the actual encoding is (supposed to be) abstracted away.
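As a rough illustration of the "is this character numeric" point: once bytes have been decoded into a String, the test is written against characters and never mentions the encoding the bytes arrived in. DigitCheck and allDigits are illustrative names; Character.isDigit is the standard library call.

```java
import java.nio.charset.StandardCharsets;

class DigitCheck {
    // True if the string is non-empty and every character in it is a digit.
    // Works on characters, so it does not care how the text was stored.
    static boolean allDigits(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isDigit(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // The same text, "42", arriving in two different byte encodings.
        byte[] asAscii = {0x34, 0x32};
        byte[] asUtf16 = {0x00, 0x34, 0x00, 0x32};
        String a = new String(asAscii, StandardCharsets.US_ASCII);
        String b = new String(asUtf16, StandardCharsets.UTF_16BE);
        System.out.println(allDigits(a) + " " + allDigits(b) + " " + a.equals(b));
        // Digits from other scripts count too, e.g. Arabic-Indic four and two.
        System.out.println(allDigits("\u0664\u0662"));
    }
}
```

The equivalent C test written against ASCII byte values ('0' to '9' as 0x30 to 0x39) would have to be rewritten for each of these cases.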
Now if you have a String of characters in memory, and you want to store it on disk somehow, there are many encodings to do this, just like if you wanted to store a binary tree on disk somehow, there are many encodings to do this. *This* is where any "converting" might occur, though the term "converting" is misleading: "encoding" would be a better term. You can encode the text as ASCII, UTF-8, or some other format. And if you want to read the bitstream from disk and convert it back to text, a decoding stage occurs.
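The encode-on-write, decode-on-read stages can be sketched as follows. FileRoundTrip and writeThenRead are made-up names, and a temporary file stands in for "disk"; the only place an encoding is mentioned is where the Writer and Reader are created.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FileRoundTrip {
    // Encode text to a temporary file in the given charset, then decode it
    // back with the same charset and return what was read.
    static String writeThenRead(String text, Charset cs) {
        try {
            File file = File.createTempFile("roundtrip", ".txt");
            try {
                try (Writer out = new OutputStreamWriter(new FileOutputStream(file), cs)) {
                    out.write(text); // encoding stage: characters -> bytes
                }
                StringBuilder sb = new StringBuilder();
                try (Reader in = new InputStreamReader(new FileInputStream(file), cs)) {
                    int c;
                    while ((c = in.read()) != -1) { // decoding stage: bytes -> characters
                        sb.append((char) c);
                    }
                }
                return sb.toString();
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "T\u00CB\u042FRA";
        System.out.println(name.equals(writeThenRead(name, StandardCharsets.UTF_8)));
        System.out.println(name.equals(writeThenRead(name, StandardCharsets.US_ASCII)));
    }
}
```

With UTF-8 the text survives the trip unchanged; with US-ASCII it does not, since two of the five characters have no ASCII encoding.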
In C, there's no similar stage, because once again, there's no abstracting the encoding away from the text. If you want to replicate C's behaviour in Java, rather than reading in text, read in bytes. Then, you can manipulate the bytes in any way you like, and if you think these bytes represent text, you'll have to guess at the encoding (ASCII? EBCDIC? UT