Java Forum / General / March 2006
Difference between String variable and String Class definition
vahid.xplod@gmail.com - 27 Mar 2006 22:41 GMT
hi , can anyone tell me what is difference between String variable and String class definition. for example :
String variable ---> String = "java";
String Class ---> String = new String("java");
in two example above, i want to know that every two definition are same as each other about memory allocation or not ?
Duane Evenson - 28 Mar 2006 00:34 GMT
On Mon, 27 Mar 2006 13:41:45 -0800, vahid.xplod wrote:
> hi ,
> can anyone tell me what is difference between String variable and
[quoted text clipped - 6 lines]
> in two example above, i want to know that every two definition are same
> as each other about memory allocation or not ?
The first line allocates memory for the literal "java". It then points the variable to this location. The second line allocates space for the literal. It then creates a new String class. It copies the string from the first place to the new place in memory. Finally, it points the variable to that location. All you do in the second example is create more work for the computer and use up more memory (until the next garbage collection).
Consider what
    String var = new String(new String(new String("java")))
would do.
...
A series of memory allocations would occur with "java" being copied from one place to the next.
Patricia Shanahan - 28 Mar 2006 01:03 GMT ...
> Consider what
> String var = new String(new String(new String("java")))
> would do.
> ...
> A series of memory allocations would occur with "java" being copied
> from one place to the next.
I've taken a look at the String source code, and in practice the actual character array containing 'j', 'a', 'v', 'a' will not be copied, but shared by the series of String objects, because the String needs the whole of the character array.
This is, of course, an implementation detail, not part of the interface.
Patricia
Roedy Green - 28 Mar 2006 05:20 GMT
>It then creates a new
>String class.
You mean a new String object.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Tom Fredriksen - 30 Mar 2006 00:05 GMT
> On Mon, 27 Mar 2006 13:41:45 -0800, vahid.xplod wrote:
>> [quoted text clipped - 9 lines]
> String class. It copies the string from the first place to the new place
> in memory. Finally, it points the variable to that location.
Are you saying that the first example does not produce a String object with the data "java" as its content? I would think it does, doesn't it? So basically that means that the second example just creates an additional object which it uses to set up the first object. I would have thought it was syntactic sugar, nothing more. How one can be fooled.
But a question then is why the need for two seemingly similar statements, but which have different effects. I do conceive there is a reason, but is there a real need? Generally I see many such things as just an attempt to be clever rather than a real need; I could be wrong though.
/tom
Eric Sosman - 30 Mar 2006 00:40 GMT Tom Fredriksen wrote On 03/29/06 18:05,:
>>On Mon, 27 Mar 2006 13:41:45 -0800, vahid.xplod wrote:
>> [quoted text clipped - 12 lines]
> Are you saying that the first example does not produce a String object
> with the data "java" as its content? I would think it does, doesn't it?
Up to this point, I think I understand your question and can answer it. The whole process goes something like this:
The compiler sees the literal "java" in the source code, and generates a corresponding string constant in the class file.
When the class gets loaded, the JVM takes that string constant and makes a String object out of it. The JVM also arranges to make just one String object per unique string constant value, so if you write "java" several times (perhaps in several different .java files), the JVM folds them all together into just one String object with the value "java".
When the code is executed (well, the code as given won't even compile, but let's imagine that we've fixed it), it sets the value of a reference variable to point to this String object.
All this verbiage is in response to the word "produce," because it seems you're puzzled by where things come from. The important points: All identical literals get turned into a single String object by the JVM, and an assignment like `refvar = "java"' causes the reference variable to refer to that String object. You don't get a new String each time the assignment is executed; you just keep recycling the old one.
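The two points above can be checked directly. This is only a minimal sketch: the class and method names (LiteralSharingDemo, fromElsewhere, literalIsShared) are made up for illustration and are not from the thread.

```java
// Sketch: identical literals resolve to one shared String object.
class LiteralSharingDemo {
    // Hypothetical helper: hands back the same literal from another method.
    static String fromElsewhere() {
        return "java";
    }

    // True if every assignment of the literal yields the very same object.
    static boolean literalIsShared() {
        String first = "java";
        for (int i = 0; i < 3; i++) {
            String again = "java";   // no new String is created by this assignment
            if (again != first) {
                return false;
            }
        }
        // The literal written in another method is the same object as well.
        return first == fromElsewhere();
    }

    public static void main(String[] args) {
        System.out.println(literalIsShared()); // prints "true"
    }
}
```

The comparison uses == on purpose, since the question here is object identity rather than content.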
> So basically that means that the second example just creates an
> additional object which it uses to set up the first object. I would have
> thought it was syntactic sugar, nothing more. How one can be fooled.
This part baffles me; I don't know what you mean.
> But a question then is why the need for two seemingly similar
> statements, but which have different effects.
Maybe similarity is in the eye of the beholder, but the two don't look very similar to me. One of them uses the `new' operator and passes an argument to a constructor, the other does not. `x = y' and `x = new X(y)' look different to me, and I'm not surprised they do different things.
> I do conceive there is a
> reason, but is there a real need? Generally I see many such things as
> just an attempt to be clever rather than a real need; I could be wrong though.
"Is there a real need" ... for what? There's certainly a need for the `new' operator, if that's the question. I suppose you could design an O-O language without constructors by using only factory methods, but I think it would be pretty clumsy, so perhaps constructors count as "needed," too.
The String(String) "copy constructor" doesn't seem to be very useful. It may have some use as a space optimization when extracting short substrings from long containing strings whose remains will be discarded, and it may have use in some esoteric circumstances where you are using String values as "tokens" that will never be == to each other even if their contents are identical. Maybe its principal use is as the basis of "Gotcha!" questions in Java exams ...
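One way to picture the "token" use mentioned above is a sentinel that no other string can ever be == to. This is only a sketch of that idea under assumed names (SentinelTokenDemo, NOT_SET, describe); nothing in the thread prescribes it.

```java
// Sketch: a String used as a unique token, compared by identity.
class SentinelTokenDemo {
    // A distinct object: no literal or interned string is ever == to it,
    // even though its content is the same as the empty literal "".
    static final String NOT_SET = new String("");

    // Hypothetical helper that tells the token apart from a real empty value.
    static String describe(String value) {
        return value == NOT_SET ? "not set" : "set to \"" + value + "\"";
    }

    public static void main(String[] args) {
        System.out.println(describe(NOT_SET)); // prints: not set
        System.out.println(describe(""));      // prints: set to ""
    }
}
```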
 Signature Eric.Sosman@sun.com
Tom Fredriksen - 30 Mar 2006 01:51 GMT
>> Tom Fredriksen wrote On 03/29/06 18:05,:
> [quoted text clipped - 11 lines]
>
> This part baffles me; I don't know what you mean.
Sorry for being a bit unclear. What I mean is: are both statements semantically correct, meaning is the first statement just a short form of the second? In other words, do they produce the same byte code?
Half of the answer seems to be already given, the second produces two string objects, while the first only produces one? So except for that they are then the same, or am I missing something here?
Let's assume I have got it right, then the question was why don't they both compile to the bytecode of the first statement? This was the point of seemingly similar, as in similar byte code, not syntax.
Hope that clears it up a bit.
/tom
Chris Uppal - 30 Mar 2006 10:40 GMT
> Half of the answer seems to be already given, the second produces two
> string objects, while the first only produces one? So except for that
> they are then the same, or am I missing something here?
They are not the same -- they are hardly even similar. The first assigns a reference to an /existing/ String object to a new variable. The second creates a new object and assigns a reference to that to the variable.
> Let's assume I have got it right, then the question was why don't they
> both compile to the bytecode of the first statement? This was the point
> of seemingly similar, as in similar byte code, not syntax.
As I say, they are not similar. The compiler would be in error (/grossly/ in error!) if it generated the same bytecodes for the two statements.
Bytecodes:
/* String a = "Java"; */
    ldc "Java"
    astore_1
/* String b = new String(); */
    new java/lang/String
    dup
    invokespecial java/lang/String/<init> ()V
    astore_2
/* String c = new String("Java"); */
    new java/lang/String
    dup
    ldc "Java"
    invokespecial java/lang/String/<init> (Ljava/lang/String;)V
    astore_3
(note that what I've rendered as "Java" in the above is actually a numerical reference into the constant pool).
As you see, the third case:
    String c = new String("Java");
is a minor variant on the second:
    String b = new String();
not on:
    String a = "Java";
-- chris
Tom Fredriksen - 30 Mar 2006 11:02 GMT
>> Let's assume I have got it right, then the question was why don't they
>> both compile to the bytecode of the first statement? This was the point
[quoted text clipped - 9 lines]
> not on:
> String a = "Java";
I understand how it works now, but I don't understand why. The question that keeps popping into my head is, does there need to be a difference between the first example and the third?
I understand the mechanics and requirements from the language of how it should work when using new etc, but why can't it optimise it to be the same as example three? I.e why is there a need to have two different ways of doing the same thing, especially when they operate slightly different for, to me, no apparent reason?
/tom
Jussi Piitulainen - 30 Mar 2006 11:37 GMT
> I understand the mechanics and requirements from the language of how
> it should work when using new etc, but why can't it optimise it to
> be the same as example three? I.e why is there a need to have two
> different ways of doing the same thing, especially when they operate
> slightly different for, to me, no apparent reason?
They _don't_ do the same thing. Consider these:
    ("java" == "java") == true
    ("java" == new String("java")) == false
    ("java" == new String("java").intern()) == true
    (new String("java") == new String("java")) == false
    ...
A reason for _not_ interning every string that a program ever handles is that that would fill all the available memory, for no reason: the interned strings stay there. Most of the time you don't care either way. When you care, you can say which way you want it.
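The four comparisons listed above can be run as they stand. A minimal sketch follows; the class and method names (InternDemo and friends) are invented for illustration.

```java
// Sketch: identity comparisons between literals, new Strings and interned Strings.
class InternDemo {
    static boolean literalVsLiteral() {
        return "java" == "java";                             // same shared literal object
    }

    static boolean literalVsNew() {
        return "java" == new String("java");                 // new always makes a distinct object
    }

    static boolean literalVsInterned() {
        return "java" == new String("java").intern();        // intern() returns the shared object
    }

    static boolean newVsNew() {
        return new String("java") == new String("java");     // two distinct objects
    }

    public static void main(String[] args) {
        System.out.println(literalVsLiteral());   // prints "true"
        System.out.println(literalVsNew());       // prints "false"
        System.out.println(literalVsInterned());  // prints "true"
        System.out.println(newVsNew());           // prints "false"
    }
}
```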
By the way:
... ("java" == (String) new String(String.valueOf(hello)) .intern() .toString()) == true
A reason to intern literals is so that people are not tempted to hand-intern their literals to save space. That would be awful.
Tom Fredriksen - 30 Mar 2006 12:33 GMT
> A reason for _not_ interning every string that a program ever handles
> is that that would fill all the available memory, for no reason: the
> interned strings stay there. Most of the time you don't care either
> way. When you care, you can say which way you want it.
So, the effect of it would be interning the string, and that is the reason why you have both ways of doing it?
/tom
Chris Uppal - 30 Mar 2006 14:06 GMT
> I understand the mechanics and requirements from the language of how it
> should work when using new etc, but why can't it optimise it to be the
> same as example three? I.e why is there a need to have two different
> ways of doing the same thing, especially when they operate slightly
> different for, to me, no apparent reason?
Are you asking why:
    String v = "Java"
isn't treated as if it said:
    String v = new String("Java")
or the other way around ?
The reason that the first isn't treated like the second is that:
a) It creates a new object unnecessarily.
b) The second form needs String literals anyway, so we may as well use 'em directly.
If your question was the other way around, then the answer, in general, is why introduce a pointless special-case ? "new" means create a new object. Always. Everywhere. You wouldn't want to change that. If the programmer has asked for a new object, then presumably they /want/ a new object -- why should the compiler try to second-guess him/her ?
More specifically, as Jussi has said, it allows you control over whether or not a String is interned. (BTW, I have needed that level of control over interning in the past -- only once, I admit ;-)
Tom, I suspect that your problem here is that you haven't yet fully internalised the idea that, in Java, Strings are /objects/. You sound (to me) as if you are "trying" to think of them as values, and finding things strange (counter-intuitive) when that picture leads you astray.
Imagine we have a static called X.
    static final SomeClass X = new SomeClass();
I assume you wouldn't think there was any similarity between
    SomeClass y = X;
and
    SomeClass y = new SomeClass(X);
The picture is essentially the same with Strings. One way to think of it is that each string literal is treated as if it were the name of a global variable which has been initialised to point to a String object with the corresponding contents. Multiple occurrences of "Java" will all be treated as if they were the name of the same global variable.
-- chris
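The global-variable picture in the post above can be sketched like this. SomeClass and X come from the post; the label field, the copy constructor body and the two checking methods are assumptions added so the sketch compiles.

```java
// Sketch: plain assignment shares an object, a copy constructor creates one.
class GlobalAnalogyDemo {
    static final class SomeClass {
        final String label;                 // assumed field, only for illustration

        SomeClass() {
            this.label = "original";
        }

        SomeClass(SomeClass other) {        // copy constructor, like String(String)
            this.label = other.label;
        }
    }

    // Plays the role of the string literal: one object created up front.
    static final SomeClass X = new SomeClass();

    static boolean plainAssignmentShares() {
        SomeClass y = X;                    // like String v = "Java"
        return y == X;
    }

    static boolean copyIsDistinctButEqualInContent() {
        SomeClass y = new SomeClass(X);     // like String v = new String("Java")
        return y != X && y.label.equals(X.label);
    }

    public static void main(String[] args) {
        System.out.println(plainAssignmentShares());            // prints "true"
        System.out.println(copyIsDistinctButEqualInContent());  // prints "true"
    }
}
```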
Tom Fredriksen - 30 Mar 2006 15:46 GMT
>> I understand the mechanics and requirements from the language of how it
>> should work when using new etc, but why can't it optimise it to be the
[quoted text clipped - 18 lines]
> a new object, then presumably they /want/ a new object -- why should the
> compiler try to second-guess him/her ?
Sorry, I seem to be unable to express myself properly. I mean why isn't the second treated like the first. (Btw, I do know Strings in java are objects (which contain an array of char))
Let me get something straight first, correct me if I am wrong here.
String v = "Java" : leads to a String object with the value "java"
String v = new String("Java") : also leads to a string object with the value "java"
The difference is: in the first, the text is a literal which can/will be interned automatically, and which creates a String object with the specified value, while the second creates an object with the literal value "java" which then creates the object v with the first object as its argument.
correct? The essence is both produce an object with the specified value, but the second example requires more work, correct?
Here is the real question: Since string assignment statements can be done as in the first example (as opposed to other object types), why is not the second example basically treated like the first, because that would remove the need for creating more work than necessary in the second example.
What we want is an string object with the given value, the second seems to do more work than necessary, for the string case, so why not optimise it away? Jussi mentioned interning and memory but that could perhaps be solved by a more aggressive gc for interned strings.
I hope I was able to explain it properly this time:/
/tom
Chris Uppal - 30 Mar 2006 17:19 GMT
> Let me get something straight first, correct me if I am wrong here.
> [quoted text clipped - 9 lines]
> then creates the object v with the first object as its argument.
> correct?
No. The first simply assigns another reference to an object /that already existed/. It doesn't create /anything/. That was the point of my digression into global variables (it may make more sense if you read it again now). The String object is not interned at that point either -- that also happened back when the String was first created.
The second creates an object. The first does not. Not even "conceptually". The instance of String corresponding to the string literal was created when the class was loaded, unless another class already used that string, in which case it was created when /that/ class was loaded.
(In fact, I suppose an implementation might cheat, and only create a String object from a constant pool entry lazily -- but I can't see any advantage in that level of messing around. And in any case it doesn't matter -- it is required to act /exactly/ as if the String was there all along).
> Here is the real question: Since string assignment statements can be
> done as in the first example (as opposed to other object types), why is
> not the second example basically treated like the first, because that
> would remove the need for creating more work than necessary in the
> second example.
Does it make sense now ? In the second statement, the programmer is asking for something totally unrelated to the first.
-- chris
Tom Fredriksen - 30 Mar 2006 18:10 GMT
>> Let me get something straight first, correct me if I am wrong here.
>> [quoted text clipped - 15 lines]
> String object is not interned at that point either -- that also happened back
> when the String was first created.
(The answer to my question still somewhat eludes me, so let's give it another go. But I think this is what I have been saying "all along")
Yes, I understand this. When the jvm sees the literal "java" it makes a String object of it and when v is assigned it gets the reference to the previously created string object, which will be shared by all other variables using the same literal.
> The second creates an object. The first does not. Not even "conceptually".
> The instance of String corresponding to the string literal was created when the
> class was loaded, unless another class already used that string, in which case
> it was created when /that/ class was loaded.
In the second example, "java" is a literal (possibly the same literal used again as in the first), where a new is used which creates a string object with the constructor argument of the string object containing the literal "java"
So the first example ends up with a string with the value "java", and so does the second example. The only difference is that the second example performs an object creation.
> Does it make sense now ? In the second statement, the programmer is asking for
> something totally unrelated to the first.
"Something totally unrelated to the first", what are you thinking of here?
To answer your question, I think: programmatically yes, but in essence it's the same thing: both are requesting a string object with the given value.
So then the question: why not optimise the difference away? since the difference is only in how the result is created, does it matter that there is a difference?
hmm, do you mean that with the "new" you can control whether you have a unique object, different from a potential string object created by way of a literal? and that is why one wants it to be different?
/tom
Eric Sosman - 30 Mar 2006 18:24 GMT Tom Fredriksen wrote On 03/30/06 12:10,:
> [...]
> [quoted text clipped - 5 lines]
> difference is only in how the result is created, does it matter that
> there is a difference?
Sure. The "optimization" would change the output of:
    String a = "Java";
    String b = new String(a);
    System.out.println(a == b);
An "optimization" that changes the program's output (in other than time-related ways) is more commonly called a "compiler bug."
Here's another (rather contrived) program whose behavior would change:
class Bogus implements Runnable {
    public static void main(String[] unused) {
        String a = "Java";
        String b = new String(a);
        new Thread(new Bogus(a)).start();
        new Thread(new Bogus(b)).start();
    }

    private final String key;

    Bogus(String key) {
        this.key = key;
    }

    public void run() {
        for (;;) {
            synchronized (key) {
                do_something();
            }
        }
    }

    // Placeholder so the example compiles; the actual work is left unspecified.
    private void do_something() {
    }
}
As Java actually behaves, the two threads synchronize on different String objects and run independently. With your "optimization" they would synchronize on the same String object and thus contend with each other.
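The claim about separate monitors can be checked without starting any threads. The sketch below is not part of the Bogus program; it assumes only the standard Thread.holdsLock method, and the class and method names are made up.

```java
// Sketch: a literal and its new String(...) copy have different monitors.
class LockIdentityDemo {
    static boolean copyHasItsOwnLock() {
        String a = "Java";
        String b = new String(a);
        synchronized (a) {
            // Holding a's monitor does not mean holding b's monitor.
            return Thread.holdsLock(a) && !Thread.holdsLock(b);
        }
    }

    static boolean sameLiteralSharesLock() {
        String a = "Java";
        String c = "Java";   // same object as a, hence the same monitor
        synchronized (a) {
            return Thread.holdsLock(c);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyHasItsOwnLock());      // prints "true"
        System.out.println(sameLiteralSharesLock());  // prints "true"
    }
}
```

If the compiler replaced new String(a) with a, the first method would return false, which is the behavior change described above.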
 Signature Eric.Sosman@sun.com
Jussi Piitulainen - 30 Mar 2006 18:16 GMT Tom Fredriksen writes, referring to new String("java"):
> What we want is an string object with the given value, the second
> seems to do more work than necessary, for the string case, so why
> not optimise it away?
Why bother? One should write just the simpler "java":
The writer of the code has less to do.
The reader of the code has less to do.
The compiler writer has less to do.
The program itself has less to do.
Just about everybody is happier.
Let the compiler writer spend their time on things that are generally useful in the programs that people actually write.
And stop writing new String("java"), please. Where do people pick this up anyway? Do they just like the tedium? Could they be convinced to write "java".intern().intern().intern() instead? (It's even longer.)
Eric Sosman - 30 Mar 2006 16:44 GMT Tom Fredriksen wrote On 03/30/06 05:02,:
> > > [quoted text clipped - 21 lines]
> ways of doing the same thing, especially when they operate slightly
> different for, to me, no apparent reason?
The fundamental promise of `new' is that it will create a brand-new object, distinct from all existing objects. The `new' operator can never "recycle" an old object, not even if the old object's state ("value") is the same as the one being created.
It's easy to see why this is crucial for a mutable class. Two distinct instances of a mutable class could start life with identical contents, but the program can change each one independently of the other. If the two shared the same underlying object instance, this would not work.
For an immutable class like String this guarantee of instance uniqueness is less useful. Once the object is created its contents will remain forever unchanged, so there's not much point in having multiple copies of the same unchangeable object lying around. However, the notion of "immutable" is not quite as cut-and-dried as the simple word makes it sound (there was a recent thread on this very topic), and the Java language doesn't have a means to express all the shadings and gradations of "immutability." In the interests of simplicity, perhaps, Java has just one `new' operator rather than a host of slightly different `newish' operators -- and since `new' must allow mutable objects to work properly, `new' must always, always, always generate a brand-new object.
It's been pointed out that the difference is detectable: in Chris' example, a==c is false because the two variables refer to distinct String objects. The two happen to have identical content (so a.equals(c) is true), but are not the same object. As I wrote earlier, if there are five pennies in my pocket and five in yours, our pockets have identical content -- but I will protest if you try to take the pennies from my pocket, because my pocket is not yours.
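Both halves of the argument above, independence of mutable instances and the detectable difference for Strings, fit in a short sketch. StringBuilder is used here as a stand-in for "a mutable class"; that choice and the names are assumptions, not something from the thread.

```java
// Sketch: why `new' must always produce a distinct object.
class NewGuaranteeDemo {
    // Two mutable objects start with identical content but change independently.
    static boolean mutableInstancesAreIndependent() {
        StringBuilder first = new StringBuilder("java");
        StringBuilder second = new StringBuilder("java");
        second.append("!");
        return first.toString().equals("java")
                && second.toString().equals("java!");
    }

    // For Strings the contents match, yet the objects are still distinct.
    static boolean sameContentDifferentObject() {
        String a = "Java";
        String c = new String("Java");
        return a.equals(c) && a != c;
    }

    public static void main(String[] args) {
        System.out.println(mutableInstancesAreIndependent()); // prints "true"
        System.out.println(sameContentDifferentObject());     // prints "true"
    }
}
```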
 Signature Eric.Sosman@sun.com
Tom Fredriksen - 30 Mar 2006 20:14 GMT
> The fundamental promise of `new' is that it will
> create a brand-new object, distinct from all existing
> objects. The `new' operator can never "recycle" an
> old object, not even if the old object's state ("value")
> is the same as the one being created.
Ok, I get it now (sorry chris, for the last post). The simpler one only returns a reference to the object, which is shared, while the one with new creates an entirely new string object with no sharing.
What confused me was the mentioning of interning a string. That led to my perception that the simpler version produced different objects, with only the char array being shared.
> For an immutable class like String this guarantee
> of instance uniqueness is less useful. Once the object
> is created its contents will remain forever unchanged,
> so there's not much point in having multiple copies of
> the same unchangeable object lying around.
This is what I was thinking.
> However, the
> notion of "immutable" is not quite as cut-and-dried as
[quoted text clipped - 6 lines]
> must allow mutable objects to work properly, `new' must
> always, always, always generate a brand-new object.
Maybe an object could have a qualifier stating that the object is immutable, e.g. const. That way you don't need to use different new keywords for different shades of immutability.
Anyway, thanks for clearing this up for me.
/tom
James McGill - 30 Mar 2006 02:43 GMT
> Consider what
> String var = new String(new String(new String("java")))
> would do.
It will make me check the bytecode, and see that sure enough, the compiler isn't smart enough to optimize this out :-)
Chris Uppal - 30 Mar 2006 10:15 GMT
> > Consider what
> > String var = new String(new String(new String("java")))
> > would do.
>
> It will make me check the bytecode, and see that sure enough, the
> compiler isn't smart enough to optimize this out :-)
Nor should it be. It could not possibly be justified in removing code to create objects.
-- chris
James McGill - 30 Mar 2006 02:39 GMT
> in two example above, i want to know that every two definition are
> same
> as each other about memory allocation or not ?
Sun javac appears to intern the string "java" in the class file, so the constructor for the new String("java") version also uses the same text "java" from the first declared version. Also, a constructor is called for the second version, but not the first.
So there's something going on with String var = "java"; that's distinct from String var = new String("java");
It bothers me a little that the javadoc for String says:
String str = "abc";
is equivalent to:
char data[] = {'a', 'b', 'c'};
String str = new String(data);
In a practical sense they are equivalent, but the resulting bytecode is different. Does it matter?
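How far the javadoc's "equivalent" goes can be probed with a small sketch: the two forms agree on content but not on identity. The class and method names here are invented for illustration.

```java
// Sketch: "abc" versus new String(char[]) built from the same characters.
class CharArrayDemo {
    static String fromChars() {
        char[] data = {'a', 'b', 'c'};
        return new String(data);
    }

    static boolean equalContent() {
        return "abc".equals(fromChars());       // same characters
    }

    static boolean sameObject() {
        return "abc" == fromChars();            // a freshly constructed object
    }

    static boolean sameObjectAfterIntern() {
        return "abc" == fromChars().intern();   // intern() yields the literal's object
    }

    public static void main(String[] args) {
        System.out.println(equalContent());          // prints "true"
        System.out.println(sameObject());            // prints "false"
        System.out.println(sameObjectAfterIntern()); // prints "true"
    }
}
```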
Roedy Green - 30 Mar 2006 06:14 GMT On Wed, 29 Mar 2006 18:39:31 -0700, James McGill <jmcgill@cs.arizona.edu> wrote, quoted or indirectly quoted someone who said :
>In a practical sense they are equivalent, but the resulting bytecode is
>different. Does it matter?
It only matters if for some strange reason you want two distinct String objects. I have never run into a practical case where interning would hurt anything. The biggest problem is using == and having it work. You get lulled into trusting it. IntelliJ inspector warns you of any == use on Strings. It works MOST of the time. But you can only trust it if you are sure all the strings you are comparing are interned.
See http://mindprod.com/jgloss/interned.html
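The == trap described above shows up as soon as a string is built at run time instead of written as a literal. A minimal sketch, with made-up names (EqualsPitfallDemo, build):

```java
// Sketch: == fails on a string assembled at run time, equals() does not.
class EqualsPitfallDemo {
    // Builds "java" at run time; StringBuilder.toString() returns a new,
    // non-interned String object.
    static String build() {
        StringBuilder sb = new StringBuilder();
        sb.append("ja").append("va");
        return sb.toString();
    }

    static boolean identityCompare() {
        return build() == "java";          // compares references
    }

    static boolean contentCompare() {
        return build().equals("java");     // compares characters
    }

    static boolean identityAfterIntern() {
        return build().intern() == "java"; // safe only because both are interned
    }

    public static void main(String[] args) {
        System.out.println(identityCompare());     // prints "false"
        System.out.println(contentCompare());      // prints "true"
        System.out.println(identityAfterIntern()); // prints "true"
    }
}
```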
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.