Java Forum / First Aid / June 2008
String intern question
Tommy Halsbrekk - 19 Jun 2008 15:30 GMT
Hi
I was reading a bit about String's intern() and have a question. Comparing the following two examples, what is the difference in practical terms?
str2 = str1.intern();
and
str2 = str1;
I know the internal difference is that the first example ends up with two different objects with the same internal literal representation of the string value and the second ends up with two pointers to the same object. But in some situations, for example when memory is precious, couldn't one use example 2 instead?
regards
tommy
Lew - 19 Jun 2008 19:09 GMT
> I was reading a bit about Strings intern() and have a question.
> Comparing the following two examples, what is the difference in
[quoted text clipped - 4 lines]
>
> str2 = str1;
It depends on what was assigned to both variables. If both Strings are interned, the == operator will return 'true' iff str1.equals( str2 ). If one or the other String is not interned, and the two references do not point to the same object, then == will not return true even if str1.equals( str2 ). For your question, that means that the two assignments will have the same effect if str1 == str1.intern(), but not otherwise.
> I know the internal difference is that the first example ends up with
> two different objects with the same internal literal representation of
Not necessarily. There could be only one object involved, if str1 == str1.intern().
> the string value and the second ends up with two pointers to the same
> object.
No, both assignments could wind up with pointers to the same object.
> But in some situations, for example when memory is precious, couldn't one use example 2 instead?
Memory is always precious. One could use example 2 just about always, and usually would just about always. It has little or nothing to do with memory being precious; one only interns when there is a good reason to. The intern() call is extra confusion when it doesn't help. In fact, the more precious memory is, the more likely one is to use example 1, since interning reduces the number of String instances floating around.
-- Lew
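A minimal sketch of Lew's point, assuming str1 is built at run time with a StringBuilder so that it is not the pooled copy (the class InternVsAssign and the helper buildHello are made up for illustration). The two assignments yield the same object only when str1 is already the interned instance.

```java
public class InternVsAssign {
    // Builds "hello" at run time, so the result is a new String object,
    // separate from the pooled copy that the literal "hello" refers to.
    static String buildHello() {
        return new StringBuilder("hel").append("lo").toString();
    }

    public static void main(String[] args) {
        String literal = "hello";        // string literals are interned automatically
        String str1 = buildHello();      // same text, different object

        String viaAssign = str1;          // example 2: copy the reference
        String viaIntern = str1.intern(); // example 1: look up the pooled copy

        System.out.println(viaAssign == str1);           // true: same object as str1
        System.out.println(viaIntern == literal);        // true: the pooled object
        System.out.println(viaAssign == viaIntern);      // false: str1 was not interned
        System.out.println(viaAssign.equals(viaIntern)); // true: same characters

        // If str1 is already the canonical copy, both assignments agree.
        str1 = literal;
        System.out.println(str1.intern() == str1);       // true
    }
}
```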
Tommy Halsbrekk - 19 Jun 2008 20:02 GMT
> help. In fact, the more precious memory is, the more likely one is to
> use example 1, since interning reduces the number of String instances
> floating around.
I am not sure I understand what you are saying here.
The way I understand String is that a Java string is composed of two elements: the String object and its char sequence. When intern() is called, the char sequence is shared between String objects, because intern() returns a String object. So I thought intern() creates a new String object which reuses the interned char sequence. But you are saying it could return a pointer to itself as a consequence of the intern() call, instead of creating a new String object, right? But how does that save more memory than having two different pointers pointing to the same String object, as in my second example? Both what you are saying and what I am saying (through the second example) would then in effect be the same, i.e. two pointers to the same object?
regards
Tommy
Joshua Cranmer - 19 Jun 2008 21:11 GMT
> The way I understand String, is that a java string is composed of two
> elements, the String object and its char sequence. When intern() is
> called, the char sequence is shared between String objects, because
> intern() returns a String object.
I believe the char sequence is always shared. What intern() guarantees is that any two strings with the same character sequence will, once interned, be the same reference (i.e., equal via ==); I also believe that this is the same string that would be returned from a literal.
So |String composed = "a"+0, nonComposed = "a0";| has one character buffer, but two String objects with overhead. If I then did
composed = composed.intern(); nonComposed = nonComposed.intern();
(assuming garbage collection wipes away the original objects), I have one character buffer and one String overhead object.
 Signature Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
Tommy Halsbrekk - 19 Jun 2008 21:51 GMT
> So |String composed = "a"+0, nonComposed = "a0";| has one character
> buffer, but two String objects with overhead. If I then did
Well, the compiler automatically interns string literals (and "a"+0 is a constant expression, so it becomes the literal "a0"), so the references might actually point to the same String object.
> composed = composed.intern();
> nonComposed = nonComposed.intern();
>
> (assuming garbage collection wipes away the original objects), I have
> one character buffer and one String overhead object.
This would then be superfluous, if I understand it correctly.
regards
tommy
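Whether the two variables in Joshua's example start out as separate objects can be checked with a small sketch. It relies on the Java Language Specification rule that String constant expressions are computed at compile time and interned like literals; the class ConstantFolding and both of its methods are hypothetical names.

```java
public class ConstantFolding {
    // "a" + 0 is a constant expression, so javac folds it to the literal "a0".
    static String folded() {
        return "a" + 0;
    }

    // n is a parameter, not a constant, so this concatenation happens at
    // run time and produces a new String object on each call.
    static String concatAtRunTime(int n) {
        return "a" + n;
    }

    public static void main(String[] args) {
        String composed = folded();
        String nonComposed = "a0";
        System.out.println(composed == nonComposed);                    // true
        System.out.println(concatAtRunTime(0) == nonComposed);          // false
        System.out.println(concatAtRunTime(0).intern() == nonComposed); // true
    }
}
```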
Mark Space - 19 Jun 2008 22:02 GMT
>> help. In fact, the more precious memory is, the more likely one is to
>> use example 1, since interning reduces the number of String instances
[quoted text clipped - 7 lines]
> intern() returns a String object. So I thought intern() creates a new
> String object which reuses the interned char sequence. But you are
Well, I don't think intern() always returns a "new" object. Consider this:
String s1 = "test1";
OK, the Java spec says, I believe, that constant strings in your program are automatically interned. So "test1" is already in the internal string pool when your program starts (or at least before the line above is executed). So now you do:
String s2 = s1.intern();
intern() sees that s1 is already an interned string and just returns the original reference, s1. There's no work for it to do, so it does nothing to the string s1 refers to.
Now consider adding this:
StringBuilder sb = new StringBuilder();
sb.append("test");
sb.append("1");
String s3 = sb.toString().intern();
intern() looks at the string sb built, sees it's "test1", sees that it already has "test1" in its internal pool, and so returns the string from its internal pool. s3 will be the same object as s1 (s3 == s1 is true). The temporary String returned by sb.toString() is no longer referenced once intern() returns, so it can be garbage collected, and so can sb once it goes out of scope. That's how intern() saves memory: it reuses existing strings.
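Mark's example assembled into one runnable class, as a sketch (the class name InternReuse and the helper buildTest1 are assumptions, not part of the post):

```java
public class InternReuse {
    // Builds "test1" piece by piece, as in the post; toString() returns
    // a new String object that is not the pooled literal.
    static String buildTest1() {
        StringBuilder sb = new StringBuilder();
        sb.append("test");
        sb.append("1");
        return sb.toString();
    }

    public static void main(String[] args) {
        String s1 = "test1";          // literal: already in the pool
        String s2 = s1.intern();      // nothing to do, returns s1 itself
        String built = buildTest1();
        String s3 = built.intern();   // finds "test1" in the pool, returns that copy

        System.out.println(s2 == s1);     // true
        System.out.println(built == s1);  // false: separate object, same text
        System.out.println(s3 == s1);     // true
    }
}
```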
> saying it could return a pointer to itself. As a consequence of the
See above.
> intern() call. Instead of creating a new String object, right? But how
> does that save more memory than having two different pointers pointing
See above.
> to the same String object, as in my seconds example? Both, what you are
> saying, and what I am saying (through the second example) would then in
> effect be the same, i.e. two pointers to the same object?
I think I got these all answered in the example I gave. Let us know if I did or if I missed it. ;-)
> regards
>
> Tommy
Roedy Green - 20 Jun 2008 01:13 GMT
>I was reading a bit about Strings intern()
See http://mindprod.com/jgloss/interned.html
 Signature
Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Roedy Green - 20 Jun 2008 01:15 GMT
>I know the internal difference is that the first example ends up with
>two different objects with the same internal literal representation of
>the string value and the second ends up with two pointers to the same
>object. But in some situations, for example when memory is precious,
>couldn't one use example 2 instead?
interning SAVES memory by replacing a reference to a duplicate with a reference to the single master copy. This allows dups to be garbage collected.
Interning takes time, but it saves RAM.
Daniel Pitts - 20 Jun 2008 01:40 GMT
>> I know the internal difference is that the first example ends up with
>> two different objects with the same internal literal representation of
[quoted text clipped - 7 lines]
>
> Interning takes time, but it saves RAM.
The problem with intern is that interned strings live for the duration of the system classloader (often the lifespan of the JVM instance), which can lead to terrible memory leaks if used improperly.
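It saves ram in the case where the same string may be duplicated. An example where it wastes space is trivial to create:
public class InternLeak {
    public static void main(String... args) {
        // Every string "foo0", "foo1", ... is distinct, so interning
        // only adds new entries to the pool; nothing is ever shared.
        for (int i = 0; i < Integer.MAX_VALUE; ++i) {
            System.out.println(("foo" + i).intern());
        }
    }
}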
Now, Integer.MAX_VALUE strings that will only appear once will be stored indefinitely, leading to a memory leak, and probably an OOM on most systems, where without the intern the program would run fine.
So, intern can save memory if you have many copies of the same string in many places, but wastes memory if you don't. The real moral is: don't use intern unless you have a damned good reason to. If you don't know whether your reason is good, then it probably isn't :-).
If you're getting OOM, and a memory profiler shows you that you have millions of copies of "Foo1", *then* you should consider using intern.
Even then, consider other alternatives, such as a shorter lived String pool so that you don't needlessly fill the intern pool.
 Signature Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Mark Space - 20 Jun 2008 02:41 GMT
> Even then, consider other alternatives, such as a shorter lived String
> pool so that you don't needlessly fill the intern pool.
That's actually a really good point, one I hadn't considered. Considering how easy and efficient things like HashMap are to implement and use, there's no real excuse for not doing so.
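A minimal sketch of the shorter-lived pool Daniel and Mark describe, assuming a single-threaded caller and Java 8 or later for Map.putIfAbsent (newer than this thread); LocalStringPool is a hypothetical name, not a library class. Unlike intern(), the pool's references to its strings go away with the pool object itself, so its lifetime is under the caller's control.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical canonicalizing pool (not thread-safe). Its entries are held
// only as long as this pool object is reachable.
public class LocalStringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the first String seen with these characters; later equal
    // Strings are swapped for that canonical instance.
    public String canonical(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        LocalStringPool pool = new LocalStringPool();
        String a = pool.canonical(new String("Foo1"));
        String b = pool.canonical(new String("Foo1"));
        System.out.println(a == b);       // true: b was swapped for a
        System.out.println(pool.size());  // 1
    }
}
```

One option is to create a pool per batch of input (for example, per file being parsed) and drop it afterwards, which keeps the deduplication benefit without growing a JVM-wide table.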
Roedy Green - 20 Jun 2008 13:21 GMT
On Thu, 19 Jun 2008 17:40:57 -0700, Daniel Pitts <newsgroup.spamfilter@virtualinfinity.net> wrote, quoted or indirectly quoted someone who said :
>The problem with intern, is that interned strings live for the duration
>of the system classloader (often the lifespan of the JVM instance),
>which can lead to terrible memory leaks if used improperly.
On the other paw, interned strings will compare more quickly, even if you use equals instead of ==, because String.equals() returns true immediately when both references point to the same object.
There are a raft of considerations. Don't just do it mindlessly.
Piotr Kobzda - 23 Jun 2008 22:33 GMT
> The problem with intern, is that interned strings live for the duration
> of the system classloader (often the lifespan of the JVM instance),
> which can lead to terrible memory leaks if used improperly.
Not quite true. The interned strings cache is now usually implemented in a weak-reference fashion, so interned strings may become eligible for garbage collection as soon as they are no longer strongly referenced.
> It saves ram in the case where the same string may be duplicated.
> An example where it wastes space is trivial to create.
That's trivial; however, your example does not cause that.
> public static void main(String...args) {
> for (int i = 0; i < Integer.MAX_VALUE; ++i) {
[quoted text clipped - 5 lines]
> indefinitely, leading to a memory leak, and probably an OOM on most
> systems, where without the intern the program would run fine.
On most modern systems your example will run without any problems. Run the following code to verify that:
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Set;

public class InternedStringsCleanupTest {

    public static void main(String[] args) {
        ReferenceQueue<String> refq = new ReferenceQueue<String>();
        // Keep the phantom references themselves reachable until they are enqueued.
        Set<Reference<?>> refs = new HashSet<Reference<?>>();
        int gced = 0;
        for (int i = 0; i < Integer.MAX_VALUE; ++i) {
            String foo = ("foo" + i).intern();
            refs.add(new PhantomReference<String>(foo, refq));
            // Drain the references whose interned strings have been collected.
            int gcedInLastPass = 0;
            for (Reference<?> ref; (ref = refq.poll()) != null;) {
                refs.remove(ref);
                ++gcedInLastPass;
            }
            if (gcedInLastPass > 0) {
                gced += gcedInLastPass;
                System.out.println("after creation of " + foo + " " + gced
                        + " (" + gcedInLastPass + " in last pass)"
                        + " interned strings became unreachable");
            }
        }
    }
}
piotr
Tommy Halsbrekk - 20 Jun 2008 07:38 GMT
> interning SAVES memory by replacing a reference to a duplicate with a
> reference to the single master copy. This allows dups to be garbage
> collected.
>
> Interning takes time, but it saves RAM.
(I have read your page on intern, but I could not find an answer to my question.)
I understand that, but wouldn't a copy of the reference, as in str2 = str1, save just as much memory as intern?
For example, say I have thousands of strings with the text "http://www.sourceforge.org/". Wouldn't keeping one copy of that String in str1, and then doing strN = str1 for every other one, save just as much space as an intern, and do the job faster?
regards
Tommy
Roedy Green - 20 Jun 2008 13:27 GMT
>I understand that, but wouldn't a copy of the reference, as in
>str2 = str1, save just as much memory as intern?
That is not what intern is for.
Let's say you read strings from a CSV file containing a list of names and addresses. You would have two totally separate strings both saying "Elm Street".
They would both lie around in memory.
However if you used intern
s1 = s1.intern();
s2 = s2.intern();
then s1 and s2 would point to the same String object. The other would be discarded.
Of course you could do something like interning yourself.
If you wrote:
if ( s1.equals(s2)) { s1 = s2;}
Then both would point to the same String object and the other would be discarded. But how would you know to do that? That's what intern sorts out for you on a global basis considering all possible interned Strings.
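Roedy's "Elm Street" scenario as a sketch, assuming each record is a simple "name,street" line split on a comma (the record format and the street helper are hypothetical). Each parsed field here is its own String object, so equal fields start out as separate copies until they are interned.

```java
public class CsvIntern {
    // Hypothetical helper: returns the street field of a "name,street" record.
    // split() returns a new String object for the field.
    static String street(String record) {
        return record.split(",")[1];
    }

    public static void main(String[] args) {
        String[] records = { "Smith,Elm Street", "Jones,Elm Street" };

        String s1 = street(records[0]);
        String s2 = street(records[1]);
        System.out.println(s1 == s2);       // false: two separate copies
        System.out.println(s1.equals(s2));  // true: same characters

        s1 = s1.intern();
        s2 = s2.intern();
        System.out.println(s1 == s2);       // true: both refer to the pooled copy
    }
}
```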
Roedy Green - 20 Jun 2008 15:36 GMT
>I was reading a bit about Strings intern() and have a question
I have written another section to my intern essay. Perhaps it will clarify.
See http://mindprod.com/jgloss/interned.html#MANUAL