Java Forum / First Aid / June 2008

String intern question

Tommy Halsbrekk - 19 Jun 2008 15:30 GMT
Hi

I was reading a bit about String's intern() and have a question.
Comparing the following two examples, what is the difference in
practical terms?

    str2 = str1.intern();
and

    str2 = str1;

I know the internal difference is that the first example ends up with
two different objects with the same internal literal representation of
the string value and the second ends up with two pointers to the same
object. But in some situations, for example when memory is precious,
couldn't one use example 2 instead?

regards

tommy
Lew - 19 Jun 2008 19:09 GMT
> I was reading a bit about String's intern() and have a question.
> Comparing the following two examples, what is the difference in
[quoted text clipped - 4 lines]
>
>         str2 = str1;

It depends on what was assigned to both variables.  If both Strings
are interned, the == operator will return 'true' iff it is true that
str1.equals( str2 ).  If one or the other String is not interned, and
both references do not point to the same object, then == will not
return true even if str1.equals( str2 ).  For your question, that
means that the two assignments will have the same effect if str1 ==
str1.intern(), but not otherwise.
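Lew's point can be checked with a short program. This is a minimal sketch; the class and method names are made up for illustration, and new String("hello") stands in for any string that was built at run time rather than written as a literal.

```java
class AssignmentVsInternDemo {
    // Example 2 from the question: str2 simply copies the reference.
    static String plainCopy(String str1) {
        return str1;
    }

    // Example 1 from the question: str2 gets the canonical pooled instance.
    static String internedCopy(String str1) {
        return str1.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";              // literals are already interned
        String built = new String("hello");    // a distinct, non-interned object

        // Already interned: both assignments yield the very same object.
        System.out.println(plainCopy(literal) == internedCopy(literal)); // true

        // Not interned: intern() hands back the pooled "hello" instead.
        System.out.println(plainCopy(built) == internedCopy(built));     // false
        System.out.println(internedCopy(built) == literal);              // true
    }
}
```

So the two assignments differ only when str1 is not already the pooled instance.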

> I know the internal difference is that the first example ends up with
> two different objects with the same internal literal representation of

Not necessarily.  There could be only one object involved, if str1 ==
str1.intern().

> the string value and the second ends up with two pointers to the same
> object.

No, both assignments could wind up with pointers to the same object.

> But in some situations, for example when memory is precious, couldn't one use example 2 instead?

Memory is always precious.  One could use example 2 almost always,
and usually would.  It has little or nothing to do
with memory being precious; one only interns when there is a good
reason to.  The intern() call just adds confusion when it doesn't
help.  In fact, the more precious memory is, the more likely one is to
use example 1, since interning reduces the number of String instances
floating around.

--
Lew
Tommy Halsbrekk - 19 Jun 2008 20:02 GMT
> help.  In fact, the more precious memory is, the more likely one is to
> use example 1, since interning reduces the number of String instances
> floating around.

I am not sure I understand what you are saying here.

The way I understand it, a Java String is composed of two
elements, the String object and its char sequence. When intern() is
called, the char sequence is shared between String objects, because
intern() returns a String object. So I thought intern() creates a new
String object which reuses the interned char sequence. But you are
saying it could return a pointer to itself as a consequence of the
intern() call, instead of creating a new String object, right? But how
does that save more memory than having two different pointers pointing
to the same String object, as in my second example? What you are
saying and what I am saying (through the second example) would then
in effect be the same, i.e. two pointers to the same object?

regards

Tommy
Joshua Cranmer - 19 Jun 2008 21:11 GMT
> The way I understand it, a Java String is composed of two
> elements, the String object and its char sequence. When intern() is
> called, the char sequence is shared between String objects, because
> intern() returns a String object.
I believe the char sequence is always shared. What intern() guarantees
is that any two strings with the same character sequence will always be
equivalent pointers (i.e., via ==); I also believe that this is the same
string that would be returned from a literal.

So |String composed = "a"+0, nonComposed = "a0";| has one character
buffer, but two String objects with overhead. If I then did

composed = composed.intern();
nonComposed = nonComposed.intern();

(assuming garbage collection wipes away the original objects), I have
one character buffer and one String overhead object.
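The guarantee Joshua describes is spelled out in the String.intern() Javadoc: s.intern() == t.intern() exactly when s.equals(t), and string literals are already interned. A small sketch that exercises both halves (class and method names are hypothetical):

```java
class InternContractDemo {
    // Documented contract: s.intern() == t.intern() is true
    // exactly when s.equals(t) is true.
    static boolean contractHolds(String s, String t) {
        return (s.intern() == t.intern()) == s.equals(t);
    }

    // A string built at run time is a separate object until it is
    // interned; after that it is the same object as the literal.
    static boolean internedMatchesLiteral() {
        String built = new StringBuilder("a").append(0).toString(); // "a0", new object
        return built != "a0" && built.intern() == "a0";
    }

    public static void main(String[] args) {
        System.out.println(contractHolds(new String("a0"), new String("a0"))); // true
        System.out.println(contractHolds("a0", "b1"));                        // true
        System.out.println(internedMatchesLiteral());                         // true
    }
}
```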

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Tommy Halsbrekk - 19 Jun 2008 21:51 GMT
> So |String composed = "a"+0, nonComposed = "a0";| has one character
> buffer, but two String objects with overhead. If I then did

Well, the compiler automatically interns string literals, so it
might actually make the references point to the same String object.

> composed = composed.intern();
> nonComposed = nonComposed.intern();
>
> (assuming garbage collection wipes away the original objects), I have
> one character buffer and one String overhead object.

This would then be superfluous, if I understand it correctly.
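Tommy's objection holds for Joshua's exact example, because "a"+0 is a compile-time constant expression that javac folds into the literal "a0". A sketch contrasting that with a concatenation done at run time, assuming a non-constant int parameter (names are illustrative only):

```java
class ConstantFoldingDemo {
    // "a" + 0 is a compile-time constant expression, so it is folded
    // to "a0" and interned like any other literal.
    static boolean constantConcatIsLiteral() {
        String composed = "a" + 0;
        String nonComposed = "a0";
        return composed == nonComposed;   // true
    }

    // With a non-constant operand the concatenation happens at run time
    // and produces a new, non-interned String.
    static boolean runtimeConcatIsLiteral(int n) {
        String composed = "a" + n;
        return composed == "a0";          // false, even when n == 0
    }

    // Interning the run-time result recovers the canonical instance.
    static boolean internedRuntimeConcatIsLiteral(int n) {
        String composed = ("a" + n).intern();
        return composed == "a0";          // true when n == 0
    }

    public static void main(String[] args) {
        System.out.println(constantConcatIsLiteral());          // true
        System.out.println(runtimeConcatIsLiteral(0));          // false
        System.out.println(internedRuntimeConcatIsLiteral(0));  // true
    }
}
```

The JLS states that strings computed by concatenation at run time are newly created, which is why the second method returns false; intern() only changes anything for strings like that.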

regards

tommy
Mark Space - 19 Jun 2008 22:02 GMT
>> help.  In fact, the more precious memory is, the more likely one is to
>> use example 1, since interning reduces the number of String instances
[quoted text clipped - 7 lines]
> intern() returns a String object. So I thought intern() creates a new
> String object which reuses the interned char sequence. But you are

Well, I don't think intern() always returns a "new" object.  Consider this:

  String s1 = "test1";

OK, the Java spec says, I believe, that constant strings in your program
are automatically interned.  So "test1" is already in
the internal string pool when your program starts (or at least before
the line above is executed).  So now you do:

  String s2 = s1.intern();

intern() sees that s1 is already an interned string, and just returns
the original reference, s1.  There's no work for it to do, so it does
nothing to the string s1 refers to.

Now consider adding this:

  StringBuilder sb = new StringBuilder();
  sb.append("test");
  sb.append("1");

  String s3 = sb.toString().intern();

intern() looks at the string sb built, sees it's "test1", sees that it
has "test1" in its internal pool already, and so returns the string
from its internal pool.  s3 will be the same as s1 (s3 == s1 is true).
The new String that sb.toString() returned is no longer referenced once
intern() returns, so it can be garbage collected, as can sb once it goes
out of scope.  That's how intern() saves memory.  It re-uses existing
strings.
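Mark's fragments can be assembled into one runnable program. A sketch (the wrapper class, the demo() method and the extra built variable are added for illustration) that returns the four references so they can be compared with ==:

```java
class PoolReuseDemo {
    // Runs the snippets above and returns {s1, s2, built, s3}.
    static String[] demo() {
        String s1 = "test1";               // literal: already in the pool
        String s2 = s1.intern();           // nothing to do, returns s1 itself

        StringBuilder sb = new StringBuilder();
        sb.append("test");
        sb.append("1");
        String built = sb.toString();      // new String object, same text
        String s3 = built.intern();        // the pooled "test1", i.e. s1

        return new String[] {s1, s2, built, s3};
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[1] == r[0]);  // true:  s2 == s1
        System.out.println(r[2] == r[0]);  // false: built is a separate object
        System.out.println(r[3] == r[0]);  // true:  s3 == s1
    }
}
```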

> saying it could return a pointer to itself. As a consequence of the

See above.

> intern() call. Instead of creating a new String object, right? But how
> does that save more memory than having two different pointers pointing

See above.

> to the same String object, as in my second example? Both, what you are
> saying, and what I am saying (through the second example) would then in
> effect be the same, i.e. two pointers to the same object?

I think I got these all answered in the example I gave.  Let us know if
I did or if I missed it. ;-)

> regards
>
> Tommy
Roedy Green - 20 Jun 2008 01:13 GMT
>I was reading a bit about String's intern()

see http://mindprod.com/jgloss/interned.html
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Roedy Green - 20 Jun 2008 01:15 GMT
>I know the internal difference is that the first example ends up with
>two different objects with the same internal literal representation of
>the string value and the second ends up with two pointers to the same
>object. But in some situations, for example when memory is precious,
>couldn't one use example 2 instead?

interning SAVES memory by replacing a reference to a duplicate with a
reference to the single master copy. This allows dups to be garbage
collected.

Interning takes time, but it saves RAM.  

Daniel Pitts - 20 Jun 2008 01:40 GMT
>> I know the internal difference is that the first example ends up with
>> two different objects with the same internal literal representation of
[quoted text clipped - 7 lines]
>
> Interning takes time, but it saves RAM.  
The problem with intern is that interned strings live for the duration
of the system classloader (often the lifespan of the JVM instance),
which can lead to terrible memory leaks if used improperly.

It saves RAM in the case where the same string may be duplicated.
An example where it wastes space is trivial to create.
public class InternLeakDemo {
  public static void main(String... args) {
    for (int i = 0; i < Integer.MAX_VALUE; ++i) {
      // each "foo" + i is unique, so interning it shares nothing
      System.out.println(("foo" + i).intern());
    }
  }
}

Now, Integer.MAX_VALUE strings that will only appear once will be stored
indefinitely, leading to a memory leak, and probably an OOM on most
systems, where without the intern the program would run fine.

So, intern can save memory if you have many copies of the same string in
many places, but wastes memory if you don't.  The real moral is: don't
use intern unless you have a damned good reason to.  If you don't
know whether your reason is good, then it probably isn't :-).

If you're getting OOM, and a memory profiler shows you that you have
millions of copies of "Foo1", *then* you should consider using intern.

Even then, consider other alternatives, such as a shorter lived String
pool so that you don't needlessly fill the intern pool.
--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Mark Space - 20 Jun 2008 02:41 GMT
> Even then, consider other alternatives, such as a shorter lived String
> pool so that you don't needlessly fill the intern pool.

That's actually a really good point, one I hadn't considered.
Considering how easy and efficient things like HashMap are to implement
and use, there's no real excuse for not doing so.
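A minimal sketch of the shorter-lived pool Daniel and Mark describe, assuming a plain HashMap is acceptable (LocalStringPool is a hypothetical class, not a library API). When the pool object itself becomes unreachable, every string it holds can be collected along with it:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application-scoped pool, independent of String.intern().
class LocalStringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the first instance seen with this text, remembering s if new.
    public String canonicalize(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        LocalStringPool names = new LocalStringPool();
        String a = names.canonicalize(new String("Elm Street"));
        String b = names.canonicalize(new String("Elm Street"));
        System.out.println(a == b);          // true
        System.out.println(names.size());    // 1
    }
}
```

This version is not thread-safe; a ConcurrentHashMap would be the usual substitute if several threads share the pool.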
Roedy Green - 20 Jun 2008 13:21 GMT
On Thu, 19 Jun 2008 17:40:57 -0700, Daniel Pitts
<newsgroup.spamfilter@virtualinfinity.net> wrote, quoted or indirectly
quoted someone who said :

>The problem with intern is that interned strings live for the duration
>of the system classloader (often the lifespan of the JVM instance),
>which can lead to terrible memory leaks if used improperly.

On the other paw, interned strings will compare more quickly, even if
you use equals instead of ==.

There are a raft of considerations. Don't just do it mindlessly.
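Roedy's remark about equals holds because, in the OpenJDK sources, String.equals checks whether both references point to the same object before comparing any characters. The standalone method below is a sketch of that identity-first pattern, not the JDK code; the charScans counter is a hypothetical addition that records how often the slow path runs:

```java
class IdentityFirstEquals {
    // Hypothetical instrumentation: counts comparisons that scanned characters.
    static int charScans = 0;

    static boolean textEquals(String a, String b) {
        if (a == b) {
            return true;                      // same object: no characters examined
        }
        if (a == null || b == null || a.length() != b.length()) {
            return false;
        }
        charScans++;                          // slow path: compare char by char
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String x = new String("Elm Street");
        String y = new String("Elm Street");

        textEquals(x, y);                     // distinct objects: full scan
        textEquals(x.intern(), y.intern());   // same pooled object: fast path
        System.out.println(charScans);        // 1
    }
}
```

The fast path only pays off when strings are interned once and compared many times, since each intern() call is itself a hash-table lookup.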

Piotr Kobzda - 23 Jun 2008 22:33 GMT
> The problem with intern is that interned strings live for the duration
> of the system classloader (often the lifespan of the JVM instance),
> which can lead to terrible memory leaks if used improperly.

Not quite true.  The interned strings cache is now usually implemented
in a weak-reference fashion, so interned strings may become eligible for
garbage collection as soon as they are no longer strongly referenced.

> It saves ram in the case where the same string may be duplicated.
> An example where it wastes space is trivial to create.

That's trivial; however, your example does not cause that.

> public static void main(String...args) {
>   for (int i = 0; i < Integer.MAX_VALUE; ++i) {
[quoted text clipped - 5 lines]
> indefinitely, leading to a memory leak, and probably an OOM on most
> systems, where without the intern the program would run fine.

On most modern systems your example will run without any problems.  Run
the following code to confirm that:

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Set;

public class InternedStringsCleanupTest {

    public static void main(String[] args) {
        ReferenceQueue<String> refq = new ReferenceQueue<String>();
        Set<Reference<?>> refs = new HashSet<Reference<?>>();
        int gced = 0;
        for (int i = 0; i < Integer.MAX_VALUE; ++i) {
            String foo = ("foo" + i).intern();
            refs.add(new PhantomReference<String>(foo, refq));
            int gcedInLastPass = 0;
            for (Reference<?> ref; (ref = refq.poll()) != null;) {
                refs.remove(ref);
                ++gcedInLastPass;
            }
            if (gcedInLastPass > 0) {
                gced += gcedInLastPass;
                System.out.println("after creation of " + foo + " "
                    + gced + " (" + gcedInLastPass + " in last pass)"
                    + " interned strings became unreachable");
            }
        }
    }
}

piotr
Tommy Halsbrekk - 20 Jun 2008 07:38 GMT
> interning SAVES memory by replacing a reference to a duplicate with a
> reference to the single master copy. This allows dups to be garbage
> collected.
>
> Interning takes time, but it saves RAM.  

(I have read your page on intern, but I could not find an answer to my
question.)

I understand that, but wouldn't a copy of the reference, as in
str2 = str1, save just as much memory as intern?

For example, say I have thousands of strings with the text
"http://www.sourceforge.org/". Wouldn't keeping one copy of that String
in str1, and then doing strN = str1 for every other one, save just as
much space as interning, and do the job faster?

regards

Tommy
Roedy Green - 20 Jun 2008 13:27 GMT
>I understand that, but wouldn't a copy of the reference, as in
>str2 = str1, save just as much memory as intern?

That is not what intern is for.

Let's say you read strings from a CSV file containing a list of names
and addresses.  You would have two totally separate strings both
saying "Elm Street".

They would both lie around in memory.

However if you used intern

s1 = s1.intern();
s2 = s2.intern();

then s1 and s2 would point to the same String object. The other would
be discarded.

Of course you could do something like interning yourself.

If you wrote:

if (s1.equals(s2)) {
    s1 = s2;
}

Then both would point to the same String object and the other would be
discarded. But how would you know to do that?  That's what intern
sorts out for you on a global basis considering all possible interned
Strings.
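Roedy's "Elm Street" scenario can be simulated without a real CSV file. In this sketch, readStreetColumn is a hypothetical stand-in for a parser that creates a fresh String per row, and objects are counted by identity with an IdentityHashMap-backed set, so equal but separate strings are counted separately:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class CsvInternDemo {
    // Hypothetical stand-in for a CSV parser: each row yields its own
    // String object, even when the text is identical.
    static String[] readStreetColumn(int rows, boolean intern) {
        String[] column = new String[rows];
        for (int i = 0; i < rows; i++) {
            String field = new String("Elm Street");   // fresh object per row
            column[i] = intern ? field.intern() : field;
        }
        return column;
    }

    // Counts distinct objects by identity (==), not by equals().
    static int distinctObjects(String[] values) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (String v : values) {
            seen.add(v);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(readStreetColumn(1000, false))); // 1000
        System.out.println(distinctObjects(readStreetColumn(1000, true)));  // 1
    }
}
```

Without intern() all 1000 copies stay reachable through the array; with it, each fresh copy becomes garbage right after intern() returns and only the pooled "Elm Street" remains.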

Roedy Green - 20 Jun 2008 15:36 GMT
>I was reading a bit about String's intern() and have a question

I have written another section to my intern essay. Perhaps it will
clarify.

See http://mindprod.com/jgloss/interned.html#MANUAL