Java Forum / General / August 2006

String storage

Dražen Gemić - 05 Aug 2006 13:07 GMT
Does anyone know how strings are stored in Java?

Let's say that there are two strings:

String s1="abc-def";
String s2="abc-def";

Do s1 and s2 refer to one object or two?
Since strings are immutable, it would make sense
to have one object referenced by two variables.

Is the character sequence "abc-def" stored (and memory allocated) only
once, or twice?

I need to be sure, because I have an application with lots of repeating
strings (people names).

DG
Eric Sosman - 05 Aug 2006 13:42 GMT
> Does anyone know how strings are stored in Java?
>
[quoted text clipped - 4 lines]
>
> Do s1 and s2 refer to one object or two?

    Identical *string literals* generate references to the
same String object, so in your example there is just one
String and two references to it.

> I need to be sure, because I have an application with lots of repeating
> strings (people names).

    Strings that are generated at run-time are another matter.
If you build a String by reading it from an input file or
manufacturing it programmatically, you can get multiple distinct
Strings with identical contents:

    String s3 = "-def";
    s3 = "abc" + s3;

At this point s3 refers to a String whose contents are "abc-def",
but it is not the same String s1 and s2 refer to.  Try it:

    System.out.println("s3.equals(s1) = " + s3.equals(s1));
    System.out.println("s3 == s1 = " + (s3 == s1));

    Quite likely, all you need to do is be sure to use equals()
and not == when you want to test whether two Strings have equal
value.  People's names are -- what?  twenty characters?  thirty?
forty?  A million forty-character names will occupy about eighty
megabytes, which is not an amount to worry about unduly.  Just
write the code in an ordinary way and don't worry about duplicates.
Then measure it to see whether you have a problem: if you do, you
can *then* think about ways to detect sets of duplicates and turn
them into references to a single "canonical" String.  But you
should do this sort of thing last, not first.
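
If measurement does show a problem, the "canonical String" idea above could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name NamePool and its methods are made up here, and the sketch assumes Java 8 or later (for Map.putIfAbsent) and single-threaded use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps every String to one canonical instance
// with the same contents. Not thread-safe.
public class NamePool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the first String seen with these contents.
    public String canonical(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Number of distinct values held by the pool.
    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        NamePool names = new NamePool();
        // new String(...) forces two distinct objects with equal contents.
        String a = names.canonical(new String("John Smith"));
        String b = names.canonical(new String("John Smith"));
        System.out.println("a == b: " + (a == b));
        System.out.println("pool size: " + names.size());
    }
}
```

The pool keeps every canonical String reachable for as long as the pool itself lives, so it only pays off when duplicates are common.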

Eric Sosman
esosman@acm-dot-org.invalid

Patricia Shanahan - 05 Aug 2006 16:03 GMT
> Does anyone know how strings are stored in Java?
>
[quoted text clipped - 4 lines]
>
> Do s1 and s2 refer to one object or two?

String constant expressions (not just String literals) with the same
value reference the same object.

Your s1 and s2 reference the same object, as would "abc-"+"def".
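
A small sketch of the difference between constant expressions and run-time concatenation. The class and method names are invented for illustration; the behaviour follows from the language rule that constant expressions of type String are interned, while run-time concatenation creates a new String.

```java
public class ConstantExpressionDemo {
    // A final variable initialized with a literal is a constant variable,
    // so expressions built from it are still constant expressions.
    public static final String PREFIX = "abc-";

    public static String literal() {
        return "abc-def";
    }

    public static String folded() {
        // Constant expression: evaluated by the compiler, then interned.
        return PREFIX + "def";
    }

    public static String runtimeConcat(String prefix) {
        // prefix is a parameter, so this concatenation happens at run time.
        return prefix + "def";
    }

    public static void main(String[] args) {
        String s1 = "abc-def";
        String s2 = "abc-def";
        System.out.println("s1 == s2:               " + (s1 == s2));
        System.out.println("s1 == \"abc-\" + \"def\":   " + (s1 == "abc-" + "def"));
        System.out.println("s1 == folded():         " + (s1 == folded()));
        String s3 = runtimeConcat("abc-");
        System.out.println("s1 == runtimeConcat():  " + (s1 == s3));
        System.out.println("s1.equals(s3):          " + s1.equals(s3));
    }
}
```

Only the run-time concatenation yields a distinct object; its contents still compare equal with equals().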

> Since strings are immutable, it would make sense
> to have one object referenced by two variables.
>
> Is the character sequence "abc-def" stored (and memory allocated) only
> once, or twice?

For String constant expressions, definitely only once.

For dynamically built strings, it depends on how they are built. Inside
the String implementation, there is a reference to a char[] with offset
and length, so a substring can use the same char[] as the string from
which it was derived.

> I need to be sure, because I have an application with lots of repeating
> strings (people names).

You can force same object behavior by using the String method intern().

Suppose you had a String reference s3 that was built some other way, not
as a substring, and has the value "abc-def". It would reference a different
String object, and might reference a different char[]. However,
s3.intern() would reference the same object as s1 and s2.
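
A minimal sketch of that behaviour. The class name and the buildAtRuntime helper are invented for illustration; String.intern() itself is the standard library method.

```java
public class InternDemo {
    // Builds "abc-def" at run time, so the result is a new String object.
    public static String buildAtRuntime() {
        return new StringBuilder("abc").append("-def").toString();
    }

    public static void main(String[] args) {
        String s1 = "abc-def";          // literal, already in the pool
        String s3 = buildAtRuntime();   // equal contents, different object

        System.out.println("s3 == s1:          " + (s3 == s1));
        System.out.println("s3.equals(s1):     " + s3.equals(s1));
        System.out.println("s3.intern() == s1: " + (s3.intern() == s1));
    }
}
```

Note that intern() returns the pooled reference; s3 itself still refers to the original object unless the result is assigned back.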

I would do measurements to see whether the net performance using intern
is better or worse than without. It does cost some extra bookkeeping.

Patricia
Dražen Gemić - 06 Aug 2006 08:20 GMT
This reply is for Eric, too.

> You can force same object behavior by using the String method intern().

This part is the most interesting to me. In fact, it could gain me some speed,
because I have already created a kind of String pool myself, based on HashMap.

You might be interested in more details of the problem.

My customer is a company in the education business. They have
a number of lecturers and a number of classrooms.

They need a web application to cover their activities. I have written
an HTML interface that presents a monthly timetable of classes. It is a
web page that shows the current month plus a few days of the previous
and next months. The page shows all the classrooms at once, as grids
with days in columns and hours of the day in rows. Each cell represents
a 30-minute interval.

Additionally, they wanted the lecturer's name as a tooltip (shown when
the mouse hovers over a cell).

So there are many cells, and only a few lecturers.

Each timetable page request is backed by a separate object on the server
side, because the customers can choose between the months. The timetable
page is going to be requested very often, so I need to optimise the
memory usage.

If I understood correctly, Patricia says that there is one String pool
for the whole VM. Is that correct? And is the pool emptied by the GC?

DG
Eric Sosman - 06 Aug 2006 16:11 GMT
> [...]
> My customer is a company in the education business. They have
[quoted text clipped - 6 lines]
> with days in columns and hours of the day in rows. Each cell represents
> a 30-minute interval.

    One month plus a little -- let's say forty days.

    Cells at thirty-minute intervals -- forty-eight cells per day.

    Not entirely clear about how the classrooms enter into this,
but let's be pessimistic and suppose a separate 40x48 grid for
each of 100 classrooms.

    Grand total: 192000 cells (a rather unwieldy Web page, I'd say.)

    If each cell had its very own un-shared forty-character String,
they would use a little less than fifteen megabytes of character
data.
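
The estimate above can be reproduced with a few lines of arithmetic. This is a sketch with invented names; it assumes two bytes per character (the size of a Java char) and counts only character data, not object headers or references. Newer JVMs may store Latin-1 text more compactly.

```java
public class TimetableEstimate {
    // Assumption: each character occupies one 16-bit Java char.
    public static final int BYTES_PER_CHAR = 2;

    public static long cellCount(int days, int cellsPerDay, int classrooms) {
        return (long) days * cellsPerDay * classrooms;
    }

    public static long characterBytes(long cells, int charsPerName) {
        return cells * charsPerName * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        long cells = cellCount(40, 48, 100);      // 40 days, 48 half-hours, 100 rooms
        long bytes = characterBytes(cells, 40);   // one 40-character name per cell
        System.out.println("cells = " + cells);   // 192000
        System.out.println("bytes = " + bytes);   // 15360000
        // Dividing by 2^20 gives a figure a little under fifteen.
        System.out.println("MiB   = " + bytes / (1024.0 * 1024.0));
    }
}
```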

> Additionally, they wanted the lecturer's name as a tooltip (shown when
> the mouse hovers over a cell).
[quoted text clipped - 5 lines]
> page is going to be requested very often, so I need to optimise the
> memory usage.

    I think you are attacking the problem from the wrong end: You
are worrying about the String objects, but you really ought to be
thinking about the Lecturer objects.  Each Lecturer presumably owns
a String with the lecturer's name, so there are only as many of those
Strings as there are Lecturers -- and among that group, there are
probably not very many duplicates.  (You may have more than one
"John Smith" on the faculty, but you don't have thousands and
thousands of them.)

    When you tell some JThing to use a particular Lecturer's name
String as its tool tip, you are not creating a brand-new String
(well you *could*, but that would be silly).  You are just telling
the JThing to use the existing String; if you set the same String
as the tool tip for a hundred different JThings, they all share
the same String object.
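
To illustrate the sharing described above, here is a sketch in which Lecturer and Cell are hypothetical stand-ins (Cell plays the role of the "JThing"); neither class comes from the original application.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedNameDemo {
    // Hypothetical domain object that owns the one name String.
    public static final class Lecturer {
        private final String name;
        public Lecturer(String name) { this.name = name; }
        public String getName() { return name; }
    }

    // Hypothetical timetable cell; it stores a reference, not a copy.
    public static final class Cell {
        private String toolTip;
        public void setToolTip(String toolTip) { this.toolTip = toolTip; }
        public String getToolTip() { return toolTip; }
    }

    // Creates `count` cells that all use the lecturer's name as tool tip.
    public static List<Cell> cellsFor(Lecturer lecturer, int count) {
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Cell cell = new Cell();
            cell.setToolTip(lecturer.getName());
            cells.add(cell);
        }
        return cells;
    }

    public static void main(String[] args) {
        Lecturer lecturer = new Lecturer("John Smith");
        boolean allShared = true;
        for (Cell cell : cellsFor(lecturer, 100)) {
            allShared &= (cell.getToolTip() == lecturer.getName());
        }
        System.out.println("all cells share one String: " + allShared);
    }
}
```

A hundred cells cost a hundred references here, but only one String.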

    I strongly suspect that you are "optimizing" for a problem you
do not even suffer from.  That's a waste of your time and energy,
and to the extent that it makes your code more complicated it's
also a threat to maintainability.  That's why I'll reiterate my
earlier advice: Do *not* do anything special to deal with the
problem until and unless you have reason to believe the problem
actually exists.  Nothing you have presented thus far suggests
that you have a problem.

Eric Sosman
esosman@acm-dot-org.invalid

©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.