
Java Forum / General / January 2006


String.substring, under the hood

Roedy Green - 14 Jan 2006 07:36 GMT
Here is how String.substring works:

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

Note that it now always creates a new string (unless the substring is
the string itself). It used to create a view into the underlying
string.

So the efficiencies have changed. Substring no longer pins the
underlying big string. On the other hand, you will create many String
objects by using substring. So be careful with it. It is no longer
free in terms of RAM to have many substrings of your big string.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Stefan Schulz - 14 Jan 2006 09:19 GMT
[...]
> Note that it now always creates a new string (unless the substring is
> the string itself.)  It used to create a view into the underlying
[quoted text clipped - 4 lines]
> objects  by using substring.  So be careful with it. It is no longer
> free in terms of ram to have many substrings of your big string.

While I consider such things implementation details of
java.lang.String, and therefore not really my concern, the new
behaviour is more in line with the "reasonable expectations" of most
programmers. If I create a new object, I expect its storage to be
allocated somewhere. Also, if I drop a reference to a very long string
but retain a tiny subsection, I expect to be able to drop all but the
small subsection.
Stefan Schulz - 14 Jan 2006 12:30 GMT
Also, upon looking at the code again, it still creates a "view" which
is backed by the same char array (same as before!)
Thomas Hawtin - 14 Jan 2006 18:19 GMT
> [...]
>
>>Note that it now always creates a new string (unless the substring is
>>the string itself.)  It used to create a view into the underlying
>>string.

As pointed out in other postings, it doesn't copy. String uses the
rather confusing technique of rearranging arguments in order to give
constructors different semantics. A (package) private constructor does
not do the additional copy.

>>So the efficiencies have changed.  Substring no longer pins the
>>underlying big string. On the other hand, you will create many string
>>objects  by using substring.  So be careful with it. It is no longer
>>free in terms of ram to have many substrings of your big string.

Pin refers to stopping an object from being moved by the garbage
collector. The new (sub)String (strongly) references the full character
array of the original String.

> While i consider such things implementation details of
> java.lang.String, and therefore not really my concern, the new
[quoted text clipped - 3 lines]
> but retain a tiny subsection, i expect to be able to drop all but the
> small subsection.

Performance is externally visible behaviour. It is quite normal for
client code to take it into account.

Tom Hawtin
Signature

Unemployed English Java programmer
http://jroller.com/page/tackline/

Stefan Schulz - 14 Jan 2006 18:34 GMT
> >>So the efficiencies have changed.  Substring no longer pins the
> >>underlying big string. On the other hand, you will create many string
[quoted text clipped - 4 lines]
> collector. The new (sub)String (strongly) references the full character
> array of the original String.

This is exactly what I said. I just wonder what the OP meant when he
complained about a new String being created... with Strings being
immutable, you need to create a new String object each time you modify
one (for example, by taking a substring). The backing character array
is not copied, though (which can lead to unexpectedly high memory costs
for small strings).

> > While i consider such things implementation details of
> > java.lang.String, and therefore not really my concern, the new
[quoted text clipped - 6 lines]
> Performance is externally visible behaviour. It is quite normal for
> client code to take it into account.

That is correct; however, the definition of the substring method does
not offer any guarantees about performance. It might be constant time
but possibly waste space (the current method), or it might take time
linear in the length of the substring, or anything else. The method
definition does not tell you one way or another, so you should not rely
on the behaviour. Maybe another JRE will do things completely the other
way around. The actual time needed depends on the implementation and,
with no specified behaviour, is not an external characteristic.
Alan Krueger - 16 Jan 2006 15:49 GMT
> Performance is externally visible behaviour. It is quite normal for
> client code to take it into account.

It might be externally visible, but it may not be guaranteed by the
creator of the class.  Relying on internal implementation details
violates encapsulation and may break if the internal implementation is
changed.
Chris Uppal - 14 Jan 2006 12:49 GMT
> Note that it now always creates a new string (unless the substring is
> the string itself.)  It used to create a view into the underlying
> string.  

The substring created will share the underlying char[] array.

To the best of my memory that has always been the behavior.
Unfortunately, I don't have source from a JDK before 1.4.2 handy to
check.

   -- chris
Chris Smith - 14 Jan 2006 17:50 GMT
> Note that it now always creates a new string (unless the substring is
> the string itself.)  It used to create a view into the underlying
> string.  

This seems to come up every once in a while.

>         return ((beginIndex == 0) && (endIndex == count)) ? this :
>             new String(offset + beginIndex, endIndex - beginIndex,
>             value);

This is a call to a private constructor inside the String class, which
reuses the underlying char[].  It does not do the same thing as the
public String(String) constructor, which copies the underlying data.  So
when people say that "new String" copies the underlying char[], you
should only apply that statement to the String(String) overloaded
constructor, and not to the String(int,int,char[]) private overload used
there.
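
The distinction between the two constructors can be sketched with a toy class. This is only an illustration of the idea discussed in this thread, not Sun's source: MiniString, backingLength and sharesStorageWith are hypothetical names invented for the example, and only the (int offset, int count, char[] value) parameter order is taken from the code quoted above.

```java
// Hypothetical toy class: it models the sharing idea, it is not java.lang.String.
final class MiniString {
    private final char[] value;  // backing array, possibly shared with other instances
    private final int offset;    // index in value of this string's first character
    private final int count;     // number of characters that belong to this string

    // Public constructor: takes a private copy of the caller's array.
    MiniString(char[] chars) {
        this(0, chars.length, chars.clone());
    }

    // Public copy constructor: copies only the characters in use, so the
    // result has no tie to the original backing array.
    MiniString(MiniString original) {
        this(0, original.count,
             java.util.Arrays.copyOfRange(original.value, original.offset,
                                          original.offset + original.count));
    }

    // Private constructor with the arguments in "reverse" order:
    // it stores the array reference as-is and performs no copy.
    private MiniString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    MiniString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return (beginIndex == 0 && endIndex == count)
                ? this
                : new MiniString(offset + beginIndex, endIndex - beginIndex, value);
    }

    // Size of the array this instance keeps reachable.
    int backingLength() {
        return value.length;
    }

    // True when both instances reference the very same array object.
    boolean sharesStorageWith(MiniString other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

In this sketch a three-character substring of a ten-character MiniString still reports a backingLength() of 10, while passing it through the copy constructor yields an instance whose backingLength() is 3.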

Signature

www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

Owen Jacobson - 14 Jan 2006 22:32 GMT
> Here is how String.substring works:

...snip Sun's implementation...
> return ... new String(offset + beginIndex, endIndex - beginIndex, value);

> Note that it now always creates a new string (unless the substring is
> the string itself.)  It used to create a view into the underlying
[quoted text clipped - 4 lines]
> objects  by using substring.  So be careful with it. It is no longer
> free in terms of ram to have many substrings of your big string.

Note which constructor this invokes: String (int, int, char[]).  Sun's
implementation of substring, at least as of 1.5.05 and as far back as I've
been using Java, shares the char[] containing the String's characters
on calls to substring.  It still pins the underlying char array, and is
still cheap both computationally and memory-wise if the originating
string's lifespan is at least as long as those of the substrings.
Owen Jacobson - 14 Jan 2006 22:34 GMT
...snip...

edit: f.ck, beaten.
Roedy Green - 14 Jan 2006 23:38 GMT
On Sat, 14 Jan 2006 07:36:33 GMT, Roedy Green
<my_email_is_posted_on_my_website@munged.invalid> wrote, quoted or
indirectly quoted someone who said :

>Here is how String.substring works:

Here is my latest understanding:

substring is clever. It does not make a deep copy of the substring the
way most languages do. It just creates a pointer into the original
immutable String, i.e. it points to the value char[] of the base
string, and tracks the offset where the substring starts and the count
of how long the substring is. This could be confusing if you were
low-level debugging, since you would see the whole String, not just the
substring. There were reports of a bug in Microsoft's implementation
of substring. The downside of this cleverness is that a tiny substring
of a giant base String could suppress garbage collection of that big
String even if the whole String were no longer needed. (Actually, it is
the value char[] array that is held in RAM; the String object itself
could be collected.)

It is probably still a good idea to use indexOf( lookFor, offset )
rather than creating a substring first and using indexOf( lookFor )
on that.
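
A small sketch of the two approaches, for comparison. The class and method names (IndexOfDemo, findFrom, findViaSubstring) are made up for illustration; only String.indexOf and String.substring are real API. The two methods agree only for offsets between 0 and text.length(), since substring throws outside that range while indexOf does not.

```java
final class IndexOfDemo {
    // Searches from offset onward without creating an intermediate String object.
    static int findFrom(String text, String lookFor, int offset) {
        return text.indexOf(lookFor, offset);
    }

    // Same search via a temporary substring. The result of indexOf is
    // relative to the substring, so offset has to be added back.
    static int findViaSubstring(String text, String lookFor, int offset) {
        int relative = text.substring(offset).indexOf(lookFor);
        return relative < 0 ? -1 : relative + offset;
    }

    public static void main(String[] args) {
        String text = "key=value; key=other";
        System.out.println(findFrom(text, "key", 1));          // prints 11
        System.out.println(findViaSubstring(text, "key", 1));  // prints 11
    }
}
```

Besides the extra object, the substring version needs the index adjustment, which is easy to forget.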

If you know a tiny substring is holding a giant string in RAM, that
would otherwise be garbage collected, you can break the bond by using
littleString = new String( littleString ) which will create a new
smaller backing char[] with no ties to the original String.
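
As a rough sketch of that idiom (TrimBackingArray and detach are hypothetical names for this example): the String(String) constructor is documented to produce a copy of its argument, but whether the substring actually shared a big array beforehand depends on the JDK implementation, as others in this thread have pointed out, so treat the memory effect as an assumption about the implementation discussed here.

```java
final class TrimBackingArray {
    // Returns a new String object with the same characters as s.
    // That the copy gets its own, smaller backing array is the behaviour
    // described in this thread, not a guarantee of the API documentation.
    static String detach(String s) {
        return new String(s);
    }

    public static void main(String[] args) {
        char[] filler = new char[1000000];
        java.util.Arrays.fill(filler, 'x');
        String big = "id=42;" + new String(filler);

        String little = big.substring(3, 5);  // the two characters "42"
        little = detach(little);              // no longer tied to big's storage
        big = null;                           // the big string is now unreachable from here

        System.out.println(little);           // prints 42
    }
}
```

The copy costs time proportional to the length of the small string, so it is only worth doing when the small string is expected to outlive the big one.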

If you are a curious sort and study the code for String.substring in
src.zip, this sharing logic might not be apparent. The key is a
non-public String constructor that takes its parameters in the reverse
of the usual order: String( int offset, int count, char value[] ).





