Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / January 2006

Tip: Looking for answers? Try searching our database.

Optimising Garbage Collection

Thread view: 
Roedy Green - 14 Jan 2006 05:43 GMT
Consider a program where you read a file into RAM in one great String,
process it, create a new version, then if it is different, write it
back on one fell swoop. Repeat for thousands of files.

From GC's point of view, you are creating whacking two huge byte
arrays for the I/O, a huge StringBuilder and  two huge String objects
then discarding them for every file.

Ideally there should be some sort of mini GC where just these objects
are reclaimed, leaving the rest alone.

A generational collector obviously helps. Is there anything simple I
can do to boost this along.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Tony Morris - 13 Jan 2006 20:11 GMT
> Consider a program where you read a file into RAM in one great String,
> process it, create a new version, then if it is different, write it
[quoted text clipped - 12 lines]
> Canadian Mind Products, Roedy Green.
> http://mindprod.com Java custom programming, consulting and coaching.

Hello Roedy,
If you are doing small modifications to the class java.lang.String before
performing some operation on them, you might consider not creating large
copies of them at all. For example, in order to change one character of a
String, you must perform a data copy with your modification. If what you
really want is that String to appear to have that character changed through
the String contract (its methods), then it is an unfortunate limitation of
both java.lang.String and arrays in that they are restricted to an in-memory
representation of their backing data while exposing too much contract i.e.
not workaroundable (java.lang.CharSequence is a frivoleous attempt to
prevent this restriction). It would be better instead to have that data
arbitrarily represented in a type that provides an appropriate abstraction,
and some method that makes that abstraction appear as if a character were
changed, without enforcing that a data copy occurred. In fact, it would be
better if that data copy didn't occur and allow the client to decide when to
copy for performance reasons.

To illustrate an example, consider the following, more appropriate
abstraction of an array:
interface Array<E>
{
   int length();
   void set(E e, int index) throws ArrayIndexOutOfBoundsException;
   E get(int index) throws ArrayIndexOutOfBoundsException;
}

An implementation of this interface is not restricted to an in-memory
representation, but suppose it was anyway - for simplicity.
Now if I wanted to change the element at the nth index, a typical array
would force a copy with something like System.arraycopy.
Since you don't have that restriction here - since you have a more
appropriate abstraction - this enforcement does not hold.

You could write the following interface:
interface ReplaceNthElementOfArray
{
   <E> Array<E> replace(int n, E e, Array<E> a);
}
Implementations are not forced to perform a data copy in order to fulfill
the contract, since it has much less excess than the contract of a typical
array.

<plug shame="high">
I am not sure if all of this is even relevant to your question (I am a bit
vague on what you really want), but if it is, I can point you to
ContractualJ [http://contractualj.com], which has attempted to provide a
more appropriate abstraction of both arrays, and Strings by way of a
net.tmorris.adt.Sequence. In the case of an ordered sequence of characters,
I prefer to use a Sequence<Character> along with a
net.tmorris.primitives.Charable over a java.lang.String, since this allows
me to perform operations on the contract without the intrinsic excess
contract of most (all?) Java types. As a consequence, this often results in
significant performance improvements - no need to copy data due to excess
type contract.
</plug>

--
Tony Morris
http://tmorris.net/
Java Questions and Answers
http://jqa.tmorris.net/
Roedy Green - 14 Jan 2006 06:25 GMT
>If you are doing small modifications to the class java.lang.String before
>performing some operation on them, you might consider not creating large
>copies of them at all

I have a number of applications that  process a file in one fell
swoop.

1. seek out the pattern <!-- macro  XXXX  parm1 parm2 ... -->and
expand the macros, replacing any previous expansion. The macros are
written in Java.

2. compact removing whitespace from HTML that has no effect on
rendering.

3. clean up the "see" section at the bottom of each page, sorting
links, and resolving words into links.

4. remove excess blank lines

5. align in columns.

6. spaces to tabs

7. tabs to spaces

8. search for <DT entries  and extract items for indexing.

9. convert & to &amp; appropriately

10. remove duplicate lines

11. reflow text to standard line length.

12. split files into pieces based on embedded markers.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Tony Morris - 13 Jan 2006 20:33 GMT
> >If you are doing small modifications to the class java.lang.String before
> >performing some operation on them, you might consider not creating large
[quoted text clipped - 33 lines]
> Canadian Mind Products, Roedy Green.
> http://mindprod.com Java custom programming, consulting and coaching.

That sounds very much to me like you do indeed need a more appropriate
abstraction than java.lang.String, since performing those operations
individually will incur a performance hit by way of a data copy each time
(as each String instance is created). You could of course, try to do
something clever with collections, but they too have issues related to
inappropriate abstractions. I don't want to plug myself again, but in my
opinion, I cannot find a better recommendation - except of course, write
such an abstraction yourself.

Tony Morris
http://tmorris.net/
Java Questions and Answers
http://jqa.tmorris.net/
Stefan Ram - 14 Jan 2006 18:46 GMT
>In the case of an ordered sequence of characters, I prefer to
>use a Sequence<Character> along with a
[quoted text clipped - 4 lines]
>improvements - no need to copy data due to excess type
>contract.

 It somewhat reminds me of the way I implement "toString" in
 "de.dclj.ram.type.tuple.ComparableTuple0".

 Why do I have to convert my CharSequence to a String if the
 client would be happy with a CharSequence?

 Therefore, I offer "toCharSequence" for all clients:

/** {@inheritDoc} */
public java.lang.CharSequence toCharSequence()
{ java.lang.CharSequence result = "()";
 if( this.size > 0 )
 { final StringBuilder builder = new StringBuilder();
   /* ... */
   result = builder; }
 return result; }

 An then "toString" for clients who really need
 a string:

/** {@inheritDoc} */
public java.lang.String toString()
{ return this.toCharSequence().toString(); }

 (The full code is avaiable at:
http://www.purl.org/stefan_ram/html/ram.jar/de/dclj/ram/type/tuple/ComparableTup
le.html

http://www.purl.org/stefan_ram/html/ram.jar/de/dclj/ram/type/tuple/ComparableTup
le0.html

http://www.purl.org/stefan_ram/java/de/dclj/ram/type/tuple/ComparableTuple.java
http://www.purl.org/stefan_ram/java/de/dclj/ram/type/tuple/ComparableTuple0.java
 )
Robert Klemme - 16 Jan 2006 09:56 GMT
> Consider a program where you read a file into RAM in one great String,
> process it, create a new version, then if it is different, write it
[quoted text clipped - 3 lines]
> arrays for the I/O, a huge StringBuilder and  two huge String objects
> then discarding them for every file.

You don't need those huge byte arrays if you do conversion during reading
from the stream.

> Ideally there should be some sort of mini GC where just these objects
> are reclaimed, leaving the rest alone.
>
> A generational collector obviously helps. Is there anything simple I
> can do to boost this along.

I agree with Tony, you should think of a different representation of your
file contents in memory.  This soulds like a typical application of a
parser.  You might get away with a scanner that hacks your input into
tokens though.

HTH

   robert
Roedy Green - 16 Jan 2006 11:33 GMT
>I agree with Tony, you should think of a different representation of your
>file contents in memory.  This soulds like a typical application of a
>parser.  You might get away with a scanner that hacks your input into
>tokens though.

The giant string works well.  I do the i/o as fast as theoretically
possible.  You can't beat indexof for for finding the parts of the
string I have to process.  This way I can really rip through a large
number of files quickly.

My biggest file is only about half a megabyte and most are much
smaller.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Alun Harford - 17 Jan 2006 18:31 GMT
> >I agree with Tony, you should think of a different representation of your
> >file contents in memory.  This soulds like a typical application of a
[quoted text clipped - 8 lines]
> My biggest file is only about half a megabyte and most are much
> smaller.

Why not read it into a StringBuffer, change it, write it back? (you can even
recycle the StringBuffer if you want). StringBuffer has indexof, which is
still nice and fast.

Alun Harford
Roedy Green - 17 Jan 2006 20:50 GMT
On Tue, 17 Jan 2006 18:31:24 -0000, "Alun Harford"
<usenet@alunharford.co.uk> wrote, quoted or indirectly quoted someone
who said :

>Why not read it into a StringBuffer, change it, write it back? (you can even
>recycle the StringBuffer if you want). StringBuffer has indexof, which is
>still nice and fast.

I am not changing it in the sense of overlaying a few pieces..  I am
taking pieces of it, and replacing other pieces with a different
length piece.  The most efficient way to do that is to create a new
StringBuilder, rather than deleting and inserting from the old one, at
least in terms of CPU to slide to make room for the pieces.  It should
work with less RAM though with your method, so long as StringBuilder
does not go allocating temp space to do its sliding.

I'll put that idea on my todo stack to experiment with to see what the
net effect is.  The program toggles back and forth between consuming
35% and 65% of the CPU cycles on an idle machine. I don't know if that
is mostly in my algorithm or in GC. GC should be quick. There are that
many live objects other than classes.

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Roedy Green - 17 Jan 2006 21:21 GMT
On Tue, 17 Jan 2006 20:50:01 GMT, Roedy Green
<my_email_is_posted_on_my_website@munged.invalid> wrote, quoted or
indirectly quoted someone who said :

>I'll put that idea on my todo stack to experiment with to see what the
>net effect is.  The program toggles back and forth between consuming
>35% and 65% of the CPU cycles on an idle machine. I don't know if that
>is mostly in my algorithm or in GC. GC should be quick. There are that
>many live objects other than classes.

Your algorithm is actually much better than I first thought. Most of
the time when I expand a macro, the expansion is the same as last
time.  There is no need to replace anything, and even if I did, it
would be the same length as the old expansion, and hence
StringBuilder. replace should not need to shift anything.

So long as my estimate for StringBuilder size is good, this should cut
my memory use in half. Thanks!  I never would have thought of this on
my own.

Perhaps I can do this up brown by recycling the StringBuilder. I
noticed there is no StringBuilder.clear or setSize.  I wonder what the
most  efficient way to clear for reuse is.

I need to chew on the problem of deciding when to strink a
StringBuilder. There is no point is keeping it huge just because one
file was huge.  Maybe I could simply process the files in ascending
size order.

To make this even faster, I would need a way to do I/O direct to/from
the char[] in the StringBuilder.

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Roedy Green - 17 Jan 2006 21:37 GMT
On Tue, 17 Jan 2006 21:21:21 GMT, Roedy Green
<my_email_is_posted_on_my_website@munged.invalid> wrote, quoted or
indirectly quoted someone who said :

>Perhaps I can do this up brown by recycling the StringBuilder. I
>noticed there is no StringBuilder.clear or setSize.  I wonder what the
>most  efficient way to clear for reuse is.

it is StringBuilder.setLength

See http://mindprod.com/jgloss/stringbuilder.html
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Thomas Weidenfeller - 18 Jan 2006 08:49 GMT
> The giant string works well.  I do the i/o as fast as theoretically
> possible.  You can't beat indexof for for finding the parts of the
> string I have to process.

From you original description of the problem, where you wrote you look
for certain sequences of characters, I have to disagree! Like I
mentioned in previous discussions, look for Boyer-Moore,
Knuth-Morris-Pratt,  Rabin-Karp or similar algorithms. You can beat
linear, character by character text searches under certain conditions.

As for chopping the the data into pieces, and rebuild the stuff in
memory, have you considered to not copy the data around, but assemble
the result from "chunks" of the original data, and your changed data?

Your output would be described as a sequence of "chunks". Each chunks
either refers to original data (e.g. by an offset and length), or some
new data you just changed.

You have gathering write operations in nio, so writing these chunks
could be relatively fast. "Could", because it depends on the nio
implementation and the operating system. If the operating system
supports scatter/gather I/O, and if nio uses this, it might work and be
somewhat faster.

/Thomas
Signature

The comp.lang.java.gui FAQ:
ftp://ftp.cs.uu.nl/pub/NEWS.ANSWERS/computer-lang/java/gui/faq
http://www.uni-giessen.de/faq/archiv/computer-lang.java.gui.faq/

Roedy Green - 18 Jan 2006 10:20 GMT
On Wed, 18 Jan 2006 09:49:20 +0100, Thomas Weidenfeller
<nobody@ericsson.invalid> wrote, quoted or indirectly quoted someone
who said :

>You have gathering write operations in nio, so writing these chunks
>could be relatively fast.

It would be interesting to see how clever nio can get about copying.
It still has to decode and encode. decoding does not know how long the
result will be ahead of time. It makes a conservative guess and trims
the result, before creating the string.  FileReader thus dithers
horribly copying the array over and over and over before you get your
string.

Your gather algorithm is extremely memory efficient, especially if the
expansions for the most part are the same as last time -- the usual
case.

I wrote a Boyer Moore.  It was only marginally faster than indexOf.
Perhaps it could do better by working on a raw char[].

see http://mindprod.com/jgloss/products1.html#BOYER

I do quite a bit of this sort of programming, so I am interested in
the general case the best way to write such programs -- how to you get
the file in, modified and out with minimal fuss and RAM consumption.

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.



Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.