
Java Forum / General / September 2006


After deserialization, the program occupies about 66% more RAM

setar - 18 Sep 2006 17:37 GMT
My program stores a dictionary of about 100'000 words in RAM. This
dictionary occupies about 380MB of RAM. But when I serialize the dictionary
and then deserialize it, the program occupies about 620MB. The dictionary is
the only variable the program holds a reference to at the moment of
serialization and after deserialization.

Deserialization works correctly, i.e. after deserialization I obtain the same
dictionary object as I had before serialization. (I can check this because I
can store dictionaries to text files and compare them: the files representing
the dictionary before and after serialization are identical.)
After deserialization I close the stream I used to read the dictionary object
from the file and run garbage collection.

Here are the methods I use to serialize and deserialize:
------------------------------------
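import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.text.Collator;

public class Dictionary implements Serializable {
  ...
  public void serializeTo(String fileName) throws IOException {
     ObjectOutputStream out =
         new ObjectOutputStream(new FileOutputStream(fileName));
     try {
        out.writeObject(this);
     } finally {
        out.close(); // close even if writing fails
     }
  }

  public static Dictionary deserializeFrom(String fileName)
        throws IOException, ClassNotFoundException {
     ObjectInputStream in =
         new ObjectInputStream(new FileInputStream(fileName));
     Dictionary dictionary;
     try {
        dictionary = (Dictionary) in.readObject();
     } finally {
        in.close(); // close even if reading fails
     }
     // collator hasn't been serialized to the file, so we must recreate it manually
     dictionary.collator = Collator.getInstance(dictionary.getLocale());
     System.gc();
     return dictionary;
  }
}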
------------------------------------

Does anybody know what I can do to decrease the amount of memory used after
deserialization?

Thanks for any hints.
Oliver Wong - 18 Sep 2006 17:41 GMT
> My program stores in RAM dictionary with about 100'000 words. This
> dictionary occupies about 380MB of RAM. But when I serialize that
> dictionary and then deserialize the program occupies about 620MB.
> Dictionary is the only variable to which program has reference in the
> moment of serialization and after deserialization.
[...]

> Anybody knows what can I do to decrease the amount of memory used after
> deserialization?

   Since it looks like the memory is approximately doubled, perhaps you now
have two copies of your dictionary object in memory? Did you release all
references to your first copy after serializing?

   - Oliver
setar - 18 Sep 2006 18:03 GMT
>    Since it looks like the memory is approximately doubled, perhaps you
> now have two copies of your dictionary object in memory? Did you release
> all references to your first copy after serializing?

That is not the problem, because my program for testing deserialization looks
like this:
public static void main(String[] args) throws Exception {
  dd.library.dictionary.Dictionary dictionary =
      dd.library.dictionary.Dictionary.deserializeFrom("dictionary.serialize");
  int i = 0; // breakpoint here, where I check the amount of memory
             // used after deserialization
}

I use a new program, and there is only one variable: one dictionary.
Thomas Hawtin - 18 Sep 2006 17:52 GMT
> My program stores in RAM dictionary with about 100'000 words. This
> dictionary occupies about 380MB of RAM. But when I serialize that dictionary
> and then deserialize the program occupies about 620MB. Dictionary is the
> only variable to which program has reference in the moment of serialization
> and after deserialization.

My guess is that it's the data held in the object streams that causes
the apparent increase in memory usage. How are you measuring it?

A memory profiler may well help. Even using a basic Sun J2SE 5.0 JDK,
for instance, you can use jmap -histo.

Tom Hawtin

Unemployed English Java programmer
http://jroller.com/page/tackline/

setar - 18 Sep 2006 18:40 GMT
> My guess is that it's the data held in the object streams that causes the
> apparent increase in memory usage. How are you measuring it?
>
> A memory profiler may well help. Even using a basic Sun J2SE 5.0 JDK, for
> instance, you can use jmap -histo.

But I close the stream before measuring memory usage. I measure memory usage
in Windows Task Manager (I subtract the amount of memory used by all programs
before my program runs from the amount of memory used by all programs after
my program has built the dictionary).

I have used Java Memory Profiler (www.khelekore.org/jmp/), and it shows that
the program's objects use more or less the same amount of memory before
serialization and after deserialization. (As I remember, Java Memory
Profiler has no total memory figure, only the amount of memory used by
objects of each class; I checked the classes that use more than 100kB of
memory.)
setar - 18 Sep 2006 18:52 GMT
> I have used Java Memory Profiler (www.khelekore.org/jmp/) and it shows that
[...]
> amount of memory used by objects of each class - I checked these classes
> which use more than 100kB of memory).

Sorry, Java Memory Profiler does show the total amount of memory used by the
program, and it is the same before serialization and after deserialization. I
measured it a few days ago, when the dictionary was smaller:
* before serialization:
  - Windows Task Manager: 160MB
  - Java Memory Profiler: 122.24MB (used by 3'212'143 objects)
* after deserialization:
  - Windows Task Manager: 280MB
  - Java Memory Profiler: 120.38MB (used by 3'209'026 objects)
Thomas Hawtin - 18 Sep 2006 19:22 GMT
> But I close stream before measuring memory usage.

But the stream and all its gubbins are in memory, for a moment, at the
same time as the entire dictionary. So the maximum allocated heap will
rise at that point (IIRC, in some circumstances it can be handed back to
the operating system, but I don't know all the ins and outs of that).

Exactly what happens is likely to be version dependent. For instance, I
guess that pre-1.5 ObjectInputStream may create String objects with
oversized char arrays.

You may be able to reduce the amount of memory consumed by using a
customised serial form. Key classes should define readObject and
writeObject, in which they should use, for instance, readUnshared and
writeUnshared.
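A minimal sketch of such a customised serial form, assuming a hypothetical class `WordList` that keeps its words in a transient `String` array (the thread doesn't show the dictionary's real classes). `writeObject` writes the count and then each string with `writeUnshared`, which per the Javadoc always writes a new object rather than a back-reference; `readObject` reads them back with `readUnshared`, which invalidates the stream handle for each returned object. Whether this actually saves memory on a given JDK is something to measure, not assume.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for one of the dictionary's key classes.
class WordList implements Serializable {
    private static final long serialVersionUID = 1L;

    // Excluded from the default serial form; written by hand in writeObject.
    private transient String[] words;

    WordList(String... words) {
        this.words = words.clone();
    }

    String[] getWords() {
        return words.clone();
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();   // non-transient fields (none here)
        out.writeInt(words.length);
        for (String w : words) {
            out.writeUnshared(w);   // always a new object in the stream, never a back-reference
        }
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int n = in.readInt();
        words = new String[n];
        for (int i = 0; i < n; i++) {
            words[i] = (String) in.readUnshared(); // stream handle is invalidated after reading
        }
    }

    // Serializes and deserializes through an in-memory buffer.
    static WordList roundTrip(WordList original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(original);
        out.close();
        ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        try {
            return (WordList) in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        WordList copy = roundTrip(new WordList("foobar", "bar", "baz"));
        System.out.println(String.join(",", copy.getWords())); // foobar,bar,baz
    }
}
```

One trade-off to keep in mind: because nothing is written as a back-reference, a `String` instance referenced from two slots of the array comes back as two separate copies.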

> I measure memory usage in
> Windows task manager (I subtract the amount of memory used by all programs
> before run of my program from the amount of memory used by all programs
> after building dictionary by my program).

Such measurements of memory are notoriously misleading.
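Task Manager reports the size of the whole process, which includes all the heap the JVM has committed plus non-heap memory (JIT-compiled code, thread stacks, class data), whereas an object profiler reports what the objects themselves occupy. One way to see the JVM's own view from inside the program is `java.lang.Runtime`: `totalMemory()` is the heap currently committed, and `totalMemory() - freeMemory()` is the part occupied by objects. `HeapUsage` is a hypothetical helper name, and the 20MB of arrays merely simulates short-lived stream data; whether "committed" shrinks back after the GC depends on the JVM and its settings.

```java
// Hypothetical helper: reports heap figures as the JVM sees them.
class HeapUsage {
    private static final long MB = 1024L * 1024L;

    // Bytes occupied by objects in the heap (live, or garbage not yet collected).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // used = occupied by objects; committed = heap currently obtained from the OS;
    // max = upper limit the heap may grow to (typically set by -Xmx).
    static String report() {
        Runtime rt = Runtime.getRuntime();
        return String.format("used=%dMB committed=%dMB max=%dMB",
                usedBytes() / MB, rt.totalMemory() / MB, rt.maxMemory() / MB);
    }

    public static void main(String[] args) {
        System.out.println("before: " + report());
        // Simulate a short-lived peak, such as a stream's working data.
        byte[][] temporary = new byte[20][];
        for (int i = 0; i < temporary.length; i++) {
            temporary[i] = new byte[1024 * 1024];
        }
        System.out.println("peak:   " + report());
        temporary = null;   // drop the only reference
        System.gc();        // a request, not a guarantee
        System.out.println("after:  " + report());
    }
}
```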

Tom Hawtin

Eric Sosman - 19 Sep 2006 01:30 GMT
> My program stores in RAM dictionary with about 100'000 words. This
> dictionary occupies about 380MB of RAM.  [...]

    ... thus using an average of 3800 bytes per word!  What
are you storing: bit-map images of the printed text?

    Whatever it is, my advice is to spend no time at all
trying to tune and adjust and tweak a data structure that is
so grotesquely bloated.  Just throw it away and replace it
with something else -- an ArrayList<String> would be orders
of magnitude more efficient.


Eric Sosman
esosman@acm-dot-org.invalid

setar - 19 Sep 2006 09:42 GMT
>> My program stores in RAM dictionary with about 100'000 words. This
>> dictionary occupies about 380MB of RAM.  [...]
>
>     ... thus using an average of 3800 bytes per word!  What
> are you storing: bit-map images of the printed text?

I store not only the text of the words but also much more information about
them, for example: English translation, synonyms, hypernyms, hyponyms
(ontology) and language. For each of these elements (they are actually
phrases, not single words) I also store the phrase parsed into its component
words, with information about the type of connection between the words, and
the phrase text generated by concatenating the parsed words (it can be
different).
I will try to decrease the amount of memory used by one word (phrase), but I
estimated that on average one word must occupy at least 700 bytes.
In addition, I have three indices for searching words.
Robert Klemme - 19 Sep 2006 13:16 GMT
>>> My program stores in RAM dictionary with about 100'000 words. This
>>> dictionary occupies about 380MB of RAM.  [...]
[...]
> estimated that on average one word must occupy at least 700 bytes.
> Except of these I have three indices to be able to search words.

Serialization blows up strings.  You can see this with the attached program
if you run it in a debugger (I tested with 1.4.2 and 1.5.0 with Eclipse).
You can see that (1) copies of strings no longer share the char array and
(2) the char array is larger than that of the original, even though only
some characters are used (the latter is true for 1.4.2 only, so Sun
actually has improved this).

Kind regards

    robert
Paul Davis - 19 Sep 2006 13:30 GMT
Not so sure about this test. By adding a similar println after the
first declaration, I get the same results as for the deserialized arrays.
> >>> My program stores in RAM dictionary with about 100'000 words. This
> >>> dictionary occupies about 380MB of RAM.  [...]
[...]
>         Object[] a1 = { root, root.substring( 3 ) };
>         Object[] a2 = { root, root.substring( 3 ) };

       /* This produces the same results as the one below */
        System.out.println(a1 == a2);

        for (int i = 0; i < a1.length; ++i)
        {
            System.out.println(i + ": " + (a1[i] == a2[i]));
        }

>         ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
>         ObjectOutputStream objectOut = new ObjectOutputStream( byteOut );
[...]
Paul Davis - 19 Sep 2006 13:32 GMT
> >>> My program stores in RAM dictionary with about 100'000 words. This
> >>> dictionary occupies about 380MB of RAM.  [...]
[...]

Changing the code to actually show the internal reference shows that
the deserialized version produces the same results as the one before
serialization.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class SharingTest
{

    /**
    * @param args
    * @throws IOException in case of error
    * @throws ClassNotFoundException never
    */
    public static void main(String[] args)
        throws IOException, ClassNotFoundException
    {
        String root = "foobar";
        String[] a1 = { root, root.substring(3)};
        String[] a2 = { root, root.substring(3)};
        System.out.println(a1 == a2);

        for (int i = 0; i < a1.length; ++i)
        {
            System.out.println(i + ": " + (a1[i].intern() == a2[i].intern()));
        }

        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        ObjectOutputStream objectOut = new ObjectOutputStream(byteOut);

        objectOut.writeObject(a1);
        objectOut.writeObject(a2);

        objectOut.close();

        ByteArrayInputStream byteIn =
            new ByteArrayInputStream(byteOut.toByteArray());
        ObjectInputStream objectIn = new ObjectInputStream(byteIn);

        String[] c1 = (String[])objectIn.readObject();
        String[] c2 = (String[])objectIn.readObject();
        System.out.println("-----------------------------------");
        // breakpoint here
        System.out.println(c1 == c2);

        for (int i = 0; i < c1.length; ++i)
        {
            System.out.println(i + ": " + (c1[i].intern() == c2[i].intern()));
        }
    }

}
Robert Klemme - 19 Sep 2006 13:56 GMT
> Changing the code to actually show the internal reference shows that
> the deserialized version produces the same results as the one before
> serialization.

What exactly do you mean by "same results"?  Of course string values
remain the same.  I was talking about internal representation (i.e. the
char arrays used).  You cannot see that with a Java program alone, you
need a memory profiler or a debugger to actually see those instances and
determine which are identical and which not.

Also, using String.intern() completely changes the semantics memory-wise.
Of course the comparison returns true because it is actually the
same instance (and thus also the same char[] internally).  My point was
that if strings are constructed from each other then serializing and
deserializing can seriously affect memory usage because of the changed
internal representation (no more sharing of char[]).

Using intern() might also be a bad idea for changing data because
interned strings will continuously increase the VM's memory.  This might
not be an issue for short lived applications but it certainly can be for
long running apps.

Regards

    robert
Paul Davis - 19 Sep 2006 14:24 GMT
> > Changing the code to actually show the internal reference shows that
> > the deserialized version produces the same results as the one before
> > serialization.
>
> What exactly do you mean by "same results"?  Of course string values
I apologize for being unclear; by "same results" I meant that:

System.out.println(a1 == a2);
for (int i = 0; i < a1.length; ++i)
{
   System.out.println(i + ": " + (a1[i] == a2[i]));
}

produced the same output as:

System.out.println(c1 == c2);
for (int i = 0; i < c1.length; ++i)
{
   System.out.println(i + ": " + (c1[i] == c2[i]));
}

meaning that there is no difference between the original values and the
deserialized ones.
> remain the same.  I was talking about internal representation (i.e. the
> char arrays used).  You cannot see that with a Java program alone, you

The intern() method returns a reference to the internal reference used
by the string object (according to the javadoc anyway).

> need a memory profiler or a debugger to actually see those instances and
> determine which are identical and which not.
[...]
> not be an issue for short lived applications but it certainly can be for
> long running apps.

I agree the intern() method should probably never be used. I was using
it here to demonstrate that the objects were pointing to the same
reference internally.

> Regards
>
>     robert

Please forgive me, but I don't understand what the example is trying to
demonstrate when the tests performed on the deserialized objects
produce the same output as the tests on the original objects:
  false
  0: true
  1: false
  -----------------------------------
  false
  0: true
  1: false
Robert Klemme - 19 Sep 2006 14:49 GMT
>>> Changing the code to actually show the internal reference shows that
>>> the deserialized version produces the same results as the one before
[...]
>     System.out.println(i + ": " + (c1[i] == c2[i]));
> }

Ok, now I understand.  But that was not the main point of that piece of
code.

> meaning that there is no difference between the original values and the
> deserialized ones.

With regard to internal relationships between instances, yes.  But
deserialized instances are set up differently with regard to size and
sharing of the internal buffer.

>> remain the same.  I was talking about internal representation (i.e. the
>> char arrays used).  You cannot see that with a Java program alone, you
>
> The intern() method returns a reference to the internal reference used
> by the string object (according to the javadoc anyway).

I am not sure I fully agree; there is no such thing as an "internal
reference".  "Interned reference" is probably a bit better.
String.intern() will either store this instance in its internal pool (or
whatever representation Sun chose) and return the same reference, or you
get a reference to another instance that represents an equivalent string
and is already present in the internal data structure.

Quote:

A pool of strings, initially empty, is maintained privately by the class
String.

When the intern method is invoked, if the pool already contains a string
equal to this String object as determined by the equals(Object) method,
then the string from the pool is returned. Otherwise, this String object
is added to the pool and a reference to this String object is returned.
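The quoted rule can be checked directly. In this short sketch (`InternDemo` and `sameAfterIntern` are illustrative names, not from the thread), two distinct `String` objects with equal contents always compare `==` after `intern()`, which is why a test like `c1[i].intern() == c2[i].intern()` says nothing about whether `c1[i]` and `c2[i]` were the same instance, or shared a char array, before the call.

```java
class InternDemo {
    // True if both strings map to the same instance in String's private pool.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "foobar";            // string literals are interned automatically
        String copy = new String("foobar");   // a distinct object with the same characters

        System.out.println(copy == literal);                            // false: two instances
        System.out.println(copy.equals(literal));                       // true: equal contents
        System.out.println(copy.intern() == literal);                   // true: pooled instance
        System.out.println(sameAfterIntern(copy, new String("foobar"))); // true
    }
}
```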

>> Using intern() might also be a bad idea for changing data because
>> interned strings will continuously increase the VM's memory.  This might
[...]
> it here to demonstrate that the objects were pointing to the same
> reference internally.

Not exactly: you interned the strings after deserialization, so it
comes as no surprise that they point to the same instance after you did
that.

> Please forgive but, I don't understand what the example is trying to
> demonstrate when the tests performed on the deserialized objects
> produce the same output as the tests on the original objects.

As I said, that output was not the main point.  As I wrote above, set a
breakpoint at the line indicated and then look at object identities.
Then you'll see what I mean and have been trying to convey from the beginning.

Kind regards

    robert
Eric Sosman - 20 Sep 2006 02:22 GMT
>>>My program stores in RAM dictionary with about 100'000 words. This
>>>dictionary occupies about 380MB of RAM.  [...]
[...]
> estimated that on average one word must occupy at least 700 bytes.
> Except of these I have three indices to be able to search words.

    Thanks for the more complete description.  It could be
(I can't tell; your description is still only partial) that
it's the "other" data that's inflating the size when you
serialize and deserialize.  Perhaps a memory profiler could
point out the pieces of the data structure that grow unusually
large when you do this.


Chris Uppal - 19 Sep 2006 10:49 GMT
> My program stores in RAM dictionary with about 100'000 words. This
> dictionary occupies about 380MB of RAM. But when I serialize that
> dictionary and then deserialize the program occupies about 620MB.
> Dictionary is the only variable to which program has reference in the
> moment of serialization and after deserialization.

Like Thomas, I rather suspect that you may be misinterpreting what you are
seeing.  One simple test would be to deserialise the same data, say, 10 times
(or as many times as you can without running out of memory[*]) and keep the
results in some array (so they don't get reclaimed). Use a new
ObjectInputStream each time.  If the memory used keeps going up by the same
unexpectedly large amount each time, then you'll know the problem is real.

([*] you'll probably have to use a somewhat smaller dictionary for these
tests.)
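Chris's repeat test could be sketched roughly as follows. `RepeatDeserialize` is a hypothetical name, and an `ArrayList<String>` stands in for the real dictionary (in the real test the bytes would come from the dictionary file). Each round uses a fresh `ObjectInputStream` and keeps the result reachable; if each extra copy adds about one dictionary's worth of used heap, the objects really are that large, whereas a per-copy change far smaller than the gap seen in Task Manager points to transient stream data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class RepeatDeserialize {
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(obj);
        out.close();
        return bytes.toByteArray();
    }

    // Deserializes the same bytes `times` times, each with a fresh ObjectInputStream.
    // Every result is added to `keep` so none can be reclaimed; returns the used heap
    // (in bytes) measured after each deserialization.
    static List<Long> measure(byte[] serialized, int times, List<Object> keep)
            throws IOException, ClassNotFoundException {
        Runtime rt = Runtime.getRuntime();
        List<Long> usedAfterEach = new ArrayList<>();
        for (int i = 0; i < times; i++) {
            ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(serialized));
            try {
                keep.add(in.readObject());
            } finally {
                in.close();
            }
            System.gc(); // only a hint to the JVM
            usedAfterEach.add(rt.totalMemory() - rt.freeMemory());
        }
        return usedAfterEach;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in data; the real test would deserialize a (smaller) dictionary file.
        ArrayList<String> words = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            words.add("word" + i);
        }
        List<Object> keep = new ArrayList<>();
        List<Long> used = measure(serialize(words), 10, keep);
        for (int i = 1; i < used.size(); i++) {
            long deltaKb = (used.get(i) - used.get(i - 1)) / 1024;
            System.out.printf("copy %d: used=%dKB, change=%dKB%n",
                    i + 1, used.get(i) / 1024, deltaKb);
        }
    }
}
```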

If the problem is real, then one thing I'd check is the way that String sharing
is affecting your application.  If you have one long String and then create
many substrings from that, then the substrings will share the internal char[]
array of the main String.  If you serialise all the strings (including the
original one) and then deserialise them, the sharing will be lost, and so
you'll increase the overall amount of memory used.

   -- chris
setar - 19 Sep 2006 12:55 GMT
I've installed an evaluation version of JProfiler. I will check everything and
write back later.
Paul Davis - 19 Sep 2006 13:33 GMT
> I've installed evaluation version of JProfiler. I will check everything and
> I will write later.
Could you post the code?
It might be easier to evaluate something that we can actually see.
:-)
vladimirkondratyev@yahoo.com - 20 Sep 2006 06:45 GMT
It looks like there is a memory leak (in your code or somewhere inside the
Java core classes). I recommend YourKit Java Profiler
(http://www.yourkit.com) for memory analysis. Its "Biggest Objects" or
"Class Tree" tools will immediately show the objects that retain the
most memory.

BR, Vladimir
setar - 20 Sep 2006 16:51 GMT
OK, here are the results :)
After deserialization the program occupies more memory, but a large part of
the heap is free (not used by any object). As Thomas wrote, this is probably
because during deserialization the stream occupies a large part of the heap,
and when I close the stream this part becomes free but isn't returned to the
operating system.
More precisely, I must run garbage collection about three times before the
amount of used heap returns to nearly the same state as before
serialization.
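Running the garbage collector by hand several times, as described above, can be automated for measurement purposes. A minimal sketch with hypothetical names (`GcUntilStable`, the round limit and the threshold are assumptions): it requests collections until the used heap stops dropping by more than the threshold. `System.gc()` is only a hint, so the round count says something about this run, not about the JVM in general, and this belongs in a test harness rather than production code.

```java
class GcUntilStable {
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Requests GC repeatedly (up to maxRounds) until a round frees no more than
    // thresholdBytes; returns the number of rounds performed (at least 1).
    static int gcUntilStable(int maxRounds, long thresholdBytes) {
        long before = usedHeap();
        int rounds = 0;
        while (rounds < maxRounds) {
            System.gc();
            rounds++;
            long after = usedHeap();
            if (before - after <= thresholdBytes) {
                break; // this round freed little or nothing: treat the heap as settled
            }
            before = after;
        }
        return rounds;
    }

    public static void main(String[] args) {
        int rounds = gcUntilStable(10, 64 * 1024);
        System.out.printf("settled after %d GC round(s), used=%dKB%n",
                rounds, usedHeap() / 1024);
    }
}
```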
I also compared the number of objects of each class, and the amount of heap
they use, before serialization and after deserialization. For most classes
the numbers of objects are the same or nearly the same. But after
deserialization there are 7% more char[] objects (I don't use them directly;
they are only used internally by String), yet they occupy 2MB less memory :)
As Robert wrote, I think this is because in the dictionary before
serialization some strings share char[] arrays with other strings. After
deserialization, new char[] arrays are created for these strings (so their
number grows). But char[] arrays that existed before serialization only
because of sharing (there were no other references to them) won't be
recreated. Because the original char[] arrays were longer than the new ones,
the new char[] arrays can, in total, occupy less memory than before
serialization.

Thanks all!
Robert Klemme - 20 Sep 2006 19:31 GMT
> Ok. Here are results:)
> After deserialization program occupies more memory, but large part of heap
[...]
>
> Thanks all!

Thank /you/ for sharing these results.

    robert


©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.