My project uses XML for its data files and I am using a DOM parser
(the one native to the JDK) to parse out the files. DOM is especially
useful because the project lends itself to the use of trees.
Unfortunately, there is a limit to how big the XML files can be
before the DOM parser starts chewing up memory, and if a file is big
enough I get an OutOfMemoryError. It isn't caused by the project
itself; it's a result of the enormous amount of memory the DOM tree
takes up.
I was wondering whether there's a solution to this. I have read about SAX
a bit, and although it would fix the OOME, it would make it more
difficult to manage the tree structure. I could also increase the
amount of RAM available to the JRE, but I'd rather do that as a last
resort.
Does anybody have any other suggestions? Thanks.
Arne Vajhøj - 21 Feb 2008 02:07 GMT
> Does anybody have any other suggestions?
No.
Increasing the heap with -Xmx seems like the best way to go.
Arne
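To confirm that a larger heap setting actually reached the JVM, a minimal sketch can print the ceiling reported by Runtime.maxMemory(). The class name HeapCheck and the 1024m value used below are illustrative assumptions, not something from the thread:

```java
// Hypothetical helper: reports the heap ceiling the running JVM will try to use.
public class HeapCheck {

    // Runtime.maxMemory() returns the maximum number of bytes the JVM will
    // attempt to use (Long.MAX_VALUE if there is no inherent limit).
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it as `java -Xmx1024m HeapCheck`. The printed figure may be somewhat below the -Xmx value, since some JVMs leave part of the heap out of what maxMemory() reports.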
Jason Cavett - 21 Feb 2008 15:53 GMT
Haha. Alright. I was sort of hoping that wasn't the solution, but if
that's what has to be done, that's what I'll do.
Thanks.
Boris Stumm - 21 Feb 2008 08:51 GMT
> I was wondering if there's a solution to this? I have read about SAX
> a bit, and although it would fix the OOMEx. it would make it more
> difficult to manage the tree structure. I could also increase the
> amount of RAM available to the JRE, but I'd rather do that as a last
> resort.
Maybe have a look at XML databases. I am not really familiar with this
area, but I know some people in my working group have one that is
accessible through DOM. There should be others, too. The challenge will
be finding one that is stable enough for production use.
Stanimir Stamenkov - 21 Feb 2008 23:02 GMT
Wed, 20 Feb 2008 17:19:32 -0800 (PST), /Jason Cavett/:
> Does anybody have any other suggestions? Thanks.
A great deal of the DOM is usually taken up by whitespace in element
content (used only to format the source XML text). Depending on the
parser implementation, you could supply a DTD to make the parser
ignore [1] the whitespace in element content, or use custom
filtering [2] as provided by the DOM Level 3 Load and Save API, whose
implementation is part of the standard Java 1.5 platform.
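A minimal sketch of the LSParserFilter approach [2], assuming the JDK's built-in DOM Level 3 LS implementation. The class name, method names and sample document are hypothetical. The filter simply rejects text nodes that consist entirely of whitespace; it does not consult a DTD, so it would also drop whitespace-only text inside mixed content:

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

public class WhitespaceFilterDemo {

    // Rejects whitespace-only text nodes as they are built; keeps everything else.
    static final LSParserFilter DROP_WHITESPACE_TEXT = new LSParserFilter() {
        public short acceptNode(Node node) {
            if (node.getNodeType() == Node.TEXT_NODE
                    && node.getNodeValue().trim().length() == 0) {
                return LSParserFilter.FILTER_REJECT;
            }
            return LSParserFilter.FILTER_ACCEPT;
        }

        public short startElement(Element element) {
            return LSParserFilter.FILTER_ACCEPT; // never skip or reject elements
        }

        public int getWhatToShow() {
            return NodeFilter.SHOW_TEXT; // only text nodes are passed to acceptNode
        }
    };

    public static Document parse(String xml, boolean filterWhitespace) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        if (filterWhitespace) {
            parser.setFilter(DROP_WHITESPACE_TEXT);
        }
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <item>a</item>\n  <item>b</item>\n</root>";
        System.out.println(parse(xml, false).getDocumentElement().getChildNodes().getLength());
        System.out.println(parse(xml, true).getDocumentElement().getChildNodes().getLength());
    }
}
```

Because the filter runs while the tree is being built, the rejected nodes are never retained, which is where the memory saving comes from.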
The Xerces2 implementation (a modified version of which is part of
Sun's Java 1.5 distribution) is capable of ignoring whitespace in
element content when a suitable DTD is provided, even in
non-validating mode. For documents that don't have a DOCTYPE
declaration, one could supply a DTD by setting an EntityResolver2 [3]
instance (see its getExternalSubset() method) on the DocumentBuilder [4].
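A minimal sketch combining [1], [3] and [4] with the JDK's built-in parser, for a document that has no DOCTYPE declaration. The DTD text, element names and class names are hypothetical. It uses org.xml.sax.ext.DefaultHandler2, which implements EntityResolver2, and it turns validation on because the JDK javadoc for setIgnoringElementContentWhitespace ties that setting to validating mode; Xerces2 may also honour the DTD when not validating, as noted above:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.ext.DefaultHandler2;

public class ExternalSubsetDemo {

    // Hypothetical DTD: <root> holds only <item> elements, so whitespace
    // between them is ignorable element-content whitespace.
    static final String DTD =
            "<!ELEMENT root (item*)>\n"
          + "<!ELEMENT item (#PCDATA)>\n";

    // getExternalSubset() is consulted when the document has no DOCTYPE.
    static class DtdSupplier extends DefaultHandler2 {
        @Override
        public InputSource getExternalSubset(String name, String baseURI)
                throws SAXException, IOException {
            return "root".equals(name) ? new InputSource(new StringReader(DTD)) : null;
        }

        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e; // fail fast on validity errors instead of ignoring them
        }
    }

    public static int rootChildCount(String xml, boolean ignoreWhitespace) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);
        dbf.setIgnoringElementContentWhitespace(ignoreWhitespace);
        DocumentBuilder db = dbf.newDocumentBuilder();
        DtdSupplier handler = new DtdSupplier();
        db.setEntityResolver(handler);
        db.setErrorHandler(handler);
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <item>a</item>\n  <item>b</item>\n</root>"; // no DOCTYPE
        System.out.println(rootChildCount(xml, false));
        System.out.println(rootChildCount(xml, true));
    }
}
```

With whitespace kept, <root> should have the three whitespace text nodes plus the two elements; with it ignored, only the two <item> elements should remain.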
All the above stuff is also available to Java 1.4 users simply by
plugging the latest Xerces2 jars into the classpath.
[1] <http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setIgnoringElementContentWhitespace(boolean)>
[2] <http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/ls/LSParserFilter.html>
[3] <http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/ext/EntityResolver2.html>
[4] <http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilder.html#setEntityResolver(org.xml.sax.EntityResolver)>

Stanimir