Java Forum / General / March 2007
Getting "java.lang.OutOfMemoryError" when parsing an XML file using DOM
NeoGeoSNK - 23 Mar 2007 06:12 GMT
Hello, I have just written an XML parsing tool that uses the Java DOM parser. It works fine when parsing small XML files, but when I parse an XML file of over 500000 lines, it throws an "java.lang.OutOfMemoryError" Exception at line 4.
1: File f = new File(filename);
2: DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
3: DocumentBuilder builder = factory.newDocumentBuilder();
4: Document doc = builder.parse(f);
However, I don't want to switch to another XML parser such as SAX, because I would have to rewrite most of my code :'( Below is the structure of the XML file:
<journal>
  <record type="1" id="275">
    <header>
      <header_generic>
      </header_generic>
      <header_specific_user>
      </header_specific_user>
    </header>
    <body>
      <frame frame_id="200011">
        <attribute type="STRING">
          ...........
        </attribute>
      </frame>
      ............
    </body>
  </record>
</journal>
Can somebody give me some suggestions?
Thanks and Best Regards!
Andrew Thompson - 23 Mar 2007 06:20 GMT
> Hello,
> I just write a XML parsing tool use java Dom parser, It works fine
> when parsing small XML files, but when I parsing a over 500000 lines
> XML file, it throws an "java.lang.OutOfMemoryError" Exception ..
Note that as the quoted text clearly states, this is an *Error*, not an *Exception*. This is an important distinction if attempting to catch the result.
Have you tried increasing the memory available to the application?
Andrew T.
NeoGeoSNK - 23 Mar 2007 07:57 GMT
> > Hello,
> > I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 10 lines]
> > Andrew T.
Thanks very much. I have just increased the available memory to 1 GB (java -Xmx1024m), but the job still hasn't finished. Do you know how to measure the time and memory consumed?
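One rough way to answer the "time and memory consumed" question is to read the clock and the heap counters around the parse call. The sketch below is only an illustration: the class name ParseCost and the generated sample document are made up for this example, and the heap figure is approximate because it depends on when the garbage collector last ran.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseCost {
    /** Heap currently in use, in bytes. Approximate: it depends on GC timing. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Builds a small in-memory stand-in for the real log file (hypothetical data). */
    static byte[] sampleXml(int records) {
        StringBuilder sb = new StringBuilder("<journal>");
        for (int i = 0; i < records; i++) {
            sb.append("<record type=\"1\" id=\"").append(i).append("\"><body/></record>");
        }
        return sb.append("</journal>").toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = sampleXml(10000);
        long heapBefore = usedHeap();
        long start = System.nanoTime();

        // The call being measured: build the whole DOM tree in memory.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));

        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        long heapDeltaKb = (usedHeap() - heapBefore) / 1024;
        System.out.println("records   : " + doc.getElementsByTagName("record").getLength());
        System.out.println("elapsed   : " + elapsedMs + " ms");
        System.out.println("heap delta: " + heapDeltaKb + " KB (rough)");
        System.out.println("max heap  : " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
    }
}
```

The "max heap" line reflects the -Xmx setting, so it is a quick check that the option was actually picked up by the JVM that runs the tool.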
NeoGeoSNK - 23 Mar 2007 08:09 GMT
> > Hello,
> > I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 5 lines]
> is an important distinction if attempting to
> catch the result.
Thanks. I remember hearing that exceptions are the only error-handling mechanism in Java? Also, the error log from another PC, listed below, is different from mine:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at com.sun.org.apache.xerces.internal.xni.XMLString.toString(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractDOMParser.characters(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at ParsingLog.parsing(ParsingLog.java:21)
    at Log2XML.main(Log2XML.java:12)
BRs Ning Yu.
John W. Kennedy - 23 Mar 2007 19:56 GMT
>>> Hello,
>>> I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 9 lines]
> mechanism of Java?
> and the error log on another PC list below is different from mine:
Unfortunately, when speaking of Java, the word "exception" is used in more than one way. A thing that can be thrown and caught is often called an "exception", but the correct name is "Throwable". Throwables are divided into two groups, the "Error" group and the "Exception" group. The difference is that an Error normally represents a disaster, such as running out of memory, that a program should not normally try to (or be able to) recover from.
Therefore, your "it throws an 'java.lang.OutOfMemoryError' Exception" is a whuzzat, like saying "a Pennsylvania Canadian".
--
John W. Kennedy
A proud member of the reality-based community.
* TagZilla 0.066 * http://tagzilla.mozdev.org
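The Error/Exception split has a practical consequence: a catch (Exception e) block does not see an OutOfMemoryError. The sketch below illustrates this with a simulated error object rather than a real memory exhaustion; the class and method names (ThrowableDemo, whoCatches) are invented for the example.

```java
import java.io.IOException;

class ThrowableDemo {
    /** Reports which catch clause ends up handling the given Throwable. */
    static String whoCatches(Throwable t) {
        try {
            try {
                throw t;
            } catch (Exception e) {
                // Only the Exception branch of the Throwable hierarchy lands here.
                return "caught as Exception";
            }
        } catch (Error e) {
            // OutOfMemoryError and other Errors skip the inner handler entirely.
            return "caught as Error";
        } catch (Throwable other) {
            return "caught as Throwable";
        }
    }

    public static void main(String[] args) {
        // A simulated error: nothing is actually exhausted here.
        System.out.println(whoCatches(new OutOfMemoryError("simulated")));
        System.out.println(whoCatches(new IOException("simulated")));
    }
}
```

Catching an Error is legal, but as noted above a program is rarely in a position to recover from one, so reporting and exiting is usually the most that can be done.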
NeoGeoSNK - 24 Mar 2007 04:58 GMT
> >>> Hello,
> >>> I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 27 lines]
I think the "Exception" you mean is the Exception class, which extends java.lang.Throwable, but what I am talking about here is Java's error-handling mechanism. In that sense I think an Exception is an exception, an Error is an exception, and a Throwable is an exception too. By the way, "Exception in thread "main" java.lang.OutOfMemoryError: Java heap space" is reported by the JVM; it is not something I said :)
-- Ny
Mike Schilling - 23 Mar 2007 06:52 GMT
> Hello,
> I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 10 lines]
> However, I don't want to use other XML parsers such as "SAX" because I
> must rewrite most of my codes :
DOM creates an object for each feature (element, attribute, text, etc.) of the XML document. A bigger document occupies more memory. If you're going to construct DOMs for huge documents, you'll need to give the JVM more memory.
If you don't need to keep the entire document in memory (say, if you process each element and cease to need it after it's processed), then SAX or a pull parser would be far better choices.
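For the pull-parser option, a minimal sketch using StAX (javax.xml.stream) might look like the following. It only counts record elements, as a stand-in for "process each element and then forget it"; the class name PullRecordCounter and the sample document are assumptions made for this example.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullRecordCounter {
    /** Walks the document once and counts record elements without building a tree. */
    static int countRecords(Reader in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                // The application pulls the next event; nothing is retained afterwards.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<journal><record id=\"1\"><body/></record>"
                   + "<record id=\"2\"><body/></record></journal>";
        System.out.println("records: " + countRecords(new StringReader(xml)));
    }
}
```

For a real log file the StringReader would be replaced by a reader or stream over the file; memory use then stays roughly constant regardless of file size, because only the current event is held.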
NeoGeoSNK - 23 Mar 2007 07:53 GMT
On Mar 23, 1:52 pm, "Mike Schilling" <mscottschill...@hotmail.com> wrote:
> > Hello,
> > I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 19 lines]
> each element and cease to need it after it's processed), then SAX or a pull
> parser would be far better choices.
Thanks very much. I just used the java -Xmx1024m option to allocate 1 GB of memory to the JVM, but after 40 minutes it still hasn't finished processing the XML file :'(
Andrew Thompson - 23 Mar 2007 08:23 GMT
> On Mar 23, 1:52 pm, "Mike Schilling" <mscottschill...@hotmail.com> ..
> > ..SAX or a pull
> > parser would be far better choices. * ..
> I just use the java -Xmx1024m option to allocated 1GB memory to JVM,
> but 40 minutes from now, it haven't work out the XML file
Another 20 minutes and it becomes an 'incomputable problem' according to the definition as I vaguely recall..
* Sounds as though the task might be better achieved using the optimal tools for the job, rather than try to 'work around' the problems of parsing the entire document using DOM.
Andrew T.
NeoGeoSNK - 23 Mar 2007 09:00 GMT
> > On Mar 23, 1:52 pm, "Mike Schilling" <mscottschill...@hotmail.com>
> ..
[quoted text clipped - 18 lines]
> > Andrew T.
Thanks. I can't wait any longer: the job has been running for nearly 2 hours and still hasn't finished. I think I'll try the SAX API. Is there a faster API for parsing XML in Java?
John W. Kennedy - 23 Mar 2007 20:00 GMT
>>> On Mar 23, 1:52 pm, "Mike Schilling" <mscottschill...@hotmail.com>
>> ..
[quoted text clipped - 21 lines]
> finished yet.I think I'll try the SAX api, is there more fast api to
> parsing XML in java?
SAX won't necessarily be /faster/ -- it could be a lot slower. It depends on what you're doing.
Are you page-thrashing? If so, then SAX is definitely a good idea.
--
John W. Kennedy
"...if you had to fall in love with someone who was evil, I can see why it was her."
-- "Alias"
* TagZilla 0.066 * http://tagzilla.mozdev.org
Lew - 23 Mar 2007 22:43 GMT
> SAX won't necessarily be /faster/ [than DOM] -- it could be a lot slower. It
> depends on what you're doing.
>
> Are you page-thrashing? If so, than SAX is definitely a good idea.
Another way SAX can really speed things up is that you can use it to handle an entire XML document in a single pass without huge memory structures to build and traverse. Back when Java 1.2 first came out I was on a project that used Java and SAX to parse largish XML documents over the network and it ran like a bat out of heck. With modern network tech (gigabit LAN, ...), today's processors and the improvements in Java it would truly scream.
It sounds like the OP's DOM tree is too large to process efficiently. SAX, correctly used, would almost certainly create a huge speed improvement - like from 2 hours-infinite down to about a second or two, I would guess.
Like JWK said, it really depends on what you do with the parsed data. Additional I/O (writing the parsed data to a file or DBMS), large auxiliary memory structures and other factors could kill the speedup.
-- Lew
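The single-pass idea can be sketched with a small SAX handler that keeps only the data for the record currently being read. This is an illustration under assumptions, not the OP's code: the class name RecordValueHandler is invented, and the element and attribute names (record, string, value) are guessed from the XPath expressions quoted elsewhere in the thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RecordValueHandler extends DefaultHandler {
    /** One list of string/@value attributes per record, in document order. */
    final List<List<String>> records = new ArrayList<List<String>>();
    private List<String> current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("record".equals(qName)) {
            current = new ArrayList<String>();      // start collecting a new record
        } else if ("string".equals(qName) && current != null) {
            String value = attrs.getValue("value"); // null when the attribute is absent
            if (value != null) {
                current.add(value);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("record".equals(qName)) {
            records.add(current);                   // record complete, hand it off
            current = null;
        }
    }

    static List<List<String>> parse(String xml) throws Exception {
        RecordValueHandler handler = new RecordValueHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.records;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<journal><record><body><frame><attribute><attribute_value>"
                   + "<string value=\"create\"/></attribute_value></attribute></frame></body></record>"
                   + "<record><body/></record></journal>";
        System.out.println(parse(xml));
    }
}
```

In a real tool the completed record would be written out or reduced immediately in endElement instead of being accumulated, which is what keeps memory flat on very large inputs.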
NeoGeoSNK - 24 Mar 2007 04:21 GMT
> > SAX won't necessarily be /faster/ [than DOM] -- it could be a lot slower. It
> > depends on what you're doing.
[quoted text clipped - 17 lines]
> > -- Lew
Thanks. I just want to transform the original XML into another XML. The original is a log of subscribers; I extract and return a set of these subscribers and build an XML file with a new structure.
NeoGeoSNK - 24 Mar 2007 04:05 GMT
> >>> On Mar 23, 1:52 pm, "Mike Schilling" <mscottschill...@hotmail.com>
> >> ..
[quoted text clipped - 34 lines]
What does "...if you had to fall in love with someone who was evil, I can see why it was her." mean? I don't know how DOM works when it parses an XML file. I use DOM because XPath can quickly locate particular elements. I think that if SAX only reports events and does not store the whole structure of the XML as DOM does, it must be more efficient. What does "page-thrashing" mean? I have pasted the source code below, FYI :)
public Set parsing(String filename) throws Exception{
    Set subset = new LinkedHashSet();
    File f = new File(filename);
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(f);
    Element root = doc.getDocumentElement();
    XPathFactory xpfactory = XPathFactory.newInstance();
    XPath path = xpfactory.newXPath();
    NodeList recoredlist = (NodeList)path.evaluate("/journal/record", doc, XPathConstants.NODESET);
    // System.out.println("frameIdlist.getLength()= " + recoredlist.getLength());
    //enumerate all record in a log
    for(int i = 0; i < recoredlist.getLength(); i ++){
        // System.out.println("recoredlist = " + recoredlist.item(i));
        Node record = recoredlist.item(i);
        Element recordelement = (Element)record;
        //System.out.println(recordelement.getTagName());
        //get operat type
        String BEtype = (String)path.evaluate("header/header_generic/domain/@value", recordelement);
        // System.out.println("operation type = " + BEtype);
        if(!BEtype.equals("SHLR::Subscription"))
            continue;
        SubInfo subscriber = new SubInfo();
        NodeList framelist = (NodeList)path.evaluate("body/frame", recordelement, XPathConstants.NODESET);
        // System.out.println("framelist = " + framelist.getLength());
        //enumerate frame list in a record
        for(int j = 0; j < framelist.getLength(); j++){
            // System.out.println("frame = " + framelist.item(j));
            NodeList attriblist = (NodeList)path.evaluate("attribute/attribute_value/string/@value", framelist.item(j), XPathConstants.NODESET);
            for(int k = 0; k < attriblist.getLength(); k++){
                //System.out.println(attriblist.item(k));
                //System.out.println(attriblist.item(k).getClass());
                Node attribute = attriblist.item(k);
                String value = attribute.getNodeValue();
                //String value = att.getAttribute("Value");
                // System.out.println("Value = " + value);
                if(value.equals("create")){
                    subscriber.setModifier("create");
                }else if(value.equals("modify")){
                    subscriber.setModifier("modify");
                }else if(value.equals("delete")){
                    subscriber.setModifier("delete");
                }else if(value.trim().matches("dirNumberId.*")){
                    //System.out.println("dirNumberId = " + value);
                    String dirnumber = value.substring(value.indexOf("dirNumberId=") + 12, value.indexOf(",sHLRSubsOrganizationId"));
                    String ndc = value.substring(value.indexOf("nDCId=") + 6, value.indexOf(",managedElementId=SHLR"));
                    // System.out.println("dirnumber=" + dirnumber + ndc);
                    subscriber.setNDCId(ndc);
                    subscriber.setdirNumberId(dirnumber);
                }else if(value.equals("calledList")){
                    Node calledattr = attriblist.item(k + 1);
                    String calledvalue = calledattr.getNodeValue();
                    // System.out.println("calledList = " + calledvalue);
                    if(calledvalue.equals("NULL"))
                        subscriber.removeCalledList();
                    else
                        subscriber.addCalledList(calledvalue);
                }else if(value.equals("callingList")){
                    Node callingattr = attriblist.item(k + 1);
                    String callingvalue = callingattr.getNodeValue();
                    // System.out.println("callingList = " + callingvalue);
                    if(callingvalue.equals("NULL"))
                        subscriber.removeCallingList();
                    else
                        subscriber.addCallingList(callingvalue);
                }else if(value.equals("lRNumberId")){
                    Node lrnattr = attriblist.item(k + 1);
                    String lrnvalue = lrnattr.getNodeValue();
                    subscriber.setlrnNumberId(lrnvalue);
                }
            }
        }
        if(subscriber != null)
            subset.add(subscriber);
    }
    return subset;
}
Andrew Thompson - 24 Mar 2007 04:44 GMT
..
> > --
> > John W. Kennedy
> > "...if you had to fall in love with someone who was evil, I can see why
> > it was her."
> > -- "Alias"
...
> what "...if you had to fall in love with someone who was evil, I can
> see why
> it was her." means?
Note that it was not connected to the technical part of the conversation, it is just part of a 'sig.' or 'signature line'. Sig.s are often intended to be humorous, or funny, and that is just one such line. Other people's sig.'s might push points of view that the person is particularly fond of, or to simply add details of themselves, or their own web sites, or links of interest to them.
I generally prefer the 'funny' sig.s - most other sig.s take themselves far too seriously.
(Note also that it is generally a good idea to trim sig.s when replying, as the relevant information ('who posted what') is still contained in the 'Jim wrote: ..' attribution lines above the text.)
Andrew T.
NeoGeoSNK - 24 Mar 2007 13:41 GMT
> ..
> > > --
[quoted text clipped - 27 lines]
> > Andrew T.
Thanks, Andrew T. I have just rewritten the code using SAX, and the performance improved a lot. To my surprise, parsing the XML with DOM takes more than 6 hours, but SAX takes only 6 seconds :) I think DOM can't parse an XML file of more than 100000 lines without throwing an out-of-memory error, so I think there can be no argument about the speed of these two parsers. With DOM, I must load the whole XML file into memory with Document doc = builder.parse(file); and this becomes impossible when the file is too large.
I really can't understand the 'signature line' you explained. I think it's more complex than the XML parser and Java :) I guess the ".)" is one of your 'funny' sigs?
Andrew Thompson - 24 Mar 2007 14:09 GMT
(big trim)
> Thanks Andrew T
Well, ..for whatever I've done 'you're welcome', but most of the best suggestions in this thread came from other people! AFAIR it was Mike S. that first suggested the much better strategy of using SAX.
> I just uesed the SAX to rewrite the code, and the performance
> increased a lot,To my surprise, the DOM parsing the XML will consume
> more than 6 hours, but the SAX take 6 seconds only:),
Hmm... That is quite an impressive difference, isn't it? Lew's estimate was not far off (I did not comment at the time - but I really thought his statement of '2 hour -> 1 to 2 seconds' was unrealistic!).
> I realy can't understand about the 'signature line' you explained? I
> think it's more complex than the XML parser and Java:)
It is both more complicated, and far less important, but I do not quite understand what you mean - if you need further information, please write your question a little differently (I do not understand your *question*).
On the other hand, I recommend forgetting the sig. - it is really not that important.
By the way - I am glad you solved the technical problem. :-)
Andrew T.
Lew - 24 Mar 2007 16:31 GMT
"NeoGeoSNK" <ny1...@gmail.com> wrote:
>> I just uesed the SAX to rewrite the code, and the performance
>> increased a lot,To my surprise, the DOM parsing the XML will consume
>> more than 6 hours, but the SAX take 6 seconds only:),
> Hmm... That is quite an impressive difference,
> isn't it? Lew's estimate was not far off (I did
> not comment at the time - but I really thought
> his statement of '2 hour -> 1 to 2 seconds' was
> unrealistic!).
Oh, ye of little faith! :-)
It would've been fine with me if I were wrong - I have been proven wrong in this forum several times before. I just know how fast a good SAX implementation can be, went out on a limb and was right this time.
I wonder if there weren't a particular problem with the DOM implementation, though. Others in this thread have had better success with a DOM approach than the OP did.
-- Lew
Tom Hawtin - 24 Mar 2007 17:56 GMT
> I wonder if there weren't a particular problem with the DOM
> implementation, though. Others in this thread have had better success
> with a DOM approach than the OP did.
Possibly something to do with the form of the XML being used. IIRC, there is something about handling of attributes that can make DOM very slow. It's also going to be somewhat implementation dependent.
Tom Hawtin
Andrew Thompson - 25 Mar 2007 03:51 GMT
> "NeoGeoSNK" <ny1...@gmail.com> wrote:
> >> I just uesed the SAX to rewrite the code, and the performance
[quoted text clipped - 6 lines]
>
> Oh, ye of little faith! :-)
Damn faith! Give me run-time results, anyday! ;-) (If you had stated it as 'code I worked on, improved ...' I would have been prepared to accept it at face value..)
Andrew T.
NeoGeoSNK - 26 Mar 2007 03:35 GMT
> > "NeoGeoSNK" <ny1...@gmail.com> wrote:
> > >> I just uesed the SAX to rewrite the code, and the performance
[quoted text clipped - 13 lines]
> > Andrew T.
Hello Andrew T, I have just sent you my tool, including the log file "log_R2.2.xml", in a jar. Please check your mailbox.)
Ny
Andrew Thompson - 26 Mar 2007 09:51 GMT
...
> > Damn faith! Give me run-time results, anyday! ;-)
...
> Hello Andrew T
> I just send my tool including the Log files "log_R2.2.xml" in a jar to
> you, please check your mailbox.)
Thanks. But in fact, although my comment above seemed to invite you to do that, I do not actually need folks from usenet to send me code. More specifically, unless email from usenet includes the word 'consultancy', it automatically gets deleted.
Please put anything that is worth hearing here, where we can all see it, and where it is publicly archived and searchable. Alternatively, in a case like the Jar, it would probably be better to get a free site at 'Geocities' or whatever, and upload it there, but give us a link.
As an aside, I like your real name much more than the nickname you use, for posting to usenet. I encourage all people to use real names when posting to usenet.
Andrew T.
NeoGeoSNK - 26 Mar 2007 03:17 GMT
> "NeoGeoSNK" <ny1...@gmail.com> wrote:
> >> I just uesed the SAX to rewrite the code, and the performance
[quoted text clipped - 17 lines]
>
> -- Lew
Thanks, Lew. I have pasted my source code below; maybe you can point out some problems in my DOM implementation when you are free :)
//The Set parsing(String filename) is implemented by DOM
//The Set parsing(String filename, boolean sax) is implemented by SAX
import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import java.util.*;
import javax.xml.xpath.*;
import org.xml.sax.helpers.*;

/**
 * parsing a XML format log file and retrieval all subscribers info.
 * @author yning
 *
 */
class SAXhandler extends DefaultHandler{
    public SAXhandler(Set subscribers){
        this.subscribers = subscribers;
    }

    int ing;
    int ed;
    boolean inasub = false;
    boolean callingflag = false;
    boolean calledflag = false;
    boolean lrnflag = false;
    boolean dirflag = false;
    Set subscribers;
    SubInfo subscriber;

    public void startElement(String namespaceURL, String lname, String qname, Attributes attr){
        if(qname.equals("string")){
            //System.out.println("Sax parser = " + qname);
            //System.out.println("attr = " + attr.getValue(0));
            String value = attr.getValue(0);
            if(value.equals("Sub_OAM_DirNumber")){
                subscriber = new SubInfo();
                dirflag = true;
            }else if(value.equals("create")){
                subscriber.setModifier("create");
            }else if(value.equals("modify")){
                subscriber.setModifier("modify");
            }else if(value.equals("delete")){
                subscriber.setModifier("delete");
            }else if(value.trim().matches("dirNumberId.*")){
                //System.out.println("dirNumberId = " + value);
                String dirnumber = value.substring(value.indexOf("dirNumberId=") + 12, value.indexOf(",sHLRSubsOrganizationId"));
                String ndc = value.substring(value.indexOf("nDCId=") + 6, value.indexOf(",managedElementId=SHLR"));
                // System.out.println("dirnumber=" + dirnumber + ndc);
                subscriber.setNDCId(ndc);
                subscriber.setdirNumberId(dirnumber);
            }else if(value.equals("callingList")){
                callingflag = true;
            }else if(callingflag == true){
                if(value.equals("NULL"))
                    subscriber.removeCallingList();
                else
                    subscriber.addCallingList(value);
                // System.out.println("callingService = " + value.trim());
                //System.out.println("ing = " + ing++);
                callingflag = false;
            }else if(value.equals("calledList")){
                calledflag = true;
            }else if(calledflag == true){
                if(value.equals("NULL"))
                    subscriber.removeCalledList();
                else
                    subscriber.addCalledList(value);
                // System.out.println("calledService = " + value.trim());
                // System.out.println("ed = " + ed++);
                calledflag = false;
            }else if(value.equals("lRNumberId")){
                lrnflag = true;
            }else if(lrnflag == true){
                // System.out.println("lrnnumber = " + value);
                subscriber.setlrnNumberId(value);
                lrnflag = false;
            }
        }
    }

    public void endElement(String uri, String lname, String qname){
        if(qname.equals("record") && dirflag == true){
            subscribers.add(subscriber);
            dirflag = false;
        }
    }
}
public class ParsingLog {

    public Set parsing(String filename, boolean sax)throws Exception{
        Set subset = new LinkedHashSet();
        File f = new File(filename);
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser paser = factory.newSAXParser();
        SAXhandler handler = new SAXhandler(subset);
        paser.parse(f, handler);
        return handler.subscribers;
    }
    public Set parsing(String filename) throws Exception{
        Set subset = new LinkedHashSet();
        File f = new File(filename);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(f);
        Element root = doc.getDocumentElement();
        XPathFactory xpfactory = XPathFactory.newInstance();
        XPath path = xpfactory.newXPath();
        NodeList recoredlist = (NodeList)path.evaluate("/journal/record", doc, XPathConstants.NODESET);
        // System.out.println("frameIdlist.getLength()= " + recoredlist.getLength());
        //enumerate all record in a log
        for(int i = 0; i < recoredlist.getLength(); i ++){
            // System.out.println("recoredlist = " + recoredlist.item(i));
            Node record = recoredlist.item(i);
            Element recordelement = (Element)record;
            //System.out.println(recordelement.getTagName());
            //get operat type
            String BEtype = (String)path.evaluate("header/header_generic/domain/@value", recordelement);
            // System.out.println("operation type = " + BEtype);
            if(!BEtype.equals("SHLR::Subscription"))
                continue;
            SubInfo subscriber = new SubInfo();
            NodeList framelist = (NodeList)path.evaluate("body/frame", recordelement, XPathConstants.NODESET);
            // System.out.println("framelist = " + framelist.getLength());
            //enumerate frame list in a record
            for(int j = 0; j < framelist.getLength(); j++){
                // System.out.println("frame = " + framelist.item(j));
                NodeList attriblist = (NodeList)path.evaluate("attribute/attribute_value/string/@value", framelist.item(j), XPathConstants.NODESET);
                for(int k = 0; k < attriblist.getLength(); k++){
                    //System.out.println(attriblist.item(k));
                    //System.out.println(attriblist.item(k).getClass());
                    Node attribute = attriblist.item(k);
                    String value = attribute.getNodeValue();
                    //String value = att.getAttribute("Value");
                    // System.out.println("Value = " + value);
                    if(value.equals("create")){
                        subscriber.setModifier("create");
                    }else if(value.equals("modify")){
                        subscriber.setModifier("modify");
                    }else if(value.equals("delete")){
                        subscriber.setModifier("delete");
                    }else if(value.trim().matches("dirNumberId.*")){
                        //System.out.println("dirNumberId = " + value);
                        String dirnumber = value.substring(value.indexOf("dirNumberId=") + 12, value.indexOf(",sHLRSubsOrganizationId"));
                        String ndc = value.substring(value.indexOf("nDCId=") + 6, value.indexOf(",managedElementId=SHLR"));
                        // System.out.println("dirnumber=" + dirnumber + ndc);
                        subscriber.setNDCId(ndc);
                        subscriber.setdirNumberId(dirnumber);
                    }else if(value.equals("calledList")){
                        Node calledattr = attriblist.item(k + 1);
                        String calledvalue = calledattr.getNodeValue();
                        // System.out.println("calledList = " + calledvalue);
                        if(calledvalue.equals("NULL"))
                            subscriber.removeCalledList();
                        else
                            subscriber.addCalledList(calledvalue);
                    }else if(value.equals("callingList")){
                        Node callingattr = attriblist.item(k + 1);
                        String callingvalue = callingattr.getNodeValue();
                        // System.out.println("callingList = " + callingvalue);
                        if(callingvalue.equals("NULL"))
                            subscriber.removeCallingList();
                        else
                            subscriber.addCallingList(callingvalue);
                    }else if(value.equals("lRNumberId")){
                        Node lrnattr = attriblist.item(k + 1);
                        String lrnvalue = lrnattr.getNodeValue();
                        subscriber.setlrnNumberId(lrnvalue);
                    }
                }
            }
            if(subscriber != null)
                subset.add(subscriber);
        }
        return subset;
    }
    public static void main(String[] args)throws Exception{
        System.out.println("start job:" + new Date());

        ParsingLog a = new ParsingLog();
        Set set = a.parsing("log_R2.2.xml");
        System.out.println("\n\n\ntotal subscribers = " + set.size());
        Iterator iterator = set.iterator();
        SubInfo sub;
        while(iterator.hasNext()){
            System.out.println("subscriber to write");
            sub = (SubInfo)iterator.next();
            System.out.println("dirnumber:" + sub.getdirNumberId());
            System.out.println("Modifier:" + sub.getModifier());
            System.out.println("ndc:" + sub.getNDCId());
            System.out.println("called list:" + sub.getCalledList());
            System.out.println("calling list:" + sub.getCallingList());
            System.out.println("lrn:" + sub.getlrnNumberId());
        }
        System.out.println("job finished:" + new Date());

        /*
        Set saxset;
        SubInfo sub;
        ParsingLog b = new ParsingLog();
        saxset = b.parsing("log_R2.2.xml", true);
        System.out.println("set size = " + saxset.size());
        Iterator iterator = saxset.iterator();
        while(iterator.hasNext()){
            System.out.println("subscriber to write");
            sub = (SubInfo)iterator.next();
            System.out.println("dirnumber:" + sub.getdirNumberId());
            System.out.println("Modifier:" + sub.getModifier());
            System.out.println("ndc:" + sub.getNDCId());
            System.out.println("called list:" + sub.getCalledList());
            System.out.println("calling list:" + sub.getCallingList());
            System.out.println("lrn:" + sub.getlrnNumberId());
        }
        */
        System.out.println("job finished:" + new Date());
        //saxset = b.parsing("log_R2.2.xml",true);
        //System.out.println("set size = " + saxset.size());
    }
}
Jaakko Kangasharju - 26 Mar 2007 06:53 GMT
>> I just uesed the SAX to rewrite the code, and the performance
>> increased a lot,To my surprise, the DOM parsing the XML will consume
[quoted text clipped - 5 lines]
> his statement of '2 hour -> 1 to 2 seconds' was
> unrealistic!).
It's not at all unrealistic, an XML document of the size the OP has *should* take only a few seconds to parse. It's not that SAX is extremely fast, it's that the DOM code was clearly thrashing and therefore slow. With enough memory, DOM should take only a couple of times longer than SAX.
--
Jaakko Kangasharju, Helsinki Institute for Information Technology
You don't have to be crazy to work here...and it doesn't help either
Patricia Shanahan - 24 Mar 2007 14:47 GMT
...
> I don't know how DOM works when it parsing a XML, I use DOM that is
> because the XPath can quciky location some particular elements. I
> think if the SAX only reports events but not store the whole structure
> of XML like DOM does, It must be more efficient. What does "page-
> thrashing" means ?
...
Imagine working in an office, doing some complicated task, using a desk with a limited area, and a file cabinet with far more paper in it than can fit on the desk.
The desk top is usually full, so when you need to create a new document or get something from the filing cabinet, you need to remove something from the desk. The easiest way is to just get rid of a paper you have not looked at recently.
There are two very different cases:
1. The pages you need more often than once every few minutes all fit on the desk. You spend most of your time working, but sometimes have to get another paper from the file cabinet.
2. The task you are doing needs far more papers than can fit on the desk. Every time you need to follow up a reference, it points to a page that is in the filing cabinet, and you cannot make progress until you get it. But to put it on the desk, you have to remove something else, and a few minutes later you need the page that you just removed...
The second condition is page thrashing.
desk top      <-> computer's main memory
file cabinet  <-> swap file
page of paper <-> virtual storage page
There are two cases when building the whole document in memory:
1. It fits. In that case there will be a heap size that is both big enough to hold the document (no out of memory errors) and small enough to fit on the desk (no page thrashing, the computer spends most of its time doing useful work, not shuffling pages between disk and memory). The obvious heap size to try is a bit smaller than the computer's physical memory. If any size works, that one will.
2. It does not fit. Any memory size big enough to avoid OutOfMemoryError is big enough to cause page thrashing.
Patricia
NeoGeoSNK - 26 Mar 2007 03:11 GMT
> ...> I don't know how DOM works when it parsing a XML, I use DOM that is
> > because the XPath can quciky location some particular elements. I
[quoted text clipped - 44 lines]
> > Patricia
Thanks, Patricia. Your explanation is very clear, but because of my poor English I can't understand your example very well. Maybe it will take several days before I understand it completely :)
Ny
Andreas Leitgeb - 27 Mar 2007 08:15 GMT
> I can't wait any more time, the job is take nearly 2 hours but haven't
> finished yet.I think I'll try the SAX api, is there more fast api to
> parsing XML in java?
Out of curiosity: You wrote that you're using a self-written xml-parser... any chance that you accidentally created an endless loop?
You should add progress indicators, by inserting System.out.println("..."). Even if this doesn't make the code faster, it might give you an indication of what really goes on (or wrong). Perhaps, after 2 hours, it is still busy processing the first sub-item of the input.
Lew - 27 Mar 2007 14:32 GMT
> You should add progress indicators, by inserting
> System.out.println("..."),
logger.debug ( "..." );
-- Lew
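A progress indicator along these lines could be sketched as below. The example uses java.util.logging from the JDK as a stand-in for whichever logging library the logger.debug call above refers to; the class name ProgressDemo, the reporting interval and the sample document are all assumptions for illustration.

```java
import java.io.StringReader;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ProgressDemo {
    private static final Logger LOG = Logger.getLogger(ProgressDemo.class.getName());

    /** Loops over the records and reports every 'interval' items; returns the number of reports. */
    static int process(NodeList records, int interval) {
        int reports = 0;
        int total = records.getLength();   // read once instead of on every iteration
        for (int i = 0; i < total; i++) {
            // ... the real per-record work would go here ...
            if ((i + 1) % interval == 0) {
                LOG.info("processed " + (i + 1) + " of " + total + " records");
                reports++;
            }
        }
        return reports;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder("<journal>");
        for (int i = 0; i < 5; i++) {
            sb.append("<record/>");
        }
        sb.append("</journal>");
        NodeList records = parse(sb.toString()).getElementsByTagName("record");
        process(records, 2);
    }
}
```

If no progress line ever appears, the time is being spent before the loop (in the parse itself); if the lines appear but slow down steadily, the per-record work is the likelier culprit.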
Jaakko Kangasharju - 23 Mar 2007 09:24 GMT
> Thanks very much,
> I just use the java -Xmx1024m option to allocated 1GB memory to JVM,
> but 40 minutes from now, it haven't work out the XML file :'(
Do you actually have 1 GB of memory on your computer? DOM parsing isn't actually very much slower than SAX, and for an XML file of the size you described, parsing should be measurable in seconds on a reasonably modern computer. So the only reason I can think of for it to be as slow as it is is that you don't have enough physical memory and the JVM starts paging.
I would try lowering the -Xmx option to less than the actual memory you have and try to find a value that lets you parse the file without paging to disk. It's hard to say the exact value, but your XML file seems pretty heavy on the structure, so a DOM representation is going to take a lot of memory. I have here a 2 MB XML file about as heavily structured, and it takes about 20 MB as a DOM tree, so you can perhaps estimate from that.
--
Jaakko Kangasharju, Helsinki Institute for Information Technology
begin 644 wittysig.txt
K5&AI<R!S:6=N871U<F4@8V]N=&%I;G,@;F\@=VET='D@<75O=&%T:6]N"@``
`
end
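The "2 MB file, about 20 MB as a DOM tree" observation suggests a back-of-the-envelope estimate of roughly ten times the file size. The sketch below just does that arithmetic; the factor of 10 is taken from that single data point and is an assumption, not a property of DOM, and the class and method names are invented for the example.

```java
import java.io.File;

class DomHeapEstimate {
    /** Expansion factor assumed from the 2 MB -> 20 MB observation; real documents vary. */
    static final double DOM_EXPANSION = 10.0;

    /** Rough number of heap bytes a DOM tree for a file of this size might need. */
    static long estimateDomBytes(long fileBytes) {
        return Math.round(fileBytes * DOM_EXPANSION);
    }

    /** True when the estimate fits in the given heap with some headroom left for the rest of the program. */
    static boolean likelyFits(long fileBytes, long heapBytes) {
        return estimateDomBytes(fileBytes) < heapBytes * 0.8;
    }

    public static void main(String[] args) {
        // Use the file given on the command line, or fall back to a 2 MB example size.
        long fileBytes = args.length > 0 ? new File(args[0]).length() : 2L * 1024 * 1024;
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("file size       : " + fileBytes / 1024 + " KB");
        System.out.println("estimated DOM   : " + estimateDomBytes(fileBytes) / (1024 * 1024) + " MB");
        System.out.println("max heap (-Xmx) : " + maxHeap / (1024 * 1024) + " MB");
        System.out.println("likely to fit   : " + likelyFits(fileBytes, maxHeap));
    }
}
```

The useful comparison is between the estimate and the machine's physical memory: if the estimate is larger than what is physically installed, raising -Xmx only trades the OutOfMemoryError for paging.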
Tom Hawtin - 23 Mar 2007 09:39 GMT
> I would try lowering the -Xmx option to less than the actual memory
> you have and try to find a value that lets you parse the file without
> paging to disk. It's hard to say the exact value, but your XML file
It's also worth setting -Xms to the same value as -Xmx. There is no point in doing lots of garbage collection if you could just allocate some more memory.
Also -server might speed things up a bit. And if in validating mode, DocumentBuilderFactory.setIgnoringElementContentWhitespace might reduce memory a bit.
Tom Hawtin
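These settings could be combined roughly as follows. The sketch only configures the factory; the 512m heap size in the comment and the class name LeanDomFactory are placeholders, and the whitespace option only has an effect when the parser is validating against a DTD that declares element-only content.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical launch line: fixed-size heap (-Xms equal to -Xmx) and the server VM.
//   java -server -Xms512m -Xmx512m LeanDomFactory
class LeanDomFactory {
    /** Creates a factory that drops nodes the application is unlikely to need. */
    static DocumentBuilderFactory newFactory(boolean validating) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        // Ignorable whitespace can only be recognised in validating mode.
        factory.setIgnoringElementContentWhitespace(validating);
        // Comments are not needed for data extraction, so skip them as well.
        factory.setIgnoringComments(true);
        return factory;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        DocumentBuilderFactory factory = newFactory(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        System.out.println("validating                : " + builder.isValidating());
        System.out.println("ignoring element whitespace: " + factory.isIgnoringElementContentWhitespace());
        System.out.println("initial/max heap (MB)      : "
                + Runtime.getRuntime().totalMemory() / (1024 * 1024) + " / "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

When -Xms and -Xmx are equal, the two heap figures printed at the end should be close to each other from the start, which is a quick way to confirm the flags took effect.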