
Java Forum / General / March 2007


Getting "java.lang.OutOfMemoryError" when parsing an XML file using DOM

NeoGeoSNK - 23 Mar 2007 06:12 GMT
Hello,
I just wrote an XML parsing tool using the Java DOM parser. It works fine
when parsing small XML files, but when I parse an XML file of over 500000
lines, it throws a "java.lang.OutOfMemoryError" Exception at line
4.

1:    File f = new File(filename);
2:    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
3:    DocumentBuilder builder = factory.newDocumentBuilder();
4:    Document doc = builder.parse(f);

However, I don't want to use another XML parser such as "SAX", because I
would have to rewrite most of my code :'(  Below is the structure of the XML
file:

<journal>
  <record type="1" id="275">
     <header>
        <header_generic>

        </header_generic>
        <header_specific_user>
        </header_specific_user>
     </header>
     <body>
        <frame frame_id="200011">
           <attribute type="STRING">
              ...........

           </attribute>
        </frame>
              ............

      </body>
  </record>
</journal>

Can somebody give me some suggestions?

Thanks and Best Regards!
Andrew Thompson - 23 Mar 2007 06:20 GMT
> Hello,
> I just write a XML parsing tool use java Dom parser, It works fine
> when parsing small XML files, but when I parsing a over 500000 lines
> XML file, it throws an "java.lang.OutOfMemoryError" Exception ..

Note that as the quoted text clearly states,
this is an *Error*, not an *Exception*.  This
is an important distinction if attempting to
catch the result.

Have you tried increasing the memory available
to the application?

Andrew T.
NeoGeoSNK - 23 Mar 2007 07:57 GMT
> > Hello,
> > I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 10 lines]
>
> Andrew T.

Thanks very much,
I just increased the available memory to 1 GB (java -Xmx1024m),
but it hasn't finished the work yet. Do you know how to
measure the time and memory consumed?
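
One rough way to answer the "time and memory" question is to read the JVM's own counters around the call being measured. The sketch below is an illustration, not code from this thread: the class name ParseStats and its two helper methods are hypothetical, and the loop in main is only a stand-in for the real builder.parse(f) call. The used-heap figure is approximate, because the garbage collector may run at any time.

```java
public class ParseStats {

    /** Heap currently in use, in bytes (approximate: the GC may run at any moment). */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // The upper limit set by -Xmx, as seen by the running JVM.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap = " + maxMb + " MB");

        long before = usedHeapBytes();
        long elapsed = timeMillis(() -> {
            // Stand-in workload; replace with the real parse call.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        long after = usedHeapBytes();

        System.out.println("elapsed = " + elapsed + " ms");
        System.out.println("heap growth = " + (after - before) / 1024 + " KB (approximate)");
    }
}
```

If the used heap stays close to the maximum for a long time while the job makes little progress, the JVM is probably spending most of its time collecting garbage rather than parsing.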
NeoGeoSNK - 23 Mar 2007 08:09 GMT
> > Hello,
> > I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 5 lines]
> is an important distinction if attempting to
> catch the result.

Thanks,
I remember hearing that exceptions are the only error-handling
mechanism in Java?
Also, the error log on another PC, listed below, is different from mine:

   Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
       at java.util.Arrays.copyOfRange(Unknown Source)
       at java.lang.String.<init>(Unknown Source)
       at com.sun.org.apache.xerces.internal.xni.XMLString.toString(Unknown Source)
       at com.sun.org.apache.xerces.internal.parsers.AbstractDOMParser.characters(Unknown Source)
       at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
       at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
       at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
       at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
       at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
       at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
       at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
       at ParsingLog.parsing(ParsingLog.java:21)
       at Log2XML.main(Log2XML.java:12)

BRs
Ning Yu.
John W. Kennedy - 23 Mar 2007 19:56 GMT
>>> Hello,
>>> I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 9 lines]
> mechanism of Java?
> and  the error log on another PC list below is different from mine:

Unfortunately, when speaking of Java, the word "exception" is used in
more than one way. A thing that can be thrown and caught is often called
an "exception", but the correct name is "Throwable". Throwables are
divided into two groups, the "Error" group and the "Exception" group.
The difference is that an Error normally represents a disaster, such as
running out of memory, that a program should not normally try to (or be
able to) recover from.

Therefore, your "it throws an 'java.lang.OutOfMemoryError' Exception" is
a whuzzat, like saying "a Pennsylvania Canadian".
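
The Error/Exception split described above can be seen in a few lines. This is a minimal sketch; the class ThrowableDemo and its methods are hypothetical names made up for illustration. It shows that a catch (Exception e) clause does not catch an OutOfMemoryError, because Error and Exception are separate branches under Throwable.

```java
public class ThrowableDemo {

    /** Names the branch of the Throwable hierarchy that t belongs to. */
    public static String classify(Throwable t) {
        if (t instanceof Error) {
            return "Error";
        }
        if (t instanceof Exception) {
            return "Exception";
        }
        return "Throwable";
    }

    /** Throws either a simulated Error or an Exception and reports which clause caught it. */
    public static String whoCatches(boolean throwError) {
        try {
            if (throwError) {
                throw new OutOfMemoryError("simulated");
            }
            throw new IllegalStateException("simulated");
        } catch (Exception e) {
            return "catch (Exception)";
        } catch (Error e) {
            return "catch (Error)";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new OutOfMemoryError()));      // prints "Error"
        System.out.println(classify(new java.io.IOException()));   // prints "Exception"
        System.out.println(whoCatches(true));                      // prints "catch (Error)"
        System.out.println(whoCatches(false));                     // prints "catch (Exception)"
    }
}
```

Catching an Error is possible, as the sketch shows, but after a real OutOfMemoryError the program is rarely in a state where it can continue safely.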

Signature

John W. Kennedy
A proud member of the reality-based community.
* TagZilla 0.066 * http://tagzilla.mozdev.org

NeoGeoSNK - 24 Mar 2007 04:58 GMT
> >>> Hello,
> >>> I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 27 lines]
>
> - Show quoted text -

I think the "Exception" you mean is the Exception class, which extends
java.lang.Throwable, but what I am talking about is Java's error-handling
mechanism. So I think an Exception is an exception, an Error is an
exception, and a Throwable is an exception too.
By the way,
"Exception in thread "main" java.lang.OutOfMemoryError: Java heap
space" is reported by the JVM, not said by me :)

-- Ny
Mike Schilling - 23 Mar 2007 06:52 GMT
> Hello,
> I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 10 lines]
> However, I don't want to use other XML parsers such as "SAX" because I
> must rewrite most of my codes :

DOM creates an object for each feature (element, attribute, text, etc.) of
the XML document.  A bigger document occupies more memory.  If you're going
to construct DOMs for huge documents, you'll need to give the JVM more
memory.

If you don't need to keep the entire document in memory (say, if you process
each element and cease to need it after it's processed), then SAX or a pull
parser would be far better choices.
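
As an illustration of the pull-parser option, here is a minimal sketch using the StAX API (javax.xml.stream, part of the JDK since Java 6). The class name PullCount, the method countElements, and the tiny sample document are assumptions for the example; only the journal/record element names come from the XML posted above. The reader hands back one event at a time, so memory use does not grow with the size of the document.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullCount {

    /** Counts start tags with the given name without building a tree. */
    public static int countElements(String xml, String name) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            try {
                while (reader.hasNext()) {
                    // next() advances to the following event and returns its type.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(name)) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<journal>"
                + "<record type=\"1\" id=\"275\"><body/></record>"
                + "<record type=\"1\" id=\"276\"/>"
                + "</journal>";
        System.out.println(countElements(xml, "record")); // prints 2
    }
}
```

For a real log file, a FileInputStream would be passed to createXMLStreamReader instead of a StringReader.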
NeoGeoSNK - 23 Mar 2007 07:53 GMT
On Mar 23, 1:52 pm, "Mike Schilling" <mscottschill...@hotmail.com>
wrote:
> > Hello,
> > I just write a XML parsing tool use java Dom parser, It works fine
[quoted text clipped - 19 lines]
> each element and cease to need it after it's processed), then SAX or a pull
> parser would be far better choices.

Thanks very much,
I just used the java -Xmx1024m option to allocate 1 GB of memory to the JVM,
but after 40 minutes it still hasn't finished processing the XML file :'(
Andrew Thompson - 23 Mar 2007 08:23 GMT
> On Mar 23, 1:52 pm, "Mike Schilling" <mscottschill...@hotmail.com>
..
> > ..SAX or a pull
> > parser would be far better choices.

*
..
> I just use the java -Xmx1024m option to allocated 1GB memory to JVM,
> but 40 minutes from now, it haven't work out the XML file

Another 20 minutes and it becomes an
'incomputable problem' according to
the definition as I vaguely recall..

* Sounds as though the task might be better
achieved using the optimal tools for the
job, rather than try to 'work around' the
problems of parsing the entire document
using DOM.

Andrew T.
NeoGeoSNK - 23 Mar 2007 09:00 GMT
> > On Mar 23, 1:52 pm, "Mike Schilling" <mscottschill...@hotmail.com>
> ..
[quoted text clipped - 18 lines]
>
> Andrew T.

Thanks,
I can't wait any longer; the job has taken nearly 2 hours and hasn't
finished yet. I think I'll try the SAX API. Is there a faster API for
parsing XML in Java?
John W. Kennedy - 23 Mar 2007 20:00 GMT
>>> On Mar 23, 1:52 pm, "Mike Schilling" <mscottschill...@hotmail.com>
>> ..
[quoted text clipped - 21 lines]
> finished yet.I think I'll try the SAX api, is there more fast api to
> parsing XML in java?

SAX won't necessarily be /faster/ -- it could be a lot slower. It
depends on what you're doing.

Are you page-thrashing? If so, then SAX is definitely a good idea.
Signature

John W. Kennedy
"...if you had to fall in love with someone who was evil, I can see why
it was her."
  -- "Alias"
* TagZilla 0.066 * http://tagzilla.mozdev.org

Lew - 23 Mar 2007 22:43 GMT
> SAX won't necessarily be /faster/ [than DOM] -- it could be a lot slower. It
> depends on what you're doing.
>
> Are you page-thrashing? If so, than SAX is definitely a good idea.

Another way SAX can really speed things up is that you can use it to handle an
entire XML document in a single pass without huge memory structures to build
and traverse. Back when Java 1.2 first came out I was on a project that used
Java and SAX to parse largish XML documents over the network and it ran like a
bat out of heck. With modern network tech (gigabit LAN, ...), today's
processors and the improvements in Java it would truly scream.

It sounds like the OP's DOM tree is too large to process efficiently. SAX,
correctly used, would almost certainly create a huge speed improvement - like
from 2 hours-infinite down to about a second or two, I would guess.

Like JWK said, it really depends on what you do with the parsed data.
Additional I/O (writing the parsed data to a file or DBMS), large auxiliary
memory structures and other factors could kill the speedup.
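
The single-pass style described above might look like the following sketch. It is an illustration under assumptions, not the code from that project: the class SaxIds and the method recordIds are hypothetical, and the sample document only mimics the record elements posted at the top of the thread. The handler keeps just the values it needs and lets everything else go by.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxIds {

    /** Collects the id attribute of every record element in one pass. */
    public static List<String> recordIds(String xml) {
        final List<String> ids = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Called once per start tag; nothing else is kept in memory.
                if ("record".equals(qName)) {
                    ids.add(attrs.getValue("id")); // null if the attribute is missing
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<journal>"
                + "<record type=\"1\" id=\"275\"><body/></record>"
                + "<record type=\"1\" id=\"276\"/>"
                + "</journal>";
        System.out.println(recordIds(xml)); // prints [275, 276]
    }
}
```

The speedup depends on keeping the handler cheap: if it accumulates large structures of its own, the memory advantage over DOM shrinks.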

-- Lew
NeoGeoSNK - 24 Mar 2007 04:21 GMT
> > SAX won't necessarily be /faster/ [than DOM] -- it could be a lot slower. It
> > depends on what you're doing.
[quoted text clipped - 17 lines]
>
> -- Lew

Thanks,
I just want to transform the original XML into another XML. The original
is a log of subscribers; I extract and return a set of these
subscribers and build an XML file with a new structure.
NeoGeoSNK - 24 Mar 2007 04:05 GMT
> >>> On Mar 23, 1:52 pm, "Mike Schilling" <mscottschill...@hotmail.com>
> >> ..
[quoted text clipped - 34 lines]
>
> - Show quoted text -

What does "...if you had to fall in love with someone who was evil, I can
see why it was her." mean?
I don't know how DOM works when it parses an XML file. I use DOM
because XPath can quickly locate particular elements. I
think that if SAX only reports events and does not store the whole structure
of the XML like DOM does, it must be more efficient. What does
"page-thrashing" mean?
I have pasted the source code below, FYI :)

public Set parsing(String filename) throws Exception{
    Set subset = new LinkedHashSet();
    File f = new File(filename);
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    // builds the whole document tree in memory
    Document doc = builder.parse(f);
    XPathFactory xpfactory = XPathFactory.newInstance();
    XPath path = xpfactory.newXPath();
    NodeList recoredlist = (NodeList)path.evaluate("/journal/record", doc, XPathConstants.NODESET);
    // enumerate all records in the log
    for(int i = 0; i < recoredlist.getLength(); i++){
        Element recordelement = (Element)recoredlist.item(i);
        // get the operation type; skip records that are not subscriptions
        String BEtype = (String)path.evaluate("header/header_generic/domain/@value", recordelement);
        if(!BEtype.equals("SHLR::Subscription"))
            continue;
        SubInfo subscriber = new SubInfo();
        NodeList framelist = (NodeList)path.evaluate("body/frame", recordelement, XPathConstants.NODESET);
        // enumerate the frames in a record
        for(int j = 0; j < framelist.getLength(); j++){
            NodeList attriblist = (NodeList)path.evaluate("attribute/attribute_value/string/@value", framelist.item(j), XPathConstants.NODESET);
            for(int k = 0; k < attriblist.getLength(); k++){
                String value = attriblist.item(k).getNodeValue();
                if(value.equals("create") || value.equals("modify") || value.equals("delete")){
                    subscriber.setModifier(value);
                }else if(value.trim().matches("dirNumberId.*")){
                    String dirnumber = value.substring(value.indexOf("dirNumberId=") + 12, value.indexOf(",sHLRSubsOrganizationId"));
                    String ndc = value.substring(value.indexOf("nDCId=") + 6, value.indexOf(",managedElementId=SHLR"));
                    subscriber.setNDCId(ndc);
                    subscriber.setdirNumberId(dirnumber);
                }else if(value.equals("calledList")){
                    // the value of the list is held by the next attribute node
                    String calledvalue = attriblist.item(k + 1).getNodeValue();
                    if(calledvalue.equals("NULL"))
                        subscriber.removeCalledList();
                    else
                        subscriber.addCalledList(calledvalue);
                }else if(value.equals("callingList")){
                    String callingvalue = attriblist.item(k + 1).getNodeValue();
                    if(callingvalue.equals("NULL"))
                        subscriber.removeCallingList();
                    else
                        subscriber.addCallingList(callingvalue);
                }else if(value.equals("lRNumberId")){
                    String lrnvalue = attriblist.item(k + 1).getNodeValue();
                    subscriber.setlrnNumberId(lrnvalue);
                }
            }
        }
        subset.add(subscriber);
    }

    return subset;
}
Andrew Thompson - 24 Mar 2007 04:44 GMT
..
> > --
> > John W. Kennedy
> > "...if you had to fall in love with someone who was evil, I can see why
> > it was her."
> >    -- "Alias"
...
> what "...if you had to fall in love with someone who was evil, I can
> see why
> it was her."  means?

Note that it was not connected to the technical
part of the conversation, it is just part of a
'sig.' or 'signature line'.  Sig.s are often
intended to be humorous, or funny, and that is
just one such line.  Other people's sig.'s might
push points of view that the person is particularly
fond of, or to simply add details of themselves,
or their own web sites, or links of interest to
them.

I generally prefer the 'funny' sig.s - most
other sig.s take themselves far too seriously.

(Note also that it is generally a good idea
to trim sig.s when replying, as the relevant
information ('who posted what') is still contained
in the 'Jim wrote: ..' attribution lines above the
text.)

Andrew T.
NeoGeoSNK - 24 Mar 2007 13:41 GMT
> ..
> > > --
[quoted text clipped - 27 lines]
>
> Andrew T.

Thanks Andrew T,
I just used SAX to rewrite the code, and the performance
increased a lot. To my surprise, parsing the XML with DOM takes
more than 6 hours, but SAX takes only 6 seconds :) I think DOM
can't parse an XML file of more than 100000 lines without throwing a
memory exception, so I think there can be no argument about the speed of
these two parsers. When using DOM, I must load the whole XML into
memory,
                      Document doc = builder.parse(file);
and this becomes impossible when the file is too large.

I really can't understand the 'signature line' you explained. I
think it's more complex than the XML parser and Java :)
I guess the ".)" is a 'funny' sig of yours?
Andrew Thompson - 24 Mar 2007 14:09 GMT
(big trim)

> Thanks Andrew T

Well, ..for whatever I've done, 'you're welcome',
but most of the best suggestions in this thread
came from other people!  AFAIR it was Mike S.
that first suggested the much better strategy
of using SAX.

> I just uesed the SAX to rewrite the code, and the performance
> increased a lot,To my surprise, the DOM parsing the XML will consume
> more than 6 hours, but the SAX take 6 seconds only:),

Hmm...  That is quite an impressive difference,
isn't it?  Lew's estimate was not far off (I did
not comment at the time - but I really thought
his statement of '2 hour -> 1 to 2 seconds' was
unrealistic!).

> I realy can't understand about the 'signature line' you explained? I
> think it's more complex than the XML parser and Java:)

It is both more complicated, and far less
important, but I do not quite understand
what you mean - if you need further information,
please write your question a little differently
(I do not understand your *question*).

On the other hand, I recommend forgetting
the sig. - it is really not that important.

By the way - I am glad you solved the
technical problem.   :-)

Andrew T.
Lew - 24 Mar 2007 16:31 GMT
"NeoGeoSNK" <ny1...@gmail.com> wrote:
>> I just uesed the SAX to rewrite the code, and the performance
>> increased a lot,To my surprise, the DOM parsing the XML will consume
>> more than 6 hours, but the SAX take 6 seconds only:),

> Hmm...  That is quite an impressive difference,
> isn't it?  Lew's estimate was not far off (I did
> not comment at the time - but I really thought
> his statement of '2 hour -> 1 to 2 seconds' was
> unrealistic!).

Oh, ye of little faith! :-)

It would've been fine with me if I were wrong - I have been proven wrong in
this forum several times before. I just know how fast a good SAX
implementation can be, went out on a limb and was right this time.

I wonder if there weren't a particular problem with the DOM implementation,
though. Others in this thread have had better success with a DOM approach than
the OP did.

-- Lew
Tom Hawtin - 24 Mar 2007 17:56 GMT
> I wonder if there weren't a particular problem with the DOM
> implementation, though. Others in this thread have had better success
> with a DOM approach than the OP did.

Possibly something to do with the form of the XML being used. IIRC,
there is something about handling of attributes that can make DOM very
slow. It's also going to be somewhat implementation dependent.

Tom Hawtin
Andrew Thompson - 25 Mar 2007 03:51 GMT
> "NeoGeoSNK" <ny1...@gmail.com> wrote:
> >> I just uesed the SAX to rewrite the code, and the performance
[quoted text clipped - 6 lines]
>
> Oh, ye of little faith! :-)

Damn faith!  Give me run-time results, anyday!   ;-)
(If you had stated it as 'code I worked on,
improved ...' I would have been prepared to
accept it at face value..)

Andrew T.
NeoGeoSNK - 26 Mar 2007 03:35 GMT
> > "NeoGeoSNK" <ny1...@gmail.com> wrote:
> > >> I just uesed the SAX to rewrite the code, and the performance
[quoted text clipped - 13 lines]
>
> Andrew T.

Hello Andrew T,
I just sent you my tool, including the log file "log_R2.2.xml", in a jar;
please check your mailbox.)

Ny
Andrew Thompson - 26 Mar 2007 09:51 GMT
...
> > Damn faith!  Give me run-time results, anyday!   ;-)
...
> Hello Andrew T
> I just send my tool including the Log files "log_R2.2.xml" in a jar to
> you, please check your mailbox.)

Thanks.  But in fact, although my comment above
seemed to invite you to do that, I do not actually
need folks from usenet to send me code.  More
specifically, unless email from usenet includes
the word 'consultancy', it automatically gets deleted.

Please put anything that is worth hearing, here,
where we can all see it, and it is publicly archived
and searchable.  Alternately, in a case like the
Jar, it would probably be better to get a free site
at 'Geocities' or whatever, and upload it there,
but give us a link.

As an aside, I like your real name much more
than the nickname you use, for posting to
usenet.  I encourage all people to use real
names when posting to usenet.

Andrew T.
NeoGeoSNK - 26 Mar 2007 03:17 GMT
> "NeoGeoSNK" <ny1...@gmail.com> wrote:
> >> I just uesed the SAX to rewrite the code, and the performance
[quoted text clipped - 17 lines]
>
> -- Lew

Thanks Lew
I have pasted my source code below; maybe you can point out some problems in
my DOM implementation when you are free :)
//The Set parsing(String filename) is implemented by DOM
//The Set parsing(String filename, boolean sax) is implemented by SAX

import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import java.util.*;
import javax.xml.xpath.*;
import org.xml.sax.helpers.*;

/**
* Parses an XML-format log file and retrieves all subscriber info.
* @author yning
*
*/

class SAXhandler extends DefaultHandler{
    public SAXhandler(Set subscribers){
        this.subscribers = subscribers;
    }

    // each of these flags means "the next <string> value belongs to this field"
    boolean callingflag = false;
    boolean calledflag = false;
    boolean lrnflag = false;
    // true once the current record has started a subscriber
    boolean dirflag = false;
    Set subscribers;
    SubInfo subscriber;

    public void startElement(String namespaceURL, String lname, String qname, Attributes attr){
        if(!qname.equals("string"))
            return;
        // the value is carried by the first attribute of <string>
        String value = attr.getValue(0);
        if(value.equals("Sub_OAM_DirNumber")){
            subscriber = new SubInfo();
            dirflag = true;
        }else if(value.equals("create") || value.equals("modify") || value.equals("delete")){
            subscriber.setModifier(value);
        }else if(value.trim().matches("dirNumberId.*")){
            String dirnumber = value.substring(value.indexOf("dirNumberId=") + 12, value.indexOf(",sHLRSubsOrganizationId"));
            String ndc = value.substring(value.indexOf("nDCId=") + 6, value.indexOf(",managedElementId=SHLR"));
            subscriber.setNDCId(ndc);
            subscriber.setdirNumberId(dirnumber);
        }else if(value.equals("callingList")){
            callingflag = true;
        }else if(callingflag){
            if(value.equals("NULL"))
                subscriber.removeCallingList();
            else
                subscriber.addCallingList(value);
            callingflag = false;
        }else if(value.equals("calledList")){
            calledflag = true;
        }else if(calledflag){
            if(value.equals("NULL"))
                subscriber.removeCalledList();
            else
                subscriber.addCalledList(value);
            calledflag = false;
        }else if(value.equals("lRNumberId")){
            lrnflag = true;
        }else if(lrnflag){
            subscriber.setlrnNumberId(value);
            lrnflag = false;
        }
    }

    public void endElement(String uri, String lname, String qname){
        // a record is finished; store its subscriber if one was started
        if(qname.equals("record") && dirflag){
            subscribers.add(subscriber);
            dirflag = false;
        }
    }

}

public class ParsingLog {

public Set parsing(String filename, boolean sax)throws Exception{
   Set subset = new LinkedHashSet();
    File f = new File(filename);
    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser paser = factory.newSAXParser();
   SAXhandler handler = new SAXhandler(subset);
    paser.parse(f, handler);
    return handler.subscribers;
}

public Set parsing(String filename) throws Exception{
    Set subset = new LinkedHashSet();
    File f = new File(filename);
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    // builds the whole document tree in memory
    Document doc = builder.parse(f);
    XPathFactory xpfactory = XPathFactory.newInstance();
    XPath path = xpfactory.newXPath();
    NodeList recoredlist = (NodeList)path.evaluate("/journal/record", doc, XPathConstants.NODESET);
    // enumerate all records in the log
    for(int i = 0; i < recoredlist.getLength(); i++){
        Element recordelement = (Element)recoredlist.item(i);
        // get the operation type; skip records that are not subscriptions
        String BEtype = (String)path.evaluate("header/header_generic/domain/@value", recordelement);
        if(!BEtype.equals("SHLR::Subscription"))
            continue;
        SubInfo subscriber = new SubInfo();
        NodeList framelist = (NodeList)path.evaluate("body/frame", recordelement, XPathConstants.NODESET);
        // enumerate the frames in a record
        for(int j = 0; j < framelist.getLength(); j++){
            NodeList attriblist = (NodeList)path.evaluate("attribute/attribute_value/string/@value", framelist.item(j), XPathConstants.NODESET);
            for(int k = 0; k < attriblist.getLength(); k++){
                String value = attriblist.item(k).getNodeValue();
                if(value.equals("create") || value.equals("modify") || value.equals("delete")){
                    subscriber.setModifier(value);
                }else if(value.trim().matches("dirNumberId.*")){
                    String dirnumber = value.substring(value.indexOf("dirNumberId=") + 12, value.indexOf(",sHLRSubsOrganizationId"));
                    String ndc = value.substring(value.indexOf("nDCId=") + 6, value.indexOf(",managedElementId=SHLR"));
                    subscriber.setNDCId(ndc);
                    subscriber.setdirNumberId(dirnumber);
                }else if(value.equals("calledList")){
                    // the value of the list is held by the next attribute node
                    String calledvalue = attriblist.item(k + 1).getNodeValue();
                    if(calledvalue.equals("NULL"))
                        subscriber.removeCalledList();
                    else
                        subscriber.addCalledList(calledvalue);
                }else if(value.equals("callingList")){
                    String callingvalue = attriblist.item(k + 1).getNodeValue();
                    if(callingvalue.equals("NULL"))
                        subscriber.removeCallingList();
                    else
                        subscriber.addCallingList(callingvalue);
                }else if(value.equals("lRNumberId")){
                    String lrnvalue = attriblist.item(k + 1).getNodeValue();
                    subscriber.setlrnNumberId(lrnvalue);
                }
            }
        }
        subset.add(subscriber);
    }

    return subset;
}

public static void main(String[] args)throws Exception{
   System.out.println("start job:" + new Date());

    ParsingLog a = new ParsingLog();
    Set set = a.parsing("log_R2.2.xml");
    System.out.println("\n\n\ntotal subscribers = " + set.size());
    Iterator iterator = set.iterator();
    SubInfo sub;
    while(iterator.hasNext()){
        System.out.println("subscriber to write");
       sub = (SubInfo)iterator.next();
        System.out.println("dirnumber:" + sub.getdirNumberId());
        System.out.println("Modifier:" + sub.getModifier());
        System.out.println("ndc:" + sub.getNDCId());
        System.out.println("called list:" + sub.getCalledList());
        System.out.println("calling list:" + sub.getCallingList());
        System.out.println("lrn:" + sub.getlrnNumberId());
    }
    System.out.println("job finished:" + new Date());

    /*
    Set saxset;
    SubInfo sub;
    ParsingLog b = new ParsingLog();
    saxset = b.parsing("log_R2.2.xml", true);
    System.out.println("set size = " + saxset.size());
    Iterator iterator = saxset.iterator();
    while(iterator.hasNext()){
        System.out.println("subscriber to write");
       sub = (SubInfo)iterator.next();
        System.out.println("dirnumber:" + sub.getdirNumberId());
        System.out.println("Modifier:" + sub.getModifier());
        System.out.println("ndc:" + sub.getNDCId());
        System.out.println("called list:" + sub.getCalledList());
        System.out.println("calling list:" + sub.getCallingList());
        System.out.println("lrn:" + sub.getlrnNumberId());
    }
    */
    System.out.println("job finished:" + new Date());
    //saxset = b.parsing("log_R2.2.xml",true);
    //System.out.println("set size = " + saxset.size());
}
}
Jaakko Kangasharju - 26 Mar 2007 06:53 GMT
>> I just used SAX to rewrite the code, and the performance
>> increased a lot. To my surprise, parsing the XML with DOM will consume
[quoted text clipped - 5 lines]
> his statement of '2 hour -> 1 to 2 seconds' was
> unrealistic!).

It's not at all unrealistic: an XML document of the size the OP has
*should* take only a few seconds to parse.  It's not that SAX is
extremely fast; it's that the DOM code was clearly thrashing and
therefore slow.  With enough memory, DOM should take only a couple of
times longer than SAX.
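
One way to check this claim on your own machine is to parse the same document with both APIs and time each run. The following is only a minimal sketch: the class name, the synthetic document (loosely modelled on the journal/record layout from the original post) and the record count are assumptions for illustration, not the OP's code or data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DomVsSaxTiming {

    /** Builds a synthetic <journal> holding the given number of <record> elements. */
    static byte[] sampleXml(int records) {
        StringBuilder sb = new StringBuilder("<journal>");
        for (int i = 0; i < records; i++) {
            sb.append("<record type=\"1\" id=\"").append(i).append("\"><body>")
              .append("<frame frame_id=\"200011\">")
              .append("<attribute type=\"STRING\">x</attribute>")
              .append("</frame></body></record>");
        }
        return sb.append("</journal>").toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Counts <record> elements after building the whole DOM tree in memory. */
    static int countWithDom(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getElementsByTagName("record").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts <record> elements from SAX events; no tree is kept. */
    static int countWithSax(byte[] xml) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            if ("record".equals(qName)) {
                                count[0]++;
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        byte[] xml = sampleXml(20000); // arbitrary size, chosen for illustration
        long t0 = System.nanoTime();
        int dom = countWithDom(xml);
        long t1 = System.nanoTime();
        int sax = countWithSax(xml);
        long t2 = System.nanoTime();
        System.out.println("DOM: " + dom + " records, " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("SAX: " + sax + " records, " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

The timings are rough (no warm-up, single run), so treat them as an order-of-magnitude check only. If DOM is slower by minutes rather than by a small factor, that points to paging rather than to the parser.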

Jaakko Kangasharju, Helsinki Institute for Information Technology
You don't have to be crazy to work here...and it doesn't help either

Patricia Shanahan - 24 Mar 2007 14:47 GMT
...
> I don't know how DOM works when it parses XML. I use DOM
> because XPath can quickly locate particular elements. I
> think that if SAX only reports events and does not store the whole
> structure of the XML like DOM does, it must be more efficient. What
> does "page-thrashing" mean?
...

Imagine working in an office, doing some complicated task, using a desk
with a limited area, and a file cabinet with far more paper in it than
can fit on the desk.

The desk top is usually full, so when you need to create a new document
or get something from the filing cabinet, you need to remove something
from the desk. The easiest way is to just get rid of a paper you have
not looked at recently.

There are two very different cases:

1. The pages you need more often than once every few minutes all fit on
the desk. You spend most of your time working, but sometimes have to get
another paper from the file cabinet.

2. The task you are doing needs far more papers than can fit on the
desk. Every time you need to follow up a reference, it points to a page
that is in the filing cabinet, and you cannot make progress until you
get it. But to put it on the desk, you have to remove something else,
and a few minutes later you need the page that you just removed...

The second condition is page thrashing.

desk top <-> computer's main memory
file cabinet <-> swap file
page of paper <-> virtual storage page

There are two cases when building the whole document in memory:

1. It fits. In that case there will be a heap size that is both big
enough to hold the document (no out of memory errors) and small enough
to fit on the desk (no page thrashing, the computer spends most of its
time doing useful work, not shuffling pages between disk and memory).
The obvious heap size to try is a bit smaller than the computer's
physical memory. If any size works, that one will.

2. It does not fit. Any memory size big enough to avoid OutOfMemoryError
is big enough to cause page thrashing.
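
The two cases can be written down as a small decision function. This is a sketch of the reasoning above, not a measurement tool: the class and method names are made up, and the memory the document needs and the physical memory available are inputs you would have to estimate yourself.

```java
public class HeapFitCheck {

    enum Outcome { FITS, OUT_OF_MEMORY, THRASHES }

    /**
     * Classifies a run from three assumed inputs:
     * neededBytes   - memory the in-memory document needs (an estimate),
     * heapBytes     - the maximum heap given to the JVM (-Xmx),
     * physicalBytes - physical memory actually available to the JVM.
     */
    static Outcome classify(long neededBytes, long heapBytes, long physicalBytes) {
        if (heapBytes < neededBytes) {
            return Outcome.OUT_OF_MEMORY; // heap too small to hold the document
        }
        if (heapBytes > physicalBytes) {
            return Outcome.THRASHES;      // heap does not "fit on the desk"
        }
        return Outcome.FITS;              // big enough and still in physical memory
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // maxMemory() reports the heap limit the JVM is actually running with.
        long heap = Runtime.getRuntime().maxMemory();
        System.out.println("Current max heap: " + heap / mb + " MB");

        // Hypothetical figures: a 200 MB document on a machine with 512 MB free.
        System.out.println(classify(200 * mb, 400 * mb, 512 * mb));
        System.out.println(classify(200 * mb, 1024 * mb, 512 * mb));
    }
}
```

When the needed memory exceeds physical memory, no heap size gives FITS: anything large enough to avoid the error is larger than physical memory, which is case 2.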

Patricia
NeoGeoSNK - 26 Mar 2007 03:11 GMT
> ...
> > I don't know how DOM works when it parses XML. I use DOM
> > because XPath can quickly locate particular elements. I
[quoted text clipped - 44 lines]
>
> Patricia

Thanks, Patricia.
Your explanation is very clear. Because of my poor English I can't
understand your example very well yet; maybe it will take several days
before I understand it completely :)

Ny
Andreas Leitgeb - 27 Mar 2007 08:15 GMT
> I can't wait any longer; the job has taken nearly 2 hours and hasn't
> finished yet. I think I'll try the SAX API. Is there a faster API for
> parsing XML in Java?

Out of curiosity: You wrote that you're using a
self-written xml-parser...  any chance that you
accidentally created an endless loop?

You should add progress indicators, by inserting
System.out.println("...").  Even if this doesn't
make the code faster, it might give you an indication
of what is really going on (or going wrong).  Perhaps, after 2
hours, it is still busy processing the first sub-item
of the input.
Lew - 27 Mar 2007 14:32 GMT
> You should add progress indicators, by inserting
> System.out.println("..."),  

logger.debug ( "..." );
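
A rough sketch of a progress indicator that reports every N records instead of on every one. It uses java.util.logging from the JDK rather than a logger with a debug() method, and logs at INFO so the messages show up with the default configuration; the class name, the interval and the placeholder loop body are assumptions for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ProgressDemo {

    private static final Logger LOG = Logger.getLogger(ProgressDemo.class.getName());

    /**
     * Walks over `total` hypothetical records and logs once every `interval`
     * records. Returns how many progress messages were written.
     */
    static int processWithProgress(int total, int interval) {
        int messages = 0;
        for (int i = 1; i <= total; i++) {
            // ... the real per-record work would go here ...
            if (i % interval == 0) {
                LOG.log(Level.INFO, "processed {0} of {1} records",
                        new Object[] {i, total});
                messages++;
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        processWithProgress(50000, 10000);
    }
}
```

If no message appears for a long time, or the count stops advancing, that tells you whether the program is stuck on one record or simply slow overall.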

-- Lew
Jaakko Kangasharju - 23 Mar 2007 09:24 GMT
> Thanks very much,
> I just used the java -Xmx1024m option to allocate 1 GB of memory to the
> JVM, but after 40 minutes it still hasn't finished parsing the XML file :'(

Do you actually have 1 GB of memory on your computer?  DOM parsing
isn't actually very much slower than SAX, and for an XML file of the
size you described, parsing should be measurable in seconds on a
reasonably modern computer.  So the only reason I can think of for it
to be as slow as it is is that you don't have enough physical memory
and the JVM starts paging.

I would try lowering the -Xmx option to less than the actual memory
you have and try to find a value that lets you parse the file without
paging to disk.  It's hard to say the exact value, but your XML file
seems pretty heavy on the structure, so a DOM representation is going
to take a lot of memory.  I have here a 2 MB XML file about as heavily
structured, and it takes about 20 MB as a DOM tree, so you can perhaps
estimate from that.
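
The estimate can be turned into a few lines of arithmetic. This sketch only scales the single data point given above (2 MB file, about 20 MB DOM tree); the real ratio depends on the document and the parser, and the class name and the 512 MB figure are assumptions for illustration.

```java
import java.io.File;

public class DomMemoryEstimate {

    // Data point from the post above: a 2 MB file took about 20 MB as a DOM tree.
    static final long SAMPLE_FILE_MB = 2;
    static final long SAMPLE_DOM_MB = 20;

    /** Scales the file size by the observed file-to-DOM ratio. */
    static long estimateDomBytes(long fileBytes) {
        return fileBytes * SAMPLE_DOM_MB / SAMPLE_FILE_MB;
    }

    /** True if the estimated DOM tree is smaller than the given physical memory. */
    static boolean likelyFitsInRam(long fileBytes, long physicalRamBytes) {
        return estimateDomBytes(fileBytes) < physicalRamBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // File name taken from the code posted earlier in the thread;
        // pass another path as the first argument to override it.
        File f = new File(args.length > 0 ? args[0] : "log_R2.2.xml");
        long physicalRam = 512 * mb; // hypothetical: replace with your machine's RAM
        long estimate = estimateDomBytes(f.length());
        System.out.println("File size:          " + f.length() / mb + " MB");
        System.out.println("Estimated DOM size: " + estimate / mb + " MB");
        System.out.println("Likely fits in RAM: " + likelyFitsInRam(f.length(), physicalRam));
    }
}
```

Treat the result as a starting point for choosing -Xmx, then adjust based on what you actually observe.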

Jaakko Kangasharju, Helsinki Institute for Information Technology
begin 644 wittysig.txt
K5&AI<R!S:6=N871U<F4@8V]N=&%I;G,@;F\@=VET='D@<75O=&%T:6]N"@``
`
end

Tom Hawtin - 23 Mar 2007 09:39 GMT
> I would try lowering the -Xmx option to less than the actual memory
> you have and try to find a value that lets you parse the file without
> paging to disk.  It's hard to say the exact value, but your XML file

It's also worth setting -Xms to the same value as -Xmx. There is no
point in doing lots of garbage collection if you could just allocate
some more memory.

Also -server might speed things up a bit. And if in validating mode,
DocumentBuilderFactory.setIgnoringElementContentWhitespace might reduce
memory a bit.
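
A small sketch of the whitespace setting, assuming a document that carries a DTD (here an inline one, invented for illustration) so the parser can tell which whitespace is element-content whitespace. The class name and sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class IgnoreWhitespaceDemo {

    // Tiny sample with an inline DTD; the indentation between elements is
    // whitespace in element content, which the parser may drop.
    static final String XML =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE journal [\n"
          + "  <!ELEMENT journal (record*)>\n"
          + "  <!ELEMENT record (#PCDATA)>\n"
          + "]>\n"
          + "<journal>\n"
          + "  <record>a</record>\n"
          + "  <record>b</record>\n"
          + "</journal>";

    /** Parses the sample and returns the number of child nodes of the root element. */
    static int rootChildCount(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // the setting relies on the DTD content model
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("children, whitespace kept:    " + rootChildCount(false));
        System.out.println("children, whitespace ignored: " + rootChildCount(true));
    }
}
```

With whitespace ignored, the tree holds only the element nodes, so a heavily indented file produces noticeably fewer text nodes. The JVM flags are given on the command line, for example `java -server -Xms512m -Xmx512m ParsingLog`, where 512m is only a placeholder for a value below your physical memory.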

Tom Hawtin

