Java Forum / General / September 2007

Parsing XML with Dom

nuthinking@googlemail.com - 28 Sep 2007 00:03 GMT
I can't believe I'm stuck on this, but
DocumentBuilderFactory.setIgnoringElementContentWhitespace doesn't
seem to work at all; I still get the newlines as text nodes :S

Any idea?

Here is the small code I used:

protected static void parseDom(File file)
    {
        // TODO Auto-generated method stub
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        factory.setIgnoringElementContentWhitespace(true);

        DocumentBuilder parser;
        try {
            parser = factory.newDocumentBuilder();
            Document document = parser.parse(file);
            NodeList list = document.getChildNodes();
            int len = list.getLength();
            System.out.println("#parseDom: len:" + len);
            for (int i = 0; i < len; i++) {
                Node element = list.item(i);
                parseNode(element);
            }
        } catch (ParserConfigurationException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }

    private static void parseNode(Node node)
    {
        System.out.println("#parseNode:" + node.getNodeName() + " = "
                + node.getNodeValue() + " type:" + node.getNodeType());
        NamedNodeMap attributes = node.getAttributes();
        if(attributes != null){
            int len = attributes.getLength();
            for (int i = 0; i < len; i++) {
                Node attr = attributes.item(i);
                parseAttribute(attr);
            }
        }
        if(!node.hasChildNodes()) return;

        NodeList list = node.getChildNodes();
        int len = list.getLength();
        System.out.println("-- num children: " + len);
        for(int i= 0; i<len; i++) {
            Node child = list.item(i);
            parseNode(child);
        }
        System.out.println("------");
    }

    private static void parseAttribute(Node node)
    {
        // TODO Auto-generated method stub
        System.out.println("#parseAttribute:" + node.getNodeName() + " = "
                + node.getNodeValue());
    }

Thanks,

chr
Andrew Thompson - 28 Sep 2007 05:33 GMT
...
>DocumentBuilderFactory.setIgnoringElementContentWhitespace doesn't
>seem to work at all, I still get the new lines as text elements :S
...
>Here the small code I used:

A highly motivated* master of the craft** might be able to spot
the mistake in your 63-line snippet by eye.  To get the help
of 'the rest of us', you are better off posting an SSCCE***.

* Highly enough motivated to try and spot mistakes by
simply reading the code, as opposed to seeing the code
work/fail when run.

** ..and they would probably need to know XML processing
inside and out. Often mistakes are spotted by people who do
not know an API that well but were simply interested enough
to run a code sample.

*** <http://www.physci.org/codes/sscce.html>
It would be best to pull a small XML document directly from
a URL on a web site.  If you cannot manage to upload
it somewhere that is open to being fetched by Java,
try including a small sample in your post.

Andrew Thompson
http://www.athompson.info/andrew/

nuthinking@googlemail.com - 28 Sep 2007 08:06 GMT
The problem seems to be that setIgnoringElementContentWhitespace
only works if the XML refers to either an XSD or a DTD.

Thanks anyway, chr

> >DocumentBuilderFactory.setIgnoringElementContentWhitespace doesn't
> >seem to work at all, I still get the new lines as text elements :S
Arne Vajhøj - 30 Sep 2007 22:13 GMT
> The problem seemed it is that setIgnoringElementContentWhitespace
> works if the xml refers to either to xsd or dtd.

To some extent I think that makes sense.

Only with a DTD or XSD is it possible to identify something
as content whitespace.
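
A minimal sketch of that point, under stated assumptions: the class name
ValidatingWhitespaceDemo and the helper countRootChildren are hypothetical,
and the DTD is a small internal subset made up for illustration. The JAXP
Javadoc for setIgnoringElementContentWhitespace says the setting relies on
the content model and requires the parser to be in validating mode, so the
sketch turns validation on whenever a DTD is supplied.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ValidatingWhitespaceDemo {
    /**
     * Hypothetical helper: parses the XML string and returns the number of
     * child nodes of the document element. Validation is enabled only when
     * the caller says the document carries a DTD.
     */
    static int countRootChildren(String xml, boolean hasDtd) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(hasDtd);
            dbf.setIgnoringElementContentWhitespace(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String body = "<all>\n  <one>A</one>\n  <one>BB</one>\n</all>\n";
        String dtd = "<!DOCTYPE all [\n"
                   + "<!ELEMENT all (one)*>\n"
                   + "<!ELEMENT one (#PCDATA)>\n"
                   + "]>\n";
        // No DTD: the parser cannot tell which whitespace is ignorable,
        // so the whitespace text nodes between the elements are kept.
        System.out.println("without DTD: " + countRootChildren(body, false));
        // DTD says <all> has element-only content, so the whitespace
        // between the <one> elements can be dropped.
        System.out.println("with DTD:    " + countRootChildren(dtd + body, true));
    }
}
```

Without the DTD the root keeps the whitespace text nodes around each
element; with the DTD and validation only the element children remain.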

Arne
Arne Vajhøj - 30 Sep 2007 22:37 GMT
>> The problem seemed it is that setIgnoringElementContentWhitespace
>> works if the xml refers to either to xsd or dtd.
> Only with a DTD or XSD is it possible to identify something
> as content whitespace.

Try looking at the attached example.

Arne

====================================

package september;

import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class XMLandWS {
    public static void parse(String xml) throws Exception {
        System.out.print(xml);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setIgnoringElementContentWhitespace(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        TreeWalker walk = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_TEXT, null, false);
        Node n;
        while ((n = walk.nextNode()) != null) {
            System.out.println("=" + n.getNodeValue()
                    .replace("\n", "\\n").replace(" ", "_"));
        }
    }
    public static void main(String[] args) throws Exception {
        parse("<all>\n" +
              "  <one>A</one>\n" +
              "  <one>BB</one>\n" +
              "  <one>CCC</one>\n" +
              "</all>\n");
        parse("<!DOCTYPE all [\n" +
              "<!ELEMENT all (one)*>\n" +
              "<!ELEMENT one (#PCDATA)>\n" +
              "]>\n" +
              "<all>\n" +
              "  <one>A</one>\n" +
              "  <one>BB</one>\n" +
              "  <one>CCC</one>\n" +
              "</all>\n");
        parse("<!DOCTYPE all [\n" +
                "<!ELEMENT all (#PCDATA|one)*>\n" +
                "<!ELEMENT one (#PCDATA)>\n" +
                "]>\n" +
                "<all>\n" +
                "  <one>A</one>\n" +
                "  <one>BB</one>\n" +
                "  <one>CCC</one>\n" +
                "</all>\n");
    }
}
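
When the XML has no DTD or XSD at all, as in the first parse call above,
one possible workaround is to remove whitespace-only text nodes by hand
after parsing. The following is only a sketch of that idea, not something
posted in the thread; the class WhitespaceStripper and its two methods are
hypothetical names.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceStripper {
    /** Recursively removes text nodes that contain nothing but whitespace. */
    static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling first, because removeChild detaches child.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    /** Hypothetical convenience method: parse a string, then strip it. */
    static Document parseAndStrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripWhitespaceNodes(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseAndStrip("<all>\n"
                + "  <one>A</one>\n"
                + "  <one>BB</one>\n"
                + "  <one>CCC</one>\n"
                + "</all>\n");
        System.out.println("children of <all>: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Unlike the DTD-based approach, this cannot tell element content from mixed
content, so it would also drop a significant space such as the one between
two inline elements. It is only reasonable for data-style XML where
whitespace between elements never carries meaning.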

