Java Forum / General / March 2008

SAX succeeds, but StAX fails

Kai Schlamp - 06 Mar 2008 12:57 GMT
Hi!

I tried to parse a PubMed record (PubMed is a biomedical article
database) with both SAX and StAX. StAX failed, but I am not sure why
(see the exception below).
Why does SAX succeed while StAX fails?
The XML document seems to be fine (see
http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&retmode=xml
)
Any suggestions?

Kai

StAX example:
           String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&retmode=xml";
           URL url = new URL(address);

           XMLInputFactory factory = XMLInputFactory.newInstance();
           XMLStreamReader parser =
                   factory.createXMLStreamReader(url.openConnection().getInputStream());

           while (parser.hasNext()) {
               switch (parser.getEventType()) {
               }
               parser.next();
           }

Error message:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[50,39]
Message: A '(' character or an element type is required in the
declaration of element type "PubMedPubDate".

SAX example:
           SAXParserFactory parserFactory = SAXParserFactory.newInstance();
           parserFactory.setValidating(true);
           parserFactory.setNamespaceAware(true);
           SAXParser parser = parserFactory.newSAXParser();
           parser.parse(url.openConnection().getInputStream(),
                   new PubmedEFetchHandler());

(PubmedEFetchHandler is a simple DefaultHandler with some debugging
output).
GArlington - 06 Mar 2008 15:35 GMT
> Hi!
>
[quoted text clipped - 3 lines]
> Why does SAX succeed while StAX fails?
> The XML document seems to be fine (see http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11...)

As far as I can see, this request DOES NOT generate valid XML (or any
XML).

> Any suggestions?
>
[quoted text clipped - 31 lines]
> (PubmedEFetchHandler is a simple DefaultHandler with some debugging
> output).
Kai Schlamp - 06 Mar 2008 16:49 GMT
This seems to be a conversion error in the posting (I am posting
through Google Groups).
The link in your message no longer contains retmode=xml.
Please try this URL:
www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&retmode=xml
It should generate valid XML.
Kai Schlamp - 06 Mar 2008 17:01 GMT
OK, I checked the new link again and the problem remains. When I click
the link and it opens in Firefox, the result is indeed not XML.
But when you then press the "Go To" button (the green button to the
right of the URL input field), the valid XML appears. I am not sure why
this happens, but it has nothing to do with my original problem. It
seems to be a minor Firefox issue.

> This seems to be a conversion error in the posting (I am posting
> through Google Groups).
> The link in your message no longer contains retmode=xml.
> Please try this URL: www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&...
> It should generate valid XML.
GArlington - 07 Mar 2008 11:55 GMT
> OK, I checked the new link again and the problem remains. When I click
> the link and it opens in Firefox, the result is indeed not XML.
[quoted text clipped - 8 lines]
> > Please try this URL: www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&...
> > It should generate valid XML.

OK, I tried accessing it with IE and it worked the first time. I
thought I had tried it in IE yesterday too, but...
I fetched your URL and parsed it (with my own methods) and it works,
so I suspect that there is a problem with StAX...
The only thing I can suggest is this: dump what you get from your
URL BEFORE you try to parse it, and then dump the data at each step
until you reach your error. This will help you find where the
problem first shows its ugly head...
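
A minimal sketch of that "dump first, then parse" approach. The class
and method names (DumpThenParse, readFully, countStartElements) are
made up for illustration, and an in-memory sample stands in for the
PubMed response so that the example runs without network access. It
also assumes the content is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DumpThenParse {

    // Reads the whole stream into memory (assumption: the content is UTF-8).
    public static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Parses the already-dumped text with StAX and counts START_ELEMENT events.
    public static int countStartElements(String xml) throws XMLStreamException {
        XMLStreamReader parser =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            parser.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for url.openConnection().getInputStream() from the thread.
        InputStream in = new ByteArrayInputStream(
                "<a><b/><b/></a>".getBytes(StandardCharsets.UTF_8));
        String xml = readFully(in);
        System.out.println(xml);                      // step 1: dump the raw input
        System.out.println(countStartElements(xml));  // step 2: parse what was dumped
    }
}
```

Because the dumped text is parsed from a String, exactly the bytes
that were printed are the bytes the parser sees, which rules out
transport problems as the cause.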
Kai Schlamp - 12 Mar 2008 21:33 GMT
I still have the same problem with StAX. I dumped the output of the
URL before parsing it, and it seems to be fine and well formed.
But parsing with StAX still gives me an exception right in the first
loop iteration (SAX seems to work fine).
Below is a small test class. Can someone explain to me why this
happens?
I also tried copying the output of the URL into a file and parsing it
directly from disk, but that didn't solve the problem.
Perhaps I should try another StAX provider. I found one on the
net named Woodstox. Are there any more? What is the default
implementation? An Apache project?
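
One way to see which StAX implementation is actually in use is to
print the class of the factory that XMLInputFactory.newInstance()
returns. This is only a small sketch; the class name WhichStax is made
up for illustration, and the printed name depends on the JDK and on
what is on the classpath.

```java
import javax.xml.stream.XMLInputFactory;

public class WhichStax {

    // Returns the fully qualified class name of the StAX factory in use.
    public static String factoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(factoryClassName());
    }
}
```

In the stack trace in this post, the reader class lives in the package
com.sun.org.apache.xerces.internal.impl, which indicates the
implementation bundled with the JDK.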

The error output of the below test class:

START_DOCUMENT: 1.0
beforeNext
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[50,39]
Message: A '(' character or an element type is required in the
declaration of element type "PubMedPubDate".
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:588)
    at StaxTester.main(StaxTester.java:49)

The test class:

import java.net.URL;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxTester {

    public static void main(String[] args) {
        try {
            String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=11748933";
            //String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pmc&term=stem+cells+AND+free+fulltext[filter]";
            URL url = new URL(address);

            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader parser =
                    factory.createXMLStreamReader(url.openConnection().getInputStream());

            while (parser.hasNext()) {
                // Report the current event, then advance to the next one.
                switch (parser.getEventType()) {
                    case XMLStreamConstants.START_DOCUMENT:
                        System.out.println("START_DOCUMENT: " + parser.getVersion());
                        break;

                    case XMLStreamConstants.END_DOCUMENT:
                        System.out.println("END_DOCUMENT: ");
                        parser.close();
                        break;

                    case XMLStreamConstants.NAMESPACE:
                        System.out.println("NAMESPACE: " + parser.getNamespaceURI());
                        break;

                    case XMLStreamConstants.START_ELEMENT:
                        System.out.println("START_ELEMENT: " + parser.getLocalName());
                        break;

                    case XMLStreamConstants.CHARACTERS:
                        if (!parser.isWhiteSpace())
                            System.out.println("CHARACTERS: " + parser.getText());
                        break;

                    case XMLStreamConstants.END_ELEMENT:
                        System.out.println("END_ELEMENT: " + parser.getLocalName());
                        break;

                    default:
                        break;
                }
                System.out.println("beforeNext");
                parser.next();
                System.out.println("afterNext");
            }

            /* SAX succeeds. Why is that? */
//            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
//            parserFactory.setValidating(true);
//            parserFactory.setNamespaceAware(true);
//            SAXParser parser = parserFactory.newSAXParser();
//            parser.parse(url.openConnection().getInputStream(),
//                    new PubmedEFetchHandler());
        }
        catch (Exception e) {
            e.printStackTrace();
        }

    }

}
Owen Jacobson - 12 Mar 2008 22:27 GMT
> Hi!
>
[quoted text clipped - 4 lines]
> The XML document seems to be fine (see http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11...)
> Any suggestions?

...

>             String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&retmode=xml";
>             URL url = new URL(address);

...

> Error message:
> javax.xml.stream.XMLStreamException: ParseError at [row,col]:[50,39]
> Message: A '(' character or an element type is required in the
> declaration of element type "PubMedPubDate".

The XML document itself is fine, but non-validating due to problems in
the DTD; StAX by default attempts to validate input documents. SAX is
ignoring the DTD associated with the XML document, and therefore
doesn't notice that the DTD is invalid.

-o
Kai Schlamp - 12 Mar 2008 22:43 GMT
> > Hi!
>
[quoted text clipped - 24 lines]
>
> -o

Thanks for the answer.
So disabling DTD validation should solve the problem?
I tried
factory.setProperty("javax.xml.stream.isValidating", false);
(which is already the default according to the Javadoc), but that did
not solve the problem either.

Another thing: I just tried the Woodstox implementation (I simply
added it to the classpath), and everything works fine, even without
changing any property. So it seems that there is a specific problem
with the reference implementation.
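
A possible alternative to switching providers, sketched here under
assumptions and not confirmed anywhere in this thread: isValidating
only controls validation, while the separate standard property
XMLInputFactory.SUPPORT_DTD ("javax.xml.stream.supportDTD") controls
whether the DTD is processed at all. The class name StaxWithoutDtd,
the sample document, and the DTD location are hypothetical; the DTD
file is deliberately one that does not exist.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxWithoutDtd {

    // Parses the document with DTD support switched off and
    // returns the local names of all start elements, in order.
    public static List<String> elementNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser not to process the DTD referenced by the DOCTYPE.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);

        XMLStreamReader parser = factory.createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<String>();
        try {
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(parser.getLocalName());
                }
            }
        } finally {
            parser.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document whose DOCTYPE points at a DTD that cannot be loaded.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE Doc SYSTEM \"file:///hypothetical/missing.dtd\">"
                + "<Doc><Item>x</Item></Doc>";
        System.out.println(elementNames(xml));
    }
}
```

The trade-off is that entities declared in the DTD can no longer be
resolved, so this only suits documents that do not rely on them.
Whether it avoids the PubMedPubDate error with the JDK implementation
would still need to be checked against the real PubMed response.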

