> Hy!
>
[quoted text clipped - 3 lines]
> Why does SAX succeed when StAX doesn't?
> The XML document seems to be fine (see http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11...)
As far as I can see this request DOES NOT generate valid xml (or any
xml).
> Any suggestions?
>
[quoted text clipped - 31 lines]
> (PubmedEFetchHandler is a simple DefaultHandler with some debugging
> output).
Kai Schlamp - 06 Mar 2008 16:49 GMT
Seems to be a posting converting error (I am posting through google
groups).
The link in your message doesn't contain the retmode=xml anymore.
Please try this url:
www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&retmode=xml
It should generate valid XML.
Kai Schlamp - 06 Mar 2008 17:01 GMT
Ok, I checked the new link again and the problem remains. When I click
the link and it opens in Firefox, it is indeed no XML.
But when you then press the "Go To" button (the green button to the right
of the URL input field), the valid XML appears. I am not sure why
this happens, but it has nothing to do with my original
problem. It seems to be a minor Firefox problem.
> Seems to be a posting converting error (I am posting through google
> groups).
> The link in your message doesn't contain the retmode=xml anymore.
> Please try this url:www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&...
> It should generate valid XML.
GArlington - 07 Mar 2008 11:55 GMT
> Ok, I checked the new link again and the problem remains. When I click
> the link and it opens in Firefox, it is indeed no XML.
[quoted text clipped - 8 lines]
> > Please try this url:www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&...
> > It should generate valid XML.
OK, I tried accessing it with IE and it worked first time, I thought
that I gave it a try in IE yesterday too, but...
I fetched your url and parsed it (with my own methods) and it works,
so I suspect that there is a problem with StAX...
The only thing I can suggest is this: dump what you get from your
URL BEFORE you try to parse it, and then dump the data at each step
until you reach your error. This will help you find where the
problem first rears its ugly head...
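A minimal sketch of the "dump before parsing" idea, assuming the whole response fits in memory. The class name StreamDump and the helper readFully are made up for illustration, and the in-memory stream in main is only a stand-in for url.openConnection().getInputStream():

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamDump {
    /** Reads the stream to its end and returns every byte it delivered. */
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for the real connection stream (assumption for this sketch).
        InputStream source = new ByteArrayInputStream(
                "<?xml version=\"1.0\"?><root/>".getBytes(StandardCharsets.UTF_8));

        // Step 1: capture and print exactly what the server sent.
        byte[] raw = readFully(source);
        System.out.println(new String(raw, StandardCharsets.UTF_8));

        // Step 2: hand the parser a fresh stream over the very same bytes.
        InputStream forParser = new ByteArrayInputStream(raw);
        System.out.println("bytes available to the parser: " + forParser.available());
    }
}
```

Buffering the bytes once means the dump and the parser see identical input, which rules out differences between two separate requests.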
Kai Schlamp - 12 Mar 2008 21:33 GMT
I still have the same problem with StAX. I dumped the output of the
url before parsing it, and it seems to be fine and well formed.
But parsing with StAX still gives me an exception right in the first
loop (SAX seems to work fine).
Below is a small test class. Can someone explain to me why this
happens?
I also tried copying the output of the URL into a file and parsing it
directly from disk, but that didn't solve the problem.
Perhaps I should try it with another StAX provider. I found one on the
net named Woodstox. Are there any more? What is the default
implementation? An Apache project?
The error output of the below test class:
START_DOCUMENT: 1.0
beforeNext
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[50,39]
Message: A '(' character or an element type is required in the declaration of element type "PubMedPubDate".
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:588)
    at StaxTester.main(StaxTester.java:49)
The test class:
import java.net.URL;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxTester {
    public static void main(String[] args) {
        try {
            String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=11748933";
            //String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pmc&term=stem+cells+AND+free+fulltext[filter]";
            URL url = new URL(address);
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader parser =
                factory.createXMLStreamReader(url.openConnection().getInputStream());
            // Report the current event, then advance to the next one.
            while (parser.hasNext()) {
                switch (parser.getEventType()) {
                    case XMLStreamConstants.START_DOCUMENT:
                        System.out.println("START_DOCUMENT: " + parser.getVersion());
                        break;
                    case XMLStreamConstants.END_DOCUMENT:
                        System.out.println("END_DOCUMENT: ");
                        parser.close();
                        break;
                    case XMLStreamConstants.NAMESPACE:
                        System.out.println("NAMESPACE: " + parser.getNamespaceURI());
                        break;
                    case XMLStreamConstants.START_ELEMENT:
                        System.out.println("START_ELEMENT: " + parser.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!parser.isWhiteSpace())
                            System.out.println("CHARACTERS: " + parser.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        System.out.println("END_ELEMENT: " + parser.getLocalName());
                        break;
                    default:
                        break;
                }
                System.out.println("beforeNext");
                parser.next();
                System.out.println("afterNext");
            }
            /** SAX succeeds. Why that? */
            // SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            // parserFactory.setValidating(true);
            // parserFactory.setNamespaceAware(true);
            // SAXParser parser = parserFactory.newSAXParser();
            // parser.parse(url.openConnection().getInputStream(), new PubmedEFetchHandler());
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
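On the question of which StAX implementation is the default: one way to check is to ask the factory for its own class name. This is only a sketch, and the class name StaxProviderCheck is made up for illustration. The stack trace above points into com.sun.org.apache.xerces.internal, which suggests the implementation bundled with the JDK was in use; with another provider such as Woodstox on the classpath, the printed name may differ.

```java
import javax.xml.stream.XMLInputFactory;

final class StaxProviderCheck {
    /** Returns the class name of the XMLInputFactory that newInstance() selects. */
    static String providerClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The output depends on the JDK and on what else is on the classpath.
        System.out.println("StAX input factory in use: " + providerClassName());
    }
}
```

Running this once with and once without the extra provider jar on the classpath shows whether the provider was actually picked up.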
> Hy!
>
[quoted text clipped - 4 lines]
> The XML document seems to be fine (see http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11...)
> Any suggestions?
...
> String address = "http://www.ncbi.nlm.nih.gov/entrez/
> eutils/efetch.fcgi?db=pubmed&id=11748933&retmode=xml";
> URL url = new URL(address);
...
> Error message:
> javax.xml.stream.XMLStreamException: ParseError at [row,col]:[50,39]
> Message: A '(' character or an element type is required in the
> declaration of element type "PubMedPubDate".
The XML document itself is fine, but it fails validation because of
problems in the DTD; StAX by default attempts to validate input
documents. SAX ignores the DTD associated with the XML document, and
therefore doesn't notice that the DTD is invalid.
-o
Kai Schlamp - 12 Mar 2008 22:43 GMT
> > Hy!
>
[quoted text clipped - 24 lines]
>
> -o
Thanks for the answer.
So disabling DTD validation should solve that problem?
I tried
factory.setProperty("javax.xml.stream.isValidating", false);
(which is the default as stated in the Javadoc), but it also didn't
solve the problem.
Another thing: I just tried the Woodstox implementation (I only added
it to the classpath), and everything works fine, even without changing
any property. So it seems that there is a specific problem with the
reference implementation.
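One more setting that might be worth trying with the reference implementation: besides "javax.xml.stream.isValidating", the standard API also defines XMLInputFactory.SUPPORT_DTD ("javax.xml.stream.supportDTD"), which asks the parser not to process DTDs at all. Whether this avoids the error for the PubMed document is not confirmed in this thread. The sketch below only shows how the property is set; the class name StaxWithoutDtd and the sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxWithoutDtd {
    /** Collects the local names of all start elements, with DTD support switched off. */
    static List<String> elementNames(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        // Assumption of this sketch: DTD processing is what triggers the error.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);

        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // next() both advances the reader and returns the new event type.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample document with a small internal DTD subset.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE root [<!ELEMENT root (child)><!ELEMENT child (#PCDATA)>]>"
                + "<root><child>text</child></root>";
        System.out.println(elementNames(xml));
    }
}
```

The loop here switches on the value returned by next() instead of calling getEventType() before advancing, which is the more common way to drive an XMLStreamReader.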