Java Forum / General / September 2005
Which line number a Node is from
Nicolas Raoul - 06 Sep 2005 14:35 GMT
Hello all,
My Java application uses XML files, which are parsed using DOM. The XML files are usually written by developers and may contain errors. In such cases, I would like to report the line on which the error occurs. The problem is that DOM does not allow this: an org.w3c.dom.Node object does not contain any kind of reference to the originating file.
XMLSchema is not specialized enough to detect all the possible errors (conformance rules contained in a database). That's why I must do some error-checking at run-time, in Java.
How can I parse an XML file in Java and still be able to tell which line number a particular Node comes from? Is there any alternative/extension to DOM for this?
Thanks, Nicolas Raoul. http://nrw.free.fr
jan V - 06 Sep 2005 14:43 GMT
> My Java application uses XML files. These files are parsed using DOM.
> The XML files are usually written by developers, and may contain
> errors.
Some people would argue that this use of XML is broken. XML should be written and read by programs, i.e. computers... not people. Why don't you write a program to generate those XML files? Then you can ensure that you don't produce garbage in the first place (cf. GIGO).
By trying to solve your line number problem, you're trying to address a symptom, not the cause... and what do you prefer at the end of the day? The cause to be tackled, or just papering over the cracks?
Hemal Pandya - 07 Sep 2005 07:04 GMT
> > My Java application uses XML files. These files are parsed using DOM.
> > The XML files are usually written by developers, and may contain
[quoted text clipped - 8 lines]
> symptom, not the cause... and what do you prefer at the end of the day? The
> cause to be tackled, or just papering over the cracks?
I am not trying to hijack this thread into a discussion about valid uses of XML, but I have a few points to make:
- The OP didn't say which developers write the XML. For all you know, they could be at the other end of the world and he can't force them to write a program to generate the XML.
- Even if a program generates the XML, humans still have to read it. The generated XML may (will?) have errors in it, and having line number information helps.
Nicolas Raoul - 07 Sep 2005 12:26 GMT
I believe that XML should be human-readable and human-editable. And if a program encounters something that should never happen, it should not just crash but point out the problem in an intelligible way, including the line number.
It is unfortunate that I don't have the resources to write a quality generation tool that would let developers easily write this kind of XML. But even if I had written one, I would have no way to enforce its use; I can't dictate how developers edit the XML files.
I can't just hope that every XML file out there will be valid. On the contrary, I must assume some of them will be invalid, and it is my responsibility to handle that case.
I don't think GIGO (Garbage In => Garbage Out) is a good thing, so here is yet another new acronym: GIHO (Garbage In => Help Out) ;-)
Thanks for the ideas anyway, Nicolas Raoul
Andrew Thompson - 06 Sep 2005 14:45 GMT
> My Java application uses XML files. These files are parsed using DOM.
> The XML files are usually written by developers, and may contain
> errors. In such case, I would like to tell at what line the error is.
> The problem is that DOM does not allow this: An org.w3c.dom.Node object
> does not contain any kind of reference to the originating file.
So don't validate them that way.
I had some experience recently using the Ant xmlvalidate task[1]. Easy peasy when you kick it off from inside an IDE. The error output will allow you to 'double click'/'jump to' the line in error.
[1] <http://ant.apache.org/manual/OptionalTasks/xmlvalidate.html>
I'll now clear the floor for the 'I hate XML' crew. ;-)
HTH
 Signature Andrew Thompson physci.org 1point1c.org javasaver.com lensescapes.com athompson.info "I talk of freedom, you talk of the flag. I talk of revolution, you'd much rather brag.." Live 'White Discussion'
jan V - 06 Sep 2005 15:18 GMT
> I had some experiences recently using the Ant xmlvalidate
> task[1]. Easy peasy when you kick it off from inside an
> IDE. The error output will allow you to 'double click'/
> 'jump to' the line in error.
Now you've done it... you've proven to Nicolas that it's possible to get at those line numbers, since your IDE does it. ;-)
> I'll now clear the floor for the 'I hate XML' crew. ;-)
Not in this thread, Josephine.
Roedy Green - 07 Sep 2005 00:39 GMT
>[1] <http://ant.apache.org/manual/OptionalTasks/xmlvalidate.html>
>
>I'll now clear the floor for the 'I hate XML' crew. ;-)
His experience confirms my major XML complaint. XML encourages the propagation of invalid files. My key theory in the Abundance database language was to go to extreme lengths to avoid getting any invalid data in the binary files. It makes coding ever so much simpler if you can completely trust your files to contain only valid and complete data.
XML is the antithesis of that approach.
It is the duty of the USER of the XML file to defend himself against error.
This to me is completely illogical. An XML file has one writer and potentially many readers. It should be the WRITER's job to produce a syntactically clean and provably clean file. The only way to do that is with some sort of binary format that can't easily be tampered with by a last-minute change.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Again taking new Java programming contracts.
Nicolas Raoul - 07 Sep 2005 12:40 GMT
You describe the case of a single actor producing untouchable files that are then read by some users. That is not my situation. There are many writers, many readers, and I have no control over them. This kind of situation has become pretty common, and I guess that's why XML has become a standard for data exchange.
> It makes coding ever so much simpler if you can completely trust your files
This is definitely what I don't want to do. I will always check every input, to make the application secure and robust in whatever environment it is asked to run.
Thanks anyway, Raoul Nicolas.
Nicolas Raoul - 07 Sep 2005 12:31 GMT
This Ant task is useful; it uses a DTD or XMLSchema definition. However, it is not applicable here, since in my particular case XML validity can't be expressed in these languages.
XML validity in my case depends on environment factors. An XML file that is valid in one system is probably invalid in another system. That's because it depends on constraints that are stored in a database. XMLSchema can't access a database, as far as I know :-(
> you've proven that it's possible to get at those line numbers
Fortunately, XMLSchema validators do report the line numbers of errors, probably because they are not written using DOM.
Thanks, Nicolas Raoul
bugbear - 06 Sep 2005 15:01 GMT
> Hello all,
>
[quoted text clipped - 11 lines]
> line number a particular Node is ?
> Is there any alternative/extension to DOM for this ?
No. So I wrote one; I use a SAX parser and a handler to build up a DOM tree. And I fake/force/fudge extra line number information (from the SAX events) into the DOM nodes.
It's a Java equivalent of this: http://search.cpan.org/~enno/libxml-enno-1.02/lib/XML/Handler/BuildDOM.pm
It's foul, I know, but I have happy users.
BugBear
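The approach bugbear describes can be sketched roughly as follows. This is not his code (it was never posted), only a minimal illustration under some assumptions: the class name LineNumberDomBuilder and the user-data key "lineNumber" are made up for this example, and it relies on the DOM Level 3 method Node.setUserData(), which needs Java 5 or later. Only elements, attributes and text are copied, and namespaces are ignored.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

/** Builds a DOM tree from SAX events and tags each element with its line number. */
class LineNumberDomBuilder extends DefaultHandler {
    static final String LINE_KEY = "lineNumber"; // hypothetical user-data key

    private final Document doc;
    private final Deque<Node> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private Locator locator;

    LineNumberDomBuilder(Document doc) {
        this.doc = doc;
        stack.push(doc);
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // supplied by the parser before startDocument()
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        Element el = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            el.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        // The locator points just past the start tag at this moment.
        el.setUserData(LINE_KEY, locator.getLineNumber(), null);
        stack.peek().appendChild(el);
        stack.push(el);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        stack.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may arrive in several chunks
    }

    private void flushText() {
        if (text.length() > 0) {
            stack.peek().appendChild(doc.createTextNode(text.toString()));
            text.setLength(0);
        }
    }

    static Document parse(InputSource in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        SAXParserFactory.newInstance().newSAXParser().parse(in, new LineNumberDomBuilder(doc));
        return doc;
    }
}
```

Error-checking code that later walks the tree could then call node.getUserData(LineNumberDomBuilder.LINE_KEY) when it reports a problem. Note that the SAX locator gives the position just after the event, so for a start tag spread over several lines the stored number is the line on which the tag ends.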
Thomas Hawtin - 06 Sep 2005 15:36 GMT
>> XMLSchema is not specialized enough to detect all the possible errors
>> (conformance rules contained in a database). That's why I must do some
[quoted text clipped - 9 lines]
>
> It's foul, I know, but I have happy users.
It might be easier to do the validation in SAX, rather than waiting for the DOM.
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
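Tom's suggestion, checking the rules directly in a SAX handler where the Locator is available, might look something like the sketch below. The rule itself is an assumption made for illustration: a plain set of allowed element names stands in for the database-backed constraints Nicolas mentions, and the class name SaxRuleChecker is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

/** Checks a custom rule during the SAX parse and records problems with line numbers. */
class SaxRuleChecker extends DefaultHandler {
    // Hypothetical stand-in for rules that would really come from a database.
    private final Set<String> allowedElements;
    private final List<String> problems = new ArrayList<>();
    private Locator locator;

    SaxRuleChecker(Set<String> allowedElements) {
        this.allowedElements = allowedElements;
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!allowedElements.contains(qName)) {
            problems.add("line " + locator.getLineNumber()
                    + ": unexpected element <" + qName + ">");
        }
    }

    static List<String> check(InputSource in, Set<String> allowedElements) throws Exception {
        SaxRuleChecker checker = new SaxRuleChecker(allowedElements);
        SAXParserFactory.newInstance().newSAXParser().parse(in, checker);
        return checker.problems;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>\n  <server/>\n  <sever/>\n</config>";
        Set<String> allowed = new HashSet<>(Arrays.asList("config", "server"));
        for (String problem : check(new InputSource(new StringReader(xml)), allowed)) {
            System.out.println(problem);
        }
    }
}
```

The trade-off is the one raised in the replies: the line number comes for free, but all checks have to be driven from a single event handler instead of from objects that each receive their own Node.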
Nicolas Raoul - 07 Sep 2005 12:45 GMT
Well, I could parse everything using SAX, but I really don't like SAX. DOM is much better suited to object-oriented programming, in my opinion.
For example, an object may parse a Node, recognize some known nodes inside and pass them to appropriate new objects. Each object knows how to parse its node, rather than having a big SAX class that does everything and would be less extensible.
Thanks for the idea, Nicolas Raoul
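The delegation style Nicolas describes could be sketched like this. Everything here is illustrative: the ElementReader interface, the DelegatingReader class and the element names are invented for the example and do not come from his application.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical interface: one implementation per kind of element. */
interface ElementReader {
    void read(Element element, List<String> log);
}

/** Recognizes known child elements and hands each one to the reader registered for it. */
class DelegatingReader implements ElementReader {
    private final Map<String, ElementReader> children = new HashMap<>();

    DelegatingReader register(String tagName, ElementReader reader) {
        children.put(tagName, reader);
        return this;
    }

    @Override
    public void read(Element element, List<String> log) {
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text, comments, etc.
            }
            ElementReader reader = children.get(n.getNodeName());
            if (reader == null) {
                log.add("unknown element <" + n.getNodeName() + ">");
            } else {
                reader.read((Element) n, log);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<app><server name='a'/><oops/></app>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> log = new ArrayList<>();
        new DelegatingReader()
                .register("server", (e, l) -> l.add("server " + e.getAttribute("name")))
                .read(doc.getDocumentElement(), log);
        System.out.println(log);
    }
}
```

New element types are supported by registering another reader, which is the extensibility argument made above. The "unknown element" message is exactly the place where a line number would be wanted, and a plain DOM Node cannot supply one.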
Nicolas Raoul - 07 Sep 2005 12:41 GMT
It sounds interesting :-) I am thinking about writing such a tool. Is your DOM builder open source, or otherwise available somewhere?
Thanks a lot ! Nicolas Raoul.
bugbear - 08 Sep 2005 13:35 GMT
> It sounds interesting :-)
> I am thinking about writing such a tool.
> Is your DOM builder open source, or otherwise available somewhere ?
No. But I suggest a transcription of the Perl I linked to might not be too hard.
BugBear
flazzarino@gmail.com - 06 Sep 2005 15:47 GMT
Use Xerces; I think it might be in the SDK 5.0 too. Validation is not as hard as it seems.

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

// Ask the underlying parser for namespace-aware W3C XML Schema validation.
factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
factory.setAttribute("http://xml.org/sax/features/validation", Boolean.TRUE);
factory.setAttribute("http://apache.org/xml/features/validation/schema", Boolean.TRUE);
factory.setAttribute("http://xml.org/sax/features/namespaces", Boolean.TRUE);

DocumentBuilder builder = factory.newDocumentBuilder();

// Validation problems arrive here; each exception carries its position.
ErrorHandler eh = new ErrorHandler() {
    public void warning(SAXParseException e) throws SAXException {
        // do stuff with e, like e.getLineNumber()
    }
    public void error(SAXParseException e) throws SAXException {
        // do stuff with e
    }
    public void fatalError(SAXParseException e) throws SAXException {
        // do stuff with e
    }
};

builder.setErrorHandler(eh);
builder.parse(someInputStream);
Nicolas Raoul - 07 Sep 2005 12:46 GMT
Your code uses XMLSchema-defined validity rules. Please have a look at the answer I have just written for Andrew: XMLSchema is not expressive enough to match my needs.
Thanks anyway :-) Nicolas Raoul.
Hemal Pandya - 07 Sep 2005 13:08 GMT
> Hello all,
>
> How can I parse an XML file in Java and still be able to tell at which
> line number a particular Node is ?
> Is there any alternative/extension to DOM for this ?
Some time spent searching revealed that this can be done by extending DOMParser. The Xerces samples include DOMAddLines, which prints line numbers for each node while using a DOM parser. You can probably use Proxy to do this with arbitrary DOMParser classes.
I was able to find the source by searching for http://www.google.com/search?hl=en&lr=&q=DOMAddlines+ext%3Ajava&btnG=Search and viewing it from the Google cache.
Nicolas Raoul - 07 Sep 2005 17:11 GMT
Great! That is exactly the kind of solution I need :-)
The Apache sample Hemal linked to is an extension to DOMParser that overrides the startDocument() and startElement() methods to store the XMLLocator.getLineNumber() into the userData attribute of each Node.
Since DOMParser is org.apache.xerces.parsers.DOMParser, it seems that this is an implementation-specific solution. Well, it seems that getting the line numbers is impossible with the standard J2SE API, so it has to be a specific implementation. Anyway, I am already using Xerces.
I am still using Java 1.4 (which is a shame, I know...) and unfortunately user data is not available in 1.4. This will probably lead me to create, for each node, a sub-node (or attribute) to store its line number, a solution similar to what BugBear suggested.
I will try to write something reusable for this, and I will let you know.
Thanks a lot ! Nicolas Raoul.
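One possible shape for the attribute-based fallback Nicolas mentions, sketched here as an assumption and not as the tool he went on to write: a SAX filter adds a line-number attribute to every start tag, and an identity transform then builds the DOM, so Node.setUserData() is never needed. The class name LineAttributeFilter and the attribute name "lineNumber" are hypothetical, and the @Override annotations would have to be dropped on a pre-Java 5 compiler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** SAX filter that adds a line-number attribute to every element it passes on. */
class LineAttributeFilter extends XMLFilterImpl {
    static final String LINE_ATTR = "lineNumber"; // hypothetical attribute name

    private Locator locator;

    LineAttributeFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
        super.setDocumentLocator(locator);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Copy the original attributes, then append the current line number.
        AttributesImpl withLine = new AttributesImpl(atts);
        withLine.addAttribute("", LINE_ATTR, LINE_ATTR, "CDATA",
                String.valueOf(locator.getLineNumber()));
        super.startElement(uri, localName, qName, withLine);
    }

    /** Parses the input through the filter and returns the resulting DOM. */
    static Document parse(InputSource in) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = new LineAttributeFilter(spf.newSAXParser().getXMLReader());
        DOMResult result = new DOMResult();
        // An identity transform copies the filtered SAX events into a DOM tree.
        TransformerFactory.newInstance().newTransformer()
                .transform(new SAXSource(reader, in), result);
        return (Document) result.getNode();
    }
}
```

After parsing, element.getAttribute(LineAttributeFilter.LINE_ATTR) gives the line on which the start tag ended. The obvious cost is that the extra attribute becomes part of the document: it would need to be stripped before the tree is written back out, and it could clash with a real attribute of the same name.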