Java Forum / General / June 2007

Problem using local xhtml DTD when parsing file with DocumentBuilder

Ryan McFall - 13 Jun 2007 17:45 GMT
Hi:

I've got some XHTML documents in which I'm using the classes in
javax.xml.xpath to find certain tags.  These documents contain a DTD
declaration for XHTML, with a public identifier.  Since my application
needs to work without a network connection, I've downloaded the DTD
and associated entities and made them available to my application as
resources.  I then set an EntityResolver on the document builder that I
get from DocumentBuilderFactory.newInstance().newDocumentBuilder().
Here's the relevant code from the resolveEntity method:

url = getClass().getResource (identifierMap.get(publicId));
return new InputSource (url.toString());

When I run the application, I get the following message from the
parser:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException:
Invalid byte 1 of 1-byte UTF-8 sequence.

After browsing around a bit, I tried:

url = getClass().getResource (identifierMap.get(publicId));
FileReader reader = new FileReader (new File (url.toURI()));
return new InputSource (reader);

but this had the same problem.

I downloaded the files from the W3C site, both by using Firefox and by
using wget.  In both cases I get the same behavior.

I don't know much about character encodings, so I'm at a loss as to
what to try next.  Any suggestions would be greatly appreciated.

Ryan
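
For reference, the resolver setup described in the post above might look roughly like the sketch below. The class name LocalDtdResolver and the map from public identifiers to local URLs are assumptions made for illustration; the original post only shows two lines of its resolveEntity method. The sketch hands the parser a byte stream rather than a Reader, so the parser can work out the encoding of the DTD on its own, and it sets the system ID so that relative references inside the DTD (such as the XHTML entity files) resolve next to the local copy.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver: maps public identifiers to local copies of the DTD files. */
public class LocalDtdResolver implements EntityResolver {
    private final Map<String, URL> localCopies;

    public LocalDtdResolver(Map<String, URL> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        URL local = (publicId == null) ? null : localCopies.get(publicId);
        if (local == null) {
            return null; // not mapped: the parser falls back to its default lookup
        }
        // A byte stream (not a Reader) leaves encoding detection to the parser.
        InputSource source = new InputSource(local.openStream());
        source.setPublicId(publicId);
        // The system ID is the base for relative references inside the DTD.
        source.setSystemId(local.toExternalForm());
        return source;
    }
}
```

It would be installed with builder.setEntityResolver(new LocalDtdResolver(map)), where each map value comes from something like getClass().getResource(...) pointing at the bundled DTD.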
Lew - 13 Jun 2007 18:05 GMT
> Hi:
>
[quoted text clipped - 3 lines]
> com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException:
> Invalid byte 1 of 1-byte UTF-8 sequence.

Ideally, all XML documents should be in UTF-8 encoding.  Apparently the DTD or
your XML file isn't.  When they aren't, the XML declaration should specify the
encoding.

> After browsing around a bit, I tried:
>
[quoted text clipped - 3 lines]
>
> but this had the same problem.

Have you considered using
<http://java.sun.com/javase/6/docs/api/java/io/InputStreamReader.html#InputStreamReader(java.io.InputStream,%20java.nio.charset.Charset)>
?

This will let you specify the document encoding to match how it's stored.
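
A minimal sketch of that idea, assuming the document is known to be stored as ISO-8859-1 (the thread never says which encoding the file actually used). The class and method names are made up for illustration. When the parser is given a Reader, it works from the decoded characters and does not try to guess the byte encoding itself.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExplicitCharsetParse {
    /** Parses XML bytes, decoding them with the given charset before the parser sees them. */
    public static Document parse(byte[] xmlBytes, Charset charset) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(xmlBytes);
             Reader reader = new InputStreamReader(in, charset)) {
            return builder.parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // In ISO-8859-1 the e-acute is the single byte 0xE9, which is not valid UTF-8 here.
        byte[] latin1 = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

For a file on disk, a FileInputStream would take the place of the ByteArrayInputStream; a plain FileReader always decodes with the platform default charset, which is why it may not help.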

Lew

Ryan McFall - 13 Jun 2007 18:08 GMT
Pardon my stupidity - the XML file was saved by someone else, and
apparently it was saved as something other than UTF-8.  Re-saving it
into UTF-8 solved my problem.

Ryan
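
One way to confirm this kind of diagnosis before re-saving a file is to check whether its bytes decode cleanly as UTF-8. The sketch below is only an illustration (the class name Utf8Check is made up); it uses a strict CharsetDecoder that reports malformed input instead of silently replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    /** Returns true only if the bytes decode as UTF-8 without any malformed sequence. */
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "na\u00efve";
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8)));      // true
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```

For a real file, the byte array could come from java.nio.file.Files.readAllBytes. A pure ASCII file passes this check under either encoding, so a true result does not prove the file was saved as UTF-8.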

