Hello everybody.
I have a problem with a fatal error while parsing XML.
I have a server and a client.
My server creates XML from a web page given by the client. After the page
is converted to XML, the content is sent back to the client.
This is the client code:
// read text from socket
while (null != line)
{
sb.append(line);
line = br.readLine();
}
// debug - this works I can see my XML response!
// System.out.println(sb.toString());
// parse my String back to DOM Document
DocumentBuilder xdb2 = XMLParserUtils.getXMLDocBuilder();
ByteArrayInputStream bais = new ByteArrayInputStream(sb.toString().getBytes());
Document doc = xdb.parse(new InputSource(bais));
and then I receive this fatal error:
[Fatal Error] :1:1335: Invalid byte 1 of 1-byte UTF-8 sequence.
org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:264)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:292)
How can I avoid this problem?
Should I encode (how?) the text sent through sockets?
Thanks in advance for your help
best regards
R
> Hello everybody.
>
[quoted text clipped - 10 lines]
>
> How can I avoid this problems?
Sounds like invalid XML. Do you have the "<?xml version='1.0'?>" declaration
at the start of your data?
Hope that helps,
Ross

Signature
[Ross A. Bamford] [ross AT the.website.domain]
Roscopeco Open Tech ++ Open Source + Java + Apache + CMF
http://www.roscopec0.f9.co.uk/ + info@the.website.domain
R - 09 May 2005 21:16 GMT
The thing is that the String read from the socket does have the XML prolog:
<?xml version="1.0" encoding="UTF-8"?>
Any idea what I should do?
Could it be the encoding?
Thanks in advance
best regards
R
R - 09 May 2005 21:23 GMT
hm...
I think that encoding is broken (but I don't know how to fix it)
XML is in UTF-8
// debug
// System.out.println(sb.toString());
ByteArrayInputStream bais = new ByteArrayInputStream(sb.toString().getBytes());
Document doc = xdb.parse(new InputSource(bais));
sb.toString() - maybe this is why xdb.parse(new InputSource(bais))
raises the fatal error?
(Maybe UTF-8 is converted to the Polish ISO-8859-2?)
Am I right?
If so, how can it be fixed? (I'm a newbie and not very familiar with
Java.)
Thanks for your help
best regards
R
Ross Bamford - 09 May 2005 21:47 GMT
> I think that encoding is broken (but I don't know how to fix it)
Hmm, it's possible... Firstly, looking back at your first message, I
notice you assigned a DocumentBuilder to 'xdb2' but then parsed with 'xdb'.
I assume this was a typo (if not, check it!)?
Without seeing more of your code, I'm not sure where you're getting the
data from (a socket I think you said?). If so, why not just pass in the
original InputStream to the parser? There is a parse(InputStream)
overload that should correctly handle your encoding. If you have text
input there are decorators in java.io that will help.
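One such decorator is InputStreamReader, which accepts an explicit charset. The following is only a sketch: the class name Utf8ReaderSketch and the method readAll are made up for illustration, a ByteArrayInputStream stands in for the socket's stream, and StandardCharsets needs Java 7 or later (on older JDKs the charset name "UTF-8" can be passed as a String instead).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original poster's code.
class Utf8ReaderSketch {

    // Reads all text from the stream, decoding the bytes as UTF-8
    // instead of relying on the platform's default charset.
    static String readAll(InputStream in) throws IOException {
        BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            // readLine() strips the line terminator, so put one back.
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for socket.getInputStream(): UTF-8 bytes with a Polish letter.
        byte[] wire = "<a>\u0142</a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(wire)));
    }
}
```

With the charset named explicitly, the decoded text no longer depends on the locale of the machine the client runs on.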
Generally speaking you don't really want to convert things into bytes
unless you really need to - leave that to the lower level code in the
JDK (et al.) which has advanced support for encodings :)
Apart from that, back to my first suggestion - strip your input down to
the bare minimum and see if that helps.
Cheers,
Ross

A. Bolmarcich - 09 May 2005 23:07 GMT
> hm...
>
[quoted text clipped - 13 lines]
>
> am I right?
The expression sb.toString().getBytes() uses the default encoding,
which for you may be ISO-8859-2.
> If so - how can it be fixed? (I'm newbie and not quite familiar with
> Java)
Chances are the encoding named in the XML declaration is UTF-8
(implicitly or explicitly). Create the ByteArrayInputStream by using
the expression sb.toString().getBytes("UTF-8") so that the bytes are
the UTF-8 encoding of the Unicode characters of sb.toString().
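A rough sketch of the difference, assuming the document declares UTF-8. The class ExplicitCharsetSketch and the method parseAs are invented for this illustration, and ISO-8859-1 is used as the "wrong" charset only because it is guaranteed to exist on every JVM. The same kind of mismatch would arise with ISO-8859-2 as the platform default.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original poster's code.
class ExplicitCharsetSketch {

    // Encodes the XML text with the given charset and parses the bytes.
    // The bytes only match the document's encoding="UTF-8" declaration
    // when the charset passed in is UTF-8.
    static Document parseAs(String xml, Charset charset) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] bytes = xml.getBytes(charset);
        return builder.parse(new InputSource(new ByteArrayInputStream(bytes)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u00f3</a>";

        // Bytes agree with the declaration: parsing succeeds.
        Document doc = parseAs(xml, StandardCharsets.UTF_8);
        System.out.println(doc.getDocumentElement().getTextContent());

        // Bytes disagree with the declaration: the parser rejects them.
        try {
            parseAs(xml, StandardCharsets.ISO_8859_1);
        } catch (Exception e) {
            System.out.println("Parse failed: " + e.getMessage());
        }
    }
}
```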
> Hello everybody.
>
[quoted text clipped - 13 lines]
> line = br.readLine();
> }
I really don't see why you first read all the data into a
string and then feed that string into the parser. Why don't you
parse the data directly? I also don't see why you want to go
through the sequence of
byte data from socket
-> text encoding to String(Buffer)
-> back to byte data for XML parser
-> wrapped as an InputSource
None of this is necessary. You can provide the InputStream from the
Socket to the XML parser.
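A minimal sketch of that approach, with a ByteArrayInputStream standing in for socket.getInputStream(). The class name DirectStreamParseSketch is made up, and the sketch assumes the server closes the connection (or shuts down its output) after sending the document, since the parser reads until the end of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original poster's code.
class DirectStreamParseSketch {

    // Hands the raw bytes to the parser, which works out the encoding
    // from the XML declaration itself. No String is built in between.
    static Document parse(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the bytes arriving on the socket.
        byte[] wire = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u0142</a>"
                .getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(wire));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```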
But if you really want to first read all the data:
> // debug - this works I can see my XML response!
> // System.out.println(sb.toString());
[quoted text clipped - 3 lines]
> ByteArrayInputStream bais = new
> ByteArrayInputStream(sb.toString().getBytes());
From the String.getBytes() documentation:
| Encodes this String into a sequence of bytes using the
| platform's default charset, storing the result into a new byte array.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Is your platform's default charset UTF-8? I doubt it. You want to have
the getBytes(String charsetName) method instead. But ...
> Document doc = xdb.parse(new InputSource(bais));
... did you realize that InputSource can read directly from characters,
for example a StringReader wrapped around your String?
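One caveat: the InputSource(String) constructor takes a system identifier (a URI), not XML text, so the String has to be wrapped in a java.io.StringReader. A sketch follows, with the class name StringReaderParseSketch invented for illustration. When the parser is given a character stream it reads the characters as they are and disregards the encoding declaration, so no bytes are involved at all.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original poster's code.
class StringReaderParseSketch {

    // Parses XML that is already held in a String, without converting
    // it back to bytes first.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u0142</a>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This only helps if the String was decoded correctly in the first place, that is, if the reader on the socket used the same charset the server used to write the data.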
/Thomas

Signature
The comp.lang.java.gui FAQ:
ftp://ftp.cs.uu.nl/pub/NEWS.ANSWERS/computer-lang/java/gui/faq