>> My Problem is that I have to parse an XML file that contains some
>> invalid chars (i.e. 0x0E or 0x1E)
[quoted text clipped - 10 lines]
>
> Read the XML header and get encoding from there.
Which is easy if you know that the XML file is in some superset of
ASCII, since the entire XML header will then be in ASCII. It's
trickier if the XML file might be in any encoding at all (e.g. EBCDIC,
UTF-16, etc.). In the latter case, look at Appendix F
(http://www.w3.org/TR/REC-xml/#sec-guessing) for some useful tips.
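
As a rough illustration of the Appendix F approach, here is a minimal Python sketch (the thread does not name a language, so Python is an assumption). It checks the first bytes for a byte order mark or for the byte pattern of "<?xm", then reads the encoding name from the XML declaration. The function name sniff_xml_encoding and the signature tables are made up for this example, cp037 stands in for "some flavor of EBCDIC", and only a subset of the cases listed in Appendix F is covered.

```python
import re

# Byte order marks, longest first so UTF-32 is not mistaken for UTF-16.
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# Without a BOM, the document should start with "<?xml" (or at least "<").
_NO_BOM = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    (b"\x4c\x6f\xa7\x94", "cp037"),  # "<?xm" in EBCDIC; code page is a guess
]

_DECL = re.compile(
    r"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes."""
    family = "utf-8"  # XML's default when nothing says otherwise
    for signature, name in _BOMS:
        if data.startswith(signature):
            family = name
            data = data[len(signature):]
            break
    else:
        for signature, name in _NO_BOM:
            if data.startswith(signature):
                family = name
                break

    # For UTF-16/UTF-32 the byte layout already settles the question.
    if family.startswith(("utf-16", "utf-32")):
        return family

    # Otherwise decode just enough to read the declaration, if there is one.
    head = data[:200].decode(family, errors="replace")
    match = _DECL.match(head)
    return match.group(1) if match else family
```

The returned name is whatever the declaration says, so a caller would still need to check that its runtime knows that encoding before decoding the rest of the file.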
Christian - 11 Jan 2008 11:29 GMT
Mike Schilling wrote:
>>> My Problem is that I have to parse an XML file that contains some
>>> invalid chars (i.e. 0x0E or 0x1E)
[quoted text clipped - 15 lines]
> UTF-16, etc.) In the latter case, look at Appendix F
> (http://www.w3.org/TR/REC-xml/#sec-guessing) for some useful tips.
Thx for your pointers.
That solution seems too heavy, though. As I am only expecting
UTF-8 and windows-1252, I can probably make do with the hack of just removing
the bytes (and I'll search the API now for an easy way to throw an
exception if neither of these encodings is used).
thx
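
For illustration, the "remove the bytes, then decode strictly" hack could look roughly like the Python sketch below (again, the language is an assumption, and clean_and_decode is a hypothetical helper name). Deleting at the byte level is safe for these two encodings because bytes below 0x20 never occur inside a UTF-8 multi-byte sequence or as part of a windows-1252 printable character.

```python
# Control bytes below 0x20 that XML 1.0 does not allow
# (everything except TAB 0x09, LF 0x0A and CR 0x0D).
_INVALID = bytes(b for b in range(0x20) if b not in (0x09, 0x0A, 0x0D))


def clean_and_decode(data: bytes) -> str:
    """Drop forbidden control bytes, then decode as UTF-8 or windows-1252."""
    cleaned = data.translate(None, _INVALID)
    for encoding in ("utf-8", "cp1252"):
        try:
            # Strict decoding: any invalid byte sequence raises.
            return cleaned.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("input is neither valid UTF-8 nor windows-1252")
```

Note that the windows-1252 check is weak: only a handful of byte values (such as 0x81) are undefined there, so most non-UTF-8 input will decode as windows-1252 without an error, whether or not it really is.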