Hi folks,
Do you know if there is a way to automatically detect the charset of a
byte array? In fact, I would like to decode a byte array with the
right charset decoder, given that I do not know which charset was
used to encode it.
The CharsetDecoder class seems to have an "isAutoDetecting" boolean
method: this means that there should exist a 'generic' charset
decoder implementation which could auto-detect the charset. Am I
right?
Any suggestions would be appreciated.
Thanks folks!
Antoine Larcher
Alan Moore - 06 May 2005 20:16 GMT
>Hi folks,
>
[quoted text clipped - 7 lines]
>decoder implementation which could auto detect the charset. Am I right
>?
Unfortunately, that auto-detect feature is very limited. If you know
you're reading Chinese text, but don't know which of the several
Chinese encodings it was written in, you can use an auto-detecting
"wrapper" Charset that figures it out for you. I think there's one
for Japanese text as well, but there's no built-in universal
auto-detecting Charset.
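To see which auto-detecting charsets your own JRE actually ships, you can ask the decoder of every installed charset. This is only a sketch: the class and method names (AutoDetectingCharsets, autoDetecting) are made up for illustration, and the result depends entirely on the JRE, so it may well be an empty list.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK.
class AutoDetectingCharsets {

    /** Returns the names of installed charsets whose decoder reports isAutoDetecting(). */
    static List<String> autoDetecting() {
        List<String> names = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            CharsetDecoder decoder = cs.newDecoder();
            if (decoder.isAutoDetecting()) {
                names.add(cs.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The list varies by JRE and may be empty.
        System.out.println("Auto-detecting charsets: " + autoDetecting());
    }
}
```

I believe the Japanese one is registered as x-JISAutoDetect on Sun JREs, but treat that as an assumption and check the output on your own installation. After decoding with such a charset, CharsetDecoder.isCharsetDetected() and detectedCharset() are the documented way to ask what it settled on; they are optional operations and may throw UnsupportedOperationException.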
I use this tool:
http://glaforge.free.fr/wiki/index.php?wiki=GuessEncoding
It only works with a limited set of Unicode and Western encodings, but
it's perfect for my needs. If you need something with broader
applicability, look for the CharDet package from Mozilla.
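If all you need is the "Unicode or Western" distinction, a rough guess can be made by hand. The sketch below is not the GuessEncoding or CharDet algorithm, just a common heuristic: look for a byte order mark, then try a strict UTF-8 decode, and otherwise fall back to a single-byte charset the caller chooses. The names SimpleCharsetGuesser and guess are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a heuristic guess, not a reliable detector.
class SimpleCharsetGuesser {

    /** Guesses UTF-8 or UTF-16 from a BOM or strict decoding, else returns the fallback. */
    static Charset guess(byte[] data, Charset fallback) {
        // 1. Byte order marks. A UTF-32 BOM is deliberately not handled here.
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        // 2. Strict UTF-8: REPORT makes the decoder throw instead of substituting U+FFFD.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8; // pure ASCII also ends up here
        } catch (CharacterCodingException e) {
            // 3. Not valid UTF-8: assume the caller's single-byte default.
            return fallback;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        Charset cs = guess(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(cs + ": " + new String(latin1, cs));
    }
}
```

Note that the fallback is a pure assumption: nearly any byte sequence is valid ISO-8859-1, so this cannot tell Western encodings apart from each other, and short non-UTF-8 inputs can pass the UTF-8 check by accident. For anything beyond that, a statistical detector such as CharDet is the better route.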