
Signature
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
>>Is there some way I can do this? Some of the encodings I suspect I will
>>come across are UTF-8, windows-1252 and ISO-8859-15, although I do not
[quoted text clipped - 3 lines]
> http://mindprod.com/projects/encodingidentification.html
> for some approaches.
Unfortunately, this didn't help me much. So I take it that there is no
nifty little class I can download that will do this detection for me?
To clarify, the files I will be working with are _not_ HTML or XML files,
but rather standard-text log files from IM clients.
/Martin Gerner
Roedy Green - 29 Mar 2006 19:05 GMT
On Wed, 29 Mar 2006 13:13:40 +0000 (UTC), Martin Gerner
<martin.gerner@nospam.com> wrote, quoted or indirectly quoted someone
who said :
>Unfortunately, this didn't help me much. So I take it that there is no
>nifty little class I can download that will do this detection for me?
Exactly. It is a messy problem.
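For the three encodings named in the question, a rough heuristic is still possible. The sketch below is only an illustration, not a general detector: the class and method names are made up for this example. It relies on the facts that pure 7-bit text is identical in all three encodings, that text with high-bit bytes rarely passes strict UTF-8 validation by accident, and that windows-1252 uses bytes 0x80 to 0x9F for printable characters while ISO-8859-15 assigns them to C1 control codes that seldom appear in text. When none of these clues apply, the answer stays ambiguous.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses among US-ASCII, UTF-8, windows-1252, ISO-8859-15.
final class EncodingGuess {

    // True if the bytes decode as UTF-8 with no malformed sequences.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns a best guess; the last case cannot be resolved from bytes alone.
    static String guess(byte[] data) {
        boolean highBit = false;   // any byte >= 0x80
        boolean c1Range = false;   // any byte in 0x80..0x9F
        for (byte b : data) {
            int u = b & 0xFF;
            if (u >= 0x80) highBit = true;
            if (u >= 0x80 && u <= 0x9F) c1Range = true;
        }
        if (!highBit) return "US-ASCII";          // same bytes in all three encodings
        if (isValidUtf8(data)) return "UTF-8";    // very unlikely to be a coincidence
        if (c1Range) return "windows-1252";       // C1 controls are rare in real text
        return "ISO-8859-15 or windows-1252";     // needs language-level clues
    }
}
```

Telling ISO-8859-15 from windows-1252 in the last case needs knowledge of the expected language or vocabulary, which is where the problem gets messy.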
Roedy Green - 29 Mar 2006 19:24 GMT
On Wed, 29 Mar 2006 13:13:40 +0000 (UTC), Martin Gerner
<martin.gerner@nospam.com> wrote, quoted or indirectly quoted someone
who said :
>To clarify, the files I will be working with are _not_ HTML or XML files,
>but rather standard-text log files from IM clients.
If you have control over the creation of these files, you could put
the encoding name at the front of the file, followed by a \n. That would
make your job much easier. Or you could tell everyone to use UTF-8,
which would make the problem disappear.
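A minimal sketch of the header idea, assuming the first line holds the charset name in plain ASCII and everything after the first \n is the log text in that charset. The class and method names are hypothetical, and the scheme only works for ASCII-compatible encodings such as the three mentioned in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: stores the charset name on the first line of the file.
final class EncodingHeader {

    // Builds file contents: charset name in ASCII, '\n', then the body in that charset.
    static byte[] encode(String body, Charset cs) {
        byte[] header = (cs.name() + "\n").getBytes(StandardCharsets.US_ASCII);
        byte[] payload = body.getBytes(cs);
        byte[] out = new byte[header.length + payload.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(payload, 0, out, header.length, payload.length);
        return out;
    }

    // Reads the charset name up to the first '\n' and decodes the rest with it.
    static String decode(byte[] file) {
        int nl = 0;
        while (nl < file.length && file[nl] != '\n') {
            nl++;
        }
        if (nl == file.length) {
            throw new IllegalArgumentException("no encoding header line");
        }
        Charset cs = Charset.forName(new String(file, 0, nl, StandardCharsets.US_ASCII));
        return new String(file, nl + 1, file.length - nl - 1, cs);
    }
}
```

For a real file, the byte array could come from java.nio.file.Files.readAllBytes; Charset.forName throws an unchecked exception if the header names an encoding the JVM does not know.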
You might also do it by tracking the source of the file. You figure
out manually which encoding each source uses over which date range.
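One way to record such manual findings is a small lookup table keyed by source and date range. This is only a sketch: the class name, the source labels and the dates in the usage note are invented for illustration and would have to come from your own investigation of the files.

```java
import java.nio.charset.Charset;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical table: which encoding a given source used during a given date range.
final class SourceEncodingTable {

    private static final class Rule {
        final String source;
        final LocalDate from;   // inclusive
        final LocalDate to;     // inclusive
        final Charset charset;

        Rule(String source, LocalDate from, LocalDate to, Charset charset) {
            this.source = source;
            this.from = from;
            this.to = to;
            this.charset = charset;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void add(String source, LocalDate from, LocalDate to, Charset charset) {
        rules.add(new Rule(source, from, to, charset));
    }

    // Returns the recorded charset, or null if nothing is known for that source and date.
    Charset lookup(String source, LocalDate date) {
        for (Rule r : rules) {
            if (r.source.equals(source) && !date.isBefore(r.from) && !date.isAfter(r.to)) {
                return r.charset;
            }
        }
        return null;
    }
}
```

For example, after add("clientA", LocalDate.of(2004, 1, 1), LocalDate.of(2005, 6, 30), Charset.forName("windows-1252")), a lookup for "clientA" on any date in that range returns windows-1252, and a lookup outside it returns null so the caller can fall back to guessing.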
The habit of not recording the encoding goes way back. The idea was
that documents were local and all encoded the same way. You did not
exchange documents with others, or if you did, you exchanged a whole
tape full, all encoded the same way, so again the problem of
identification did not come up.