Frank Meyer wrote on 15.08.2007 14:04:
>> File f = new File("sourcfile.txt");
>> Reader r = new InputStreamReader(new FileInputStream(f), "UTF-8");
> http://java.sun.com/mailers/techtips/corejava/2007/tt0207.html#1
> where this issue may be clarified.
Hi Frank,
Thanks for the answer, but I don't see how it relates to my question.
If I understand the article correctly, it describes how to remove (normalize)
problematic characters from a String. In my case, however, I need to store the
contents of a text file unaltered in a database CLOB field. To do that, I
need to know how many characters the file contains when decoded with a given
encoding, without reading the whole file into memory.
I don't see what normalizing text input has to do with this...
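For reference, counting the characters of a file in a given encoding does not require holding the whole file in memory. A minimal sketch (the class and method names are hypothetical; written with Java 7+ APIs for brevity) streams the file through an InputStreamReader with a fixed-size buffer, so memory use stays constant regardless of file size. The count is in Java `char` units (UTF-16 code units), which is also what a Reader hands to setCharacterStream(); a character outside the BMP counts as two.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: counts the chars a file decodes to, without loading it all. */
final class CharCounter {

    static long countChars(Path file, Charset encoding) throws IOException {
        char[] buffer = new char[8192]; // fixed-size buffer: memory use does not grow with file size
        long count = 0;
        try (InputStream in = Files.newInputStream(file);
             Reader reader = new InputStreamReader(in, encoding)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                count += n; // n = number of UTF-16 chars decoded in this chunk
            }
        }
        return count;
    }
}
```

For a UTF-8 file containing `héllo` this returns 5, while `Files.size()` reports 6 bytes, which is why a byte length overstates the character count for non-ASCII text.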
Thomas
Philipp Taprogge - 15 Aug 2007 22:20 GMT
Hi!
> But in my case I need to store the contents of a text file un-altered into a database CLOB
> field. But to do that I need to know the number of characters according
> to a given encoding in a file, without reading the file into memory.
Hmm... I'd say you don't want to store "text" at all. What you are trying to
do is store arbitrary data in the database, regardless of its encoding, and
let the client worry about producing a readable representation.
I'd say that's exactly what a BLOB is for. You should store the file as binary
data, possibly detecting its encoding and storing it alongside, and then, when
reading it back from the DB, produce the appropriate output from it.
A CLOB would only make sense if you wanted the database to do anything "texty"
with that data. But if you can't choose and stick to a certain encoding
beforehand, that would be extremely difficult.
Am I missing something...?
Regards,
Phil
Thomas Kellerer - 16 Aug 2007 07:54 GMT
Hello Philipp,
> Hi!
>
> Am I missing something...?
Yes and no :)
This is a generic SQL GUI where I support uploading text files into a
CLOB field (BLOBs are not a problem at all). To give you an idea, the
syntax is:
INSERT INTO some_table (col1, clob_col)
VALUES
(1, {$clobfile='test.txt' encoding='UTF-8'});
This is an "extension" that lets the end user "upload" files
from the client into LOB fields (a similar extension without an encoding is
available using {$blobfile=}).
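To illustrate what such an extension has to extract from the statement, here is a hypothetical sketch (not the tool's actual parser) that pulls the file name and the optional encoding out of a `{$clobfile=...}` or `{$blobfile=...}` placeholder with a regular expression. It assumes the single-quoted form shown above and ignores escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LobPlaceholder {

    // Hypothetical grammar: {$clobfile='<file>' [encoding='<charset>']} or {$blobfile='<file>'}
    private static final Pattern PLACEHOLDER = Pattern.compile(
        "\\{\\$(clobfile|blobfile)='([^']*)'(?:\\s+encoding='([^']*)')?\\s*\\}");

    final String kind;     // "clobfile" or "blobfile"
    final String file;     // file name as written by the user
    final String encoding; // null when no encoding was given

    private LobPlaceholder(String kind, String file, String encoding) {
        this.kind = kind;
        this.file = file;
        this.encoding = encoding;
    }

    /** Returns the first placeholder found in the SQL text, or null if there is none. */
    static LobPlaceholder find(String sql) {
        Matcher m = PLACEHOLDER.matcher(sql);
        if (!m.find()) {
            return null;
        }
        return new LobPlaceholder(m.group(1), m.group(2), m.group(3));
    }
}
```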
So I have no control whatsoever over the data model or the JDBC driver
used in this context. As I said in my initial posting, all (major)
drivers except Derby seem to accept setCharacterStream() with
a length that is larger than the actual number of characters.
"Counting" the number of characters in the input file using a Reader
does work, and the overhead for a single INSERT isn't that problematic
(not even for larger files). But for bulk uploads, reading every file
twice (once to count, once to send it) might become a problem. And as my
current implementation does not comply with the JDBC specification (which
defines the length as the number of characters in the stream), I'm trying
to find a portable and correct way to implement it.
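One option worth checking, assuming the drivers in question implement JDBC 4.0 (Java 6): `PreparedStatement.setCharacterStream(int, Reader)` takes no length at all, so the driver reads the Reader until end of stream and no counting pass is needed. A minimal sketch follows (class and method names are hypothetical; written with Java 7+ APIs for brevity). Drivers that predate JDBC 4.0 may not implement this overload (a JDBC 4.0 driver may throw SQLFeatureNotSupportedException, and a driver compiled against an older JDBC version may fail with AbstractMethodError), so its portability across the drivers a generic tool sees would still have to be verified per driver.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class ClobUpload {

    /**
     * Binds the decoded contents of file to the given parameter index and executes
     * the statement. Uses the JDBC 4.0 length-less overload, so no character count is needed.
     */
    static int executeWithClobFile(PreparedStatement ps, int index, Path file, Charset encoding)
            throws IOException, SQLException {
        try (Reader reader = new InputStreamReader(Files.newInputStream(file), encoding)) {
            ps.setCharacterStream(index, reader); // driver reads until end of stream
            return ps.executeUpdate();            // reader must stay open until the statement runs
        }
    }
}
```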
Regards
Thomas