Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / Databases / February 2008

Tip: Looking for answers? Try searching our database.

Encoding conversion problem

Thread view: 
Andrea - 11 Feb 2008 12:03 GMT
Hi,
I have a J2EE application which connects to a DB2 configured with code
set IBM-850. The application works with encoding ISO-8859-1.
If I save characters outside the range supported by IBM-850 (i.e. the
euro currency character EURO) then I read garbage...

I tried encoding conversions with InputStreamReader and
OutputStreamWriter:
...
BufferedReader reader = new BufferedReader(new
InputStreamReader(source, "IBM850"));
BufferedWriter writer = new BufferedWriter(new
OutputStreamWriter(output, "ISO-8859-1"));
...

but that didn't work...
My JVM Charset.availableCharsets() includes IBM850.

What can I do?

Thanks, in advance,
Andrea
Lothar Kimmeringer - 12 Feb 2008 07:13 GMT
> I have a J2EE application which connects to a DB2 configured with code
> set IBM-850. The application works with encoding ISO-8859-1.

In general the JDBC-driver is aware of the encoding, the database
is using and is doing the conversion already if you access the
column by getString(columnName/index).

> I tried encoding conversions with InputStreamReader and
> OutputStreamWriter:
> ...
> BufferedReader reader = new BufferedReader(new
> InputStreamReader(source, "IBM850"));

What is source? How do you create that from the JDBC-
resultset?

> BufferedWriter writer = new BufferedWriter(new
> OutputStreamWriter(output, "ISO-8859-1"));

That looks OK.

Regards, Lothar
Signature

Lothar Kimmeringer                E-Mail: spamfang@kimmeringer.de
              PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
                questions!

Andrea - 12 Feb 2008 08:25 GMT
Hi Lothar,

> > I have a J2EE application which connects to a DB2 configured with code
> > set IBM-850. The application works with encoding ISO-8859-1.
>
> In general the JDBC-driver is aware of the encoding, the database
> is using and is doing the conversion already if you access the
> column by getString(columnName/index).
Yes I fetch the string with Resultset.getString(index). I use DB2
Universal Driver with a type 4 connection.

> > I tried encoding conversions with InputStreamReader and OutputStreamWriter:
> > ...
[quoted text clipped - 3 lines]
> What is source? How do you create that from the JDBC-
> resultset?

I tried:
InputStream source = new
ByteArrayInputStream(stringFetchedFromDB.getBytes());

Thanks,
Andrea
Lothar Kimmeringer - 14 Feb 2008 19:11 GMT
>> What is source? How do you create that from the JDBC-
>> resultset?
>
> I tried:
> InputStream source = new
> ByteArrayInputStream(stringFetchedFromDB.getBytes());

getBytes() uses the system-encoding for generating the
byte-array. Why do you generate an InputStream anyway?

What you want to do is
OutputStreamWriter osw = new OutputStreamWriter(
 output, "8859_1");
osw.write(resultset.getString("mycolumn"));

BTW: The Euro is not part of ISO-8859-1, so it will get
lost that way anyway.

Regards, Lothar
Signature

Lothar Kimmeringer                E-Mail: spamfang@kimmeringer.de
              PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
                questions!

Sabine Dinis Blochberger - 12 Feb 2008 09:33 GMT
> Hi,
> I have a J2EE application which connects to a DB2 configured with code
> set IBM-850. The application works with encoding ISO-8859-1.
> If I save characters outside the range supported by IBM-850 (i.e. the
> euro currency character EURO) then I read garbage...

Yes, the Euro symbol is not part of the encodings, so your database
can't contain it. If you need it, you would have to change the databases
encoding (ISO-8859-15 includes the Euro symbol).

Otherwise, you have to take care not to try to write unsupported
character into string/character fields.

One solution could be to parse all strings and replace the symbol with
the shorthand "EUR", but it might not be acceptable to your client.
Signature

Sabine Dinis Blochberger

Op3racional
www.op3racional.eu

Andrea - 12 Feb 2008 11:22 GMT
> > ...
> > If I save characters outside the range supported by IBM-850 (i.e. the
> > euro currency character EURO) then I read garbage...
>
> Yes, the Euro symbol is not part of the encodings, so your database
> can't contain it.
I've found a strange thing: C and COBOL application can write and read
(using embedded SQL) characters outside the accepted range without
problems... So the database can contain those characters without
loosing any information, but I can't understand how...

> If you need it, you would have to change the databases
> encoding (ISO-8859-15 includes the Euro symbol).
[quoted text clipped - 3 lines]
> One solution could be to parse all strings and replace the symbol with
> the shorthand "EUR", but it might not be acceptable to your client.
Actually the EURO character is just an example, I have more complex
strings to handle (and I can't change the encoding of the database).
If my problem has no solution at all then I'd like to understand why
other languages don't have this problem...

Thanks,
Andrea
Sabine Dinis Blochberger - 12 Feb 2008 13:02 GMT
> > > ...
> > > If I save characters outside the range supported by IBM-850 (i.e. the
[quoted text clipped - 6 lines]
> problems... So the database can contain those characters without
> loosing any information, but I can't understand how...

Yes, in theory you can store any value (0 - 255 in case of one byte
strings) in a string, but how that is interpreted (i.e. encoding) is
where it gets hairy. Also, multibyte characters would break the
interpretation.

> > If you need it, you would have to change the databases
> > encoding (ISO-8859-15 includes the Euro symbol).
[quoted text clipped - 7 lines]
> If my problem has no solution at all then I'd like to understand why
> other languages don't have this problem...

Ah, there is always hacks around limitations. But they aren't usually
pretty. The problem is to funnel a string with these "unsupported"
characters through the JDBC driver (both ways).

You might get around it by using typeless fields (you can put any byte
sequence there), like BLOBS maybe...

Or you write a parser that substitutes the impossible characters with
acceptable replacements. Of course, this is most likele not feasable.

But the customer has to be aware that a database with encoding X can
only hold strings encoded in X. If they need UTF-8 for example now, they
will eventually have to change their database. And it would be better to
migrate to a suitable encoding than to hack around it and in a few
years, have to do all over again (and then some), when they finally do
want to change the database encoding.

On other languages not having the problem, in C, you can treat a string
just like an array of bytes and use those for whatever you like, the
compiler won't complain. Even interpreting them as memory addresses is
possible, adding and subtracting etc...

> Thanks,
> Andrea

Signature

Sabine Dinis Blochberger

Op3racional
www.op3racional.eu

Andrea - 12 Feb 2008 14:33 GMT
Hi Sabine,
thank you for your explanation, now the overall situation is much more
clear to me.

Thanks,
Andrea
Roedy Green - 12 Feb 2008 18:07 GMT
On Mon, 11 Feb 2008 04:03:47 -0800 (PST), Andrea
<tol7481@iperbole.bologna.it> wrote, quoted or indirectly quoted
someone who said :

>I have a J2EE application which connects to a DB2 configured with code
>set IBM-850. The application works with encoding ISO-8859-1.
>If I save characters outside the range supported by IBM-850 (i.e. the
>euro currency character EURO) then I read garbage...

First, make sure the data are truly encoded in IBM-850.
See http://mindprod.com/applet/encodingrecogniser.html

If there are characters int that file outside the range of IBM-850,
then by definition the file is not encoded in IBM-850 and you SHOULD
expect garbage.

You can write your own translate program to handle the excess chars.

see http://mindprod.com/jgloss/encoding.html

I don't know how to hook it in as an official encoding, but that is
not necessary.
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Roedy Green - 12 Feb 2008 18:10 GMT
On Mon, 11 Feb 2008 04:03:47 -0800 (PST), Andrea
<tol7481@iperbole.bologna.it> wrote, quoted or indirectly quoted
someone who said :

>BufferedReader reader = new BufferedReader(new
>InputStreamReader(source, "IBM850"));
>BufferedWriter writer = new BufferedWriter(new
>OutputStreamWriter(output, "ISO-8859-1"));

Your first task is to find out just what you are being handed before
you start fooling around with translations.

Unicode,  IBM850, ISO-8859-1, something else?
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Andrea - 13 Feb 2008 11:22 GMT
Hi Roedy,
the database (DB2) has this configuration:
...
Database territory     = US
Database code page     = 850
Database code set      = IBM-850
...

I've exported to a file the content of a table with a CHAR(N) field
containing the EURO currency character, then I've opened the file with
EncodingRecognizer: if I choose IBM850 I see a strange character (like
a small X), if I choose ISO-8859-1 I see a square.

I tried a translation with:

String problematicString = rs.getString(index);
problematicString = new String(problematicString, "IBM850"); // Am I
correct?

but I still get garbage :-(

Thanks,
Andrea
Silvio Bierman - 13 Feb 2008 11:36 GMT
> Hi Roedy,
> the database (DB2) has this configuration:
[quoted text clipped - 19 lines]
> Thanks,
> Andrea

Those are quite interesting database configuration parameters. Sounds
like a pre-Unicode setup to me...

Silvio Bierman
Silvio Bierman - 13 Feb 2008 11:48 GMT
> Hi Roedy,
> the database (DB2) has this configuration:
[quoted text clipped - 19 lines]
> Thanks,
> Andrea

You should look at the numeric byte values in problematicString. That
could give you an idea of what you are dealing with although it will
only disclose what your JDBC driver has made of it. It might have
already done an incorrect interpretation of a byte sequence.
Things could be even worse, the data could already have been mutilated
during insertion in the database when some program (possibly the same
program + JDBC driver?) put the data in. If the database encoding does
not support all characters that where in the original data then that is
what most likely happened.

Really, as I said in my previous post you should consider (a) going to a
different database that supports Unicode (b) refrain from
using/supporting non ASCII characters in your application or (c) do what
others have suggested and do your own translation from Unicode -> ASCII
-> DB -> ASCII -> Unicode. The latter option is only realistic if you
have wrapped all your JDBC code in some generic wrappers (which is
usually a good idea) so you can handle this locally.

Good luck,

Silvio
Andrea - 13 Feb 2008 14:22 GMT
Hi Silvio,
the settings are taken from the DB2 instance of a customer (and I
can't change them). The very same code works, of course, without
problems with a DB2 instance configured with ISO-8859-1.

The problem arises also when a C program stores to DB a string with
non-IBM850 valid characters: another C program can read the string
without problems while Java can't; so the string is not corrupted when
saved to DB but someone (JDBC driver? Java I/O?) looses something when
I read the field with Resultset.getString(int index) and I can't
convert it correctly (or I haven't found the right way to do it yet,
if it exists...).

BTW: a test has been made on DB2 with a table with a field declared
CHAR(n) FOR BIT DATA and Java code works without problems reading and
writing non-IBM850 characters.

Having read your feedback (THANK YOU EVERYONE!) I would say that
there's no way to read back those characters in Java in my
application.

Thanks again,
Andrea
Silvio Bierman - 13 Feb 2008 15:39 GMT
Hello Andrea,

> Hi Silvio,
> the settings are taken from the DB2 instance of a customer (and I
> can't change them). The very same code works, of course, without
> problems with a DB2 instance configured with ISO-8859-1.

I already expected you could not change this but it was worth suggesting...

I do not really understand why a Euro sign would work with 8859-1 since
that does not contain that character as far as I am aware of.

> The problem arises also when a C program stores to DB a string with
> non-IBM850 valid characters: another C program can read the string
[quoted text clipped - 3 lines]
> convert it correctly (or I haven't found the right way to do it yet,
> if it exists...).

It will probably depend on how you access the DB from C (which I have
never done with DB2) but it does not sound surprising that the C binding
would just pass through byte sequences. As long as you stay inside 8-bit
character encoding and always interpret them in the same codepage then
this usually works (which is why many programmers are totally unaware of
character encoding issues).

Since Java programs handle strings as sequences of characters from the
Unicode character set all interfaces with external character storage
needs to be done encoding aware. This means that when doing input Java
String objects are created from byte sequences ALWAYS assuming an
encoding, if not explicitly specified defaulting to the platform
default. The other way around String objects are converted to byte
sequences when doing output as well ALWAYS using an encoding, again
defaulting to the platform default.

> BTW: a test has been made on DB2 with a table with a field declared
> CHAR(n) FOR BIT DATA and Java code works without problems reading and
> writing non-IBM850 characters.

That makes sense. What I said before means that if you do not specify
different encodings the same default will be used for input and output.
If the storage medium leaves the intermediate bytes alone (which any
binary database column type would do, just like a binary file would)
then output will again match input, again allowing encoding unaware
programmers to get away with it.

> Having read your feedback (THANK YOU EVERYONE!) I would say that
> there's no way to read back those characters in Java in my
> application.

The problem is probably that the default encoding in Java turns your
Unicode character into something the database is unwilling to store as
is in this encoding and therefore mutilates trying to fix something.
When queried the database then returns a byte sequence different from
what was put in.
After that modification the JDBC driver has no chance of restoring the
original value.

> Thanks again,
> Andrea

Best regards,

Silvio
Roedy Green - 13 Feb 2008 16:38 GMT
On Wed, 13 Feb 2008 16:39:59 +0100, Silvio Bierman
<sbierman@jambo-software.com> wrote, quoted or indirectly quoted
someone who said :

>I do not really understand why a Euro sign would work with 8859-1 since
>that does not contain that character as far as I am aware of.

You could do an experiment.  Try feeding your database all possible
unicode chars in a set of 1-char records, and see which ones come back
unmangled.  This is a kludge, but you could preconvert your Euro to
one of those invariant unused chars.
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Andrea - 14 Feb 2008 16:00 GMT
Hi everyone,
sorry for my previous double-post (a mistake).

>Is is possible to ask the database driver to do the conversions for
>you?  Perhaps internally it is Unicode or some other encoding that can
>deal with Euros.
I've checked the properties of the JDBC driver I use (http://
publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/
com.ibm.db2.udb.doc/ad/rjvdsprp.htm) but there's nothing concerning
encoding conversions.

>We have the clue that C++ programs seem to store  euro s and get them back out.
Yes we have C and COBOL programs that can store and write non-IBM850
chars without problems too.
As pointed out by Sabine in her post the reason may be that C programs
work with the pure sequences of bytes, without performing any encoding
conversion.

>>I do not really understand why a Euro sign would work with 8859-1 since
>>that does not contain that character as far as I am aware of.

SORRY SORRY SORRY SORRY SORRY
I tried to insert (through JDBC) the EURO character in a DB2
configured with
...
Database territory    = C
Database code page    = 819
Database code set     = ISO8859-1
...
and I can't neither write nor read in Java the EURO character
correctly :-(
A COBOL program works instead correctly.

Then I tried the same thing on a SQL-Server 2000 instance with
collation compatibility_51_409_30003 (correponding to a 1252 codepage,
i.e. Latin 1) and I can store and read the EURO character via
Java&JDBC.

That doesn't work in Java with Oracle 10g configured with
...
NLS_LANGUAGE         = AMERICAN
NLS_TERRITORY        = AMERICA
NLS_CHARACTERSET     = US7ASCII
NLS_LENGTH_SEMANTICS = BYTE
...
store&read through COBOL is ok, and in Java I can even write&read
accented vowels... even if those characters are outside USASCII7...

>You could do an experiment.  Try feeding your database all possible
>unicode chars in a set of 1-char records, and see which ones come back
>unmangled.  This is a kludge, but you could preconvert your Euro to
>one of those invariant unused chars.
The EURO character is just an example and part of the problem, I can't
use this type of kludges.
The specific problem is much more complex: a password is crypted and
stored to DB with a C program but the crypted chars fall outside
IBM850 range and in Java I'm unable to read and decrypt back the
string... this works if the database is ISO-8859-1 (that's why I
though I were able to write another 'weird' char, the euro char, on an
ISO-8859-1 DB, sorry...). I've also the more general problem of data
entry: I don't know wich characters users will insert so I can't
substitute chars.
I've found a workaround for my crypting problem but I'm just trying to
understand the reason of the problem.

Now it's clear to me that with a CHAR field Java performs an encoding
conversion using the encodings of the JVM and of the DBMS: if some
characters fall outside the destination encoding then they are lost
(i.e. converted in something completely different).
The only 'mysterious' thing for me now is the behavior on Oracle (JDBC
can read&write accented vowels even if they are outside ascii7)... any
idea? Maybe the Oracle driver is smarter than the DB2 Universal
Driver...

Thanks everyone,
Andrea
Silvio Bierman - 14 Feb 2008 20:44 GMT
> Hi everyone,
> sorry for my previous double-post (a mistake).
[quoted text clipped - 72 lines]
> Thanks everyone,
> Andrea

Hello Andrea,

Even if you set a database encoding to ASCII it is very unlikely that
the DB will strip non-ASCII characters. Actually, most databases treat
every  byte-size (ie 8-bit) encoding almost identically internally. They
may sometimes have different default collations but that is about it.
The codepage attribute is mostly important for programs interfacing with
the DB. As most of those (especially older ones) are encoding unaware
also bytes pass in and out inharmed. In the end all 8-bit encodings are
equal until actually interpreted to represent characters, aren't they?

I have seen application running on cp-1252 platforms using 8859-1
encoded databases for years without anyone noticing. Same for cp-1257 on
a cp-1252 database. Nobody realy cares when the same data that was put
in comes out again.

This is not unlike SMTP which is supposed to be 7-bit only but since the
transport encoding passes 8-bit characters freely people are used to
sending non-ascii characters in plain-text emails although this is not
supported. This all works great until someone from Lithuania sends me an
email (I am in the Netherlands).

Regards,

Silvio
Andrea - 15 Feb 2008 11:26 GMT
Hi Lothar,
>> I tried:
>> InputStream source = new
>> ByteArrayInputStream(stringFetchedFromDB.getBytes());
>getBytes() uses the system-encoding for generating the
>byte-array. Why do you generate an InputStream anyway?
I was just a "desperate" attempt... :-)
I tried almost everything...

>What you want to do is
>OutputStreamWriter osw = new OutputStreamWriter(output, "8859_1");
>osw.write(resultset.getString("mycolumn"));
doesn't work...

Hi Silvio,
>Hello Andrea,
>Even if you set a database encoding to ASCII it is very unlikely that
>the DB will strip non-ASCII characters.
> ....
Yes now this is clear to me, thanks!

I was thinking only about the DB encoding while the problem is mainly
in the JVM encoding (now it's clear to me that Java can't handle
characters outside the encoding of the JVM, I wasn't thinking about
it, sorry...).

I've made another test: I've exported the content of the table with
the crypted password and I've found that the password I can't decrypt
back contains characters between 0x80 and 0x9F, which are control
characters in ISO-8859-1 and Java
- reads garbage with DB2 configured with IBM850 and JVM ISO-8859-1
- reads correctly with both DB2 and JVM configured with ISO-8859-1

I understand the first behavior but the last point is strange... Java
(with some "magic" :-) is doing the proper conversion if the db is
iso-8859-1 but I can't understand how... I will test it again and let
you know if I find something.

Thanks again everyone!

Andrea
Sabine Dinis Blochberger - 15 Feb 2008 12:02 GMT
> I've made another test: I've exported the content of the table with
> the crypted password and I've found that the password I can't decrypt
[quoted text clipped - 11 lines]
>
> Andrea

I'm guessing, but maybe if the databse tells the JDBC driver it's
ISO-8859-1 *and* your application tells it the same encoding, it won't
bother trying to transform anything...

Signature

Sabine Dinis Blochberger

Op3racional
www.op3racional.eu

Lew - 15 Feb 2008 13:10 GMT
> I was thinking only about the DB encoding while the problem is mainly
> in the JVM encoding (now it's clear to me that Java can't handle
> characters outside the encoding of the JVM, I wasn't thinking about
> it, sorry...).

"The encoding of the JVM" is Unicode-16 with surrogate pairs; every Unicode
character is representable in the JVM, including the Euro character.  There is
no Unicode character that the JVM cannot represent.

Signature

Lew

Andrea - 15 Feb 2008 14:51 GMT
Hi Sabine,
>I'm guessing, but maybe if the databse tells the JDBC driver it's
>ISO-8859-1 *and* your application tells it the same encoding, it won't
>bother trying to transform anything...
Yes that's what I was thinking too... but I tried to change the
encoding of the JVM (tried Cp850, ...) but it keeps on working...

Hi Lew,
>> I was thinking only about the DB encoding while the problem is mainly
>> in the JVM encoding (now it's clear to me that Java can't handle
[quoted text clipped - 3 lines]
>character is representable in the JVM, including the Euro character.  There is
>no Unicode character that the JVM cannot represent.
With "encoding of the JVM" I was referring to the file.encoding
property used by the JVM. If the JVM runs with:
- ISO-8859-1 then I can't read or write the EURO character to DB (it
becomes garbage) and ISO-8859-1 doesn't include that character;
- Cp1252 then I can read and write the EURO character to DB and Cp1252
includes that character.

Andrea
Lew - 15 Feb 2008 15:00 GMT
> Hi Sabine,
>> I'm guessing, but maybe if the databse tells the JDBC driver it's
[quoted text clipped - 17 lines]
> - Cp1252 then I can read and write the EURO character to DB and Cp1252
> includes that character.

I uinderstand my confusion now - it stemmed from the phrase "the encoding of
the JVM".  The JVM itself only uses one encoding; it translates to and from
other encoding on I/O.  So to make sure I understood you correctly, were you
referring to the encoding specified by the I/O call?

Generally if the encoding you specify for I/O is different from the encoding
in your data store, it will cause trouble.  This is not limited to Java.  Over
in the Postgres newsgroups one finds people have trouble with character
encoding from all sorts of platforms, mostly stemming from trying to store
characters in a column that are not part of the specified character encoding
for the DB.  If such things don't match, then problems will hatch.

Signature

Lew

Andrea - 15 Feb 2008 17:02 GMT
Hi Lew,

> I uinderstand my confusion now - it stemmed from the phrase "the encoding of
> the JVM".  The JVM itself only uses one encoding; it translates to and from
> other encoding on I/O.  So to make sure I understood you correctly, were you
> referring to the encoding specified by the I/O call?
Maybe I don't understand :-(
In my posts I tried to specify the encoding of the DBMS and the JVM
encoding (i.e. the system property file.encoding) in the different
cases and, as you stated, the JVM performs the necessary translations.
In my JDBC calls I don't specify/force any encoding.

> Generally if the encoding you specify for I/O is different from the encoding
> in your data store, it will cause trouble.  This is not limited to Java.  Over
> in the Postgres newsgroups one finds people have trouble with character
> encoding from all sorts of platforms, mostly stemming from trying to store
> characters in a column that are not part of the specified character encoding
> for the DB.  If such things don't match, then problems will hatch.
I disagree...
For our application we keep the DBMS with a fixed encoding and the
application performs the necessary conversions. For instance for
polish installations we use an ISO-8859-1 database and an application
server configured with ISO-8859-2 where we store polish characters
without problems.

Andrea
Silvio Bierman - 15 Feb 2008 17:16 GMT
> Hi Lew,
>
[quoted text clipped - 22 lines]
>
> Andrea

Hello Andrea,

Using an incomplete database encoding that does not match the
application is a dangerous practice that only works by a coincidence. It
is just because the encodings you mention are 8-bit complete that you
can pass in and out 8-bit values untouched, even though they really can
mean something different inside the DB from how you interpret them in
your application.
With little exception it is never a good idea to use a database encoding
other then a complete encoding like UTF-8 (to be honest you should
always use this in a DB if you are left the choice).
If the applications are Java based you will have a perfect match that
way and only when you do stuff like generating emails, plain-text files
or Office documents (shivers) where incomplete encodings (code pages)
come into the game you have to take extra measures. Using the platform
encoding then is usually a good idea unless it is a server application
in which case you have to make an educated guess.

Regards,

Silvio
Roedy Green - 13 Feb 2008 16:36 GMT
On Wed, 13 Feb 2008 06:22:22 -0800 (PST), Andrea
<tol7481@iperbole.bologna.it> wrote, quoted or indirectly quoted
someone who said :

>Hi Silvio,
>the settings are taken from the DB2 instance of a customer (and I
>can't change them). The very same code works, of course, without
>problems with a DB2 instance configured with ISO-8859-1.

Is is possible to ask the database driver to do the conversions for
you?  Perhaps internally it is Unicode or some other encoding that can
deal with Euros.  We have the clue that C++ programs seem to store
euro s and get them back out.

I am puzzled. I thought JDBC always talked to you in Unicode.
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Andrea - 13 Feb 2008 15:28 GMT
Hi Silvio,
the settings are taken from the DB2 instance of a customer (and I
can't change them). The very same code works, of course, without
problems with a DB2 instance configured with ISO-8859-1.

The problem arises also when a C program stores to DB a string with
non-IBM850 valid characters: another C program can read the string
without problems while Java can't; so the string is not corrupted when
saved to DB but someone (JDBC driver? Java I/O?) looses something when
I read the field with Resultset.getString(int index) and I can't
convert it correctly (or I haven't found the right way to do it yet,
if it exists...).

A test has been made on DB2 with a table with a field declared CHAR(n)
FOR BIT DATA and Java code works without problems reading and writing
non-IBM850 characters.

Having read your feedback (THANK YOU EVERYONE!) I would say that
there's no way to read back those characters in Java in my
application.

Thanks again,
Andrea


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.