Hello,
I have a problem with the UTF-16 character set.
Until now I have used
BufferedReader in = new BufferedReader(new
FileReader("daten.txt"),65535);
and a StringTokenizer to read lines from an ASCII file, split them up and
load them into a database via ODBC.
Now I receive the data as a UTF-16 file.
Following the help I got in "comp.lang.java.databases", I switched to
InputStreamReader and put together the following as a test:
BufferedReader in = new BufferedReader (new InputStreamReader (new
FileInputStream("daten.txt"), "UTF-16"),65535);
BufferedWriter out = new BufferedWriter (new OutputStreamWriter(new
FileOutputStream("data.txt"), "UTF-16"),65535);
while ((zeile = in.readLine()) != null) {
System.out.println(zeile);
out.write(zeile);
StringTokenizer st = new StringTokenizer(zeile, "\u0009");
int token = 0;
while (st.hasMoreTokens()) {
tok = st.nextToken();
System.out.println(tok);
out.write(tok +"\n");
}
}
It works.
The output on the screen is, as expected, in ASCII, but the output to the
file is in UTF-16, which suggests that the data will also be written to the
DB in UTF-16 if I direct the output accordingly.
Since I access my database through JSP, my problem is how to display the
data delivered by the DB as Unicode in HTML.
Is there a more elegant way to output a string than writing out an array of
char values in the form &#<x%=FC%>; (FC hex for "ü"),
and why does Selfhtml say nothing about 00FC versus FC00? Shouldn't the
output be &#<x%=FC00%>;?
Thanks in advance
Manfred Nebel
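For readers following along, the read loop in the post above can be sketched in more current Java. This is only an illustration, not the poster's code: the class name Utf16TabFile and both method names are made up here, and String.split is swapped in for StringTokenizer because StringTokenizer silently skips empty fields between two tabs.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original post.
class Utf16TabFile {

    /** Reads a UTF-16 text file and splits every line on tab characters. */
    static List<String[]> readRows(Path file) throws IOException {
        List<String[]> rows = new ArrayList<>();
        // UTF_16 honours a byte order mark and assumes big-endian when none is present.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_16)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Limit -1 keeps empty fields, including trailing ones.
                rows.add(line.split("\t", -1));
            }
        }
        return rows;
    }

    /** Writes every field on its own line, again encoded as UTF-16. */
    static void writeFields(Path file, List<String[]> rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_16)) {
            for (String[] row : rows) {
                for (String field : row) {
                    out.write(field);
                    out.newLine(); // platform line separator instead of a hard-coded "\n"
                }
            }
        }
    }
}
```

The try-with-resources blocks close both streams, which the test snippet in the post never does, so buffered output there may never reach data.txt.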
Roedy Green - 29 May 2004 23:55 GMT
>I have a problem with the UTF-16 character set.
>Until now I have used
[quoted text clipped - 29 lines]
>Since I access my database through JSP, my problem is how to display the
>data delivered by the DB as Unicode in HTML.
Here's the Babelfish translation:
I have a problem with the UTF-16 character set. To date I have with:
BufferedReader in = new BufferedReader(new FileReader("daten.txt"),
65535); and a StringTokenizer lines read from a ASCII file, zerhackt
and by ODBC into railways dug. Now I get the data as UTF-16 file. Due
to the assistance in "comp.lang.java.databases" I took the
InputStreamReader and together-knitted the following to the test.
BufferedReader in = new BufferedReader (new InputStreamReader (new
FileInputStream("daten.txt"), "UTF 16"), 65535); BufferedWriter out =
new BufferedWriter (new OutputStreamWriter(new
FileOutputStream("data.txt"), "UTF 16"), 65535); while ((line =
in.readLine())! = zero) {System.out.println(zeile); out.write(zeile);
StringTokenizer st = new StringTokenizer(zeile, "\u0009"); int token =
0; while (st.hasMoreTokens()) {tok = st.nextToken();
System.out.println(tok); out.write(tok +"\n"); } It functions. The
expenditure on the screen is like to expecting in ASCII, but the
expenditure into the file functions in UTF-16, which permits the
conclusion that also into the railways in UTF-16 one writes, if I
steer the expenditure accordingly. Since I access by means of JSP my
data base, I have the problem to represent the data in HTML, delivered
by the railways, as university code. There is a more elegant
possibility, than an array of char values in the form & #; (FCh for
"ue") to spend around a stringer spend and why nothing stands from
00FC bez in Selfhtml. FC00?? the expenditure should not & #; read???
Thanks first Manfred fog

Signature
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Roedy Green - 29 May 2004 23:57 GMT
>There is a more elegant
>possibility, than an array of char values in the form & #; (FCh for
>"ue") to spend around a stringer spend and why nothing stands from
>00FC bez in Selfhtml. FC00?? the expenditure should not & #; read???
>Thanks first Manfred fog
I think you are asking how to translate Unicode into &xxx; entities
via table lookup.
See http://mindprod.com/products.html#ENTITIES
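On the 00FC versus FC00 question: a numeric character reference names the Unicode code point, not the bytes of any encoding, so "ü" (U+00FC) is written &#xFC; or &#252; regardless of byte order, and leading zeros as in &#x00FC; are optional. FC00 would be a different code point. A minimal sketch of doing the conversion by hand follows; the class and method names are invented for illustration and this is not the Entities class linked above.

```java
// Hypothetical helper, shown only to illustrate numeric character references.
class HtmlNumericEscaper {

    /** Replaces every non-ASCII code point with a hexadecimal numeric character reference. */
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                sb.append((char) cp); // plain ASCII passes through unchanged
            } else {
                // The reference carries the code point value, e.g. U+00FC becomes &#xFC;
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("M\u00fcller")); // prints M&#xFC;ller
    }
}
```

Note that this sketch does not escape <, > or &, which a real page would also need to handle.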
Manfred Nebel - 30 May 2004 06:48 GMT
I'm really sorry,
I chose the wrong group. I meant to post this to "comp.lang.java.de".
Manfred
> Hallo,
>
[quoted text clipped - 39 lines]
>
> Manfred Nebel
Manfred Nebel - 30 May 2004 07:15 GMT
Hello,
I have a problem with UTF-16.
Until now I used
BufferedReader in = new BufferedReader(new
FileReader("daten.txt"),65535);
and a
StringTokenizer
to read lines from an ASCII file, split them and store the tokens in a DB.
Now I receive the files in UTF-16.
So I tested the following lines:
> BufferedReader in = new BufferedReader (new InputStreamReader (new
> FileInputStream("daten.txt"), "UTF-16"),65535);
[quoted text clipped - 12 lines]
>
> it works
The screen shows ASCII in the DOS prompt, which is expected, but the
data.txt file is UTF-16.
Since I use JSP, I have the following problem.
When I handle the result set like this:
...
name = columns.getString( 1);
...
<TD>Name: <%=name%></TD>
the result is in ASCII.
Is there a better way than a char array and a loop with
&#x<%=FC%>;
or
&#<x%=FC00%>;
???
Manfred
Roedy Green - 30 May 2004 07:41 GMT
>Is there a better way than a char array and a loop with
> &#x<%=FC%>;
If you want to convert Unicode to ASCII with the fancy characters
converted to &xx; use the Entities class. It goes both ways.
See http://mindprod.com/products.html#ENTITIES
The other way to do it is to set the encoding of the entire document
to UTF-8 and just send the characters out as UTF-8, without entity
encoding, and let the browser deal with it. Some old browsers may have trouble.
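As a rough illustration of the UTF-8 route: the charset declared in the page and the charset used to encode the bytes have to agree. In a JSP this is normally done with the page directive <%@ page contentType="text/html; charset=UTF-8" %>. The stand-alone sketch below shows the same idea with a plain Writer; the class name Utf8Page and the method render are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a JSP: writes a tiny page as UTF-8 bytes.
class Utf8Page {

    /** Renders the page; the declared charset matches the one the writer encodes with. */
    static byte[] render(String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write("<html><head>");
            out.write("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">");
            out.write("</head><body><table><tr><td>Name: ");
            out.write(name); // sent as-is, no entity encoding; HTML escaping of <, >, & is omitted here
            out.write("</td></tr></table></body></html>");
        }
        return bytes.toByteArray();
    }
}
```

With this approach a name such as "Müller" goes out as the two bytes C3 BC for the "ü", and the browser decodes it because the page says UTF-8.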
Manfred Nebel - 30 May 2004 08:23 GMT
Hi,
> If you want to convert Unicode to ASCII with the fancy characters
> converted to &xx; use the Entities class. It goes both ways.
> See http://mindprod.com/products.html#ENTITIES
I don't want to translate it; I want to display it on the screen.
> the other way to do it is to set the encoding of the entire document
> to UTF-8 and just send them out as UTF-8 without entity encoding and
> let the browser deal with it. Some old browsers may have trouble.
What I did is:
<%
if (land.equals("CZ")) metaLand = "2";
%>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8559-<%=metaLand%>">
...
It doesn't work.
Manfred
Roedy Green - 30 May 2004 08:58 GMT
><%
>if (land.equals("CZ")) metaLand = "2";
[quoted text clipped - 3 lines]
>...
>It doesn't fit.
What if you tried generating the entire meta tag, rather than just the
last digit of it? Perhaps the tag parser can't deal with the nested
<>.
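A sketch of what generating the whole tag could look like, assuming a small helper that maps the country code to a charset name. The class CharsetMeta, the method metaTagFor and the country-to-charset mapping are all hypothetical. One thing that may matter independently of the nesting: the snippet as posted spells the charset "iso-8559-", whereas the registered name of the Latin-2 charset is ISO-8859-2, so if that spelling is also in the real page, browsers would not recognise it.

```java
// Hypothetical helper: builds the complete meta tag instead of only its last digit.
class CharsetMeta {

    /** Maps a country code to a charset name; the mapping is illustrative only. */
    static String charsetFor(String land) {
        switch (land) {
            case "CZ":
                return "ISO-8859-2"; // Latin-2, covers Czech
            case "DE":
                return "ISO-8859-1"; // Latin-1, covers German
            default:
                return "UTF-8";      // fallback assumed for this sketch
        }
    }

    /** Returns the whole meta tag as one string, ready to be printed into the page. */
    static String metaTagFor(String land) {
        return "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
                + charsetFor(land) + "\">";
    }
}
```

In the JSP this would be emitted with a single expression such as <%= CharsetMeta.metaTagFor(land) %>. The tag only labels the page, though: the bytes the server sends still have to be encoded in the same charset.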