Java Forum / Databases / May 2004


Unicode UTF-16 and JSP

Manfred Nebel - 29 May 2004 20:45 GMT
Hello,

I have a problem with the UTF-16 character set.
Until now I have used
   BufferedReader in = new BufferedReader(new FileReader("daten.txt"), 65535);
and a StringTokenizer to read lines from an ASCII file, split them up, and
load them into a DB via ODBC.
Now I receive the data as a UTF-16 file.
Following the help I got in "comp.lang.java.databases", I switched to
InputStreamReader and put together the following as a test.

BufferedReader in  = new BufferedReader(new InputStreamReader(
        new FileInputStream("daten.txt"), "UTF-16"), 65535);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream("data.txt"), "UTF-16"), 65535);
String zeile;
while ((zeile = in.readLine()) != null) {
    System.out.println(zeile);
    out.write(zeile);
    // split the line on tab characters
    StringTokenizer st = new StringTokenizer(zeile, "\u0009");
    while (st.hasMoreTokens()) {
        String tok = st.nextToken();
        System.out.println(tok);
        out.write(tok + "\n");
    }
}
in.close();
out.close();

It works.
The screen output is in ASCII, as expected, but the file output is written in
UTF-16, which suggests that the data will also be written to the DB in UTF-16
if I direct the output accordingly.
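
The read-and-split step described above can be sketched with current Java APIs. This is only an illustration under assumptions: the class name `Utf16TabReader` and the sample data are invented here, and `String.split` is swapped in for `StringTokenizer`, which changes one behavior (empty fields between two tabs are kept rather than skipped).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original post.
public class Utf16TabReader {

    /** Reads a UTF-16 text file and splits every line on tab characters. */
    public static List<List<String>> readRows(Path file) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        // UTF_16 honors a byte-order mark and assumes big-endian if none is present.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_16)) {
            String line;
            while ((line = in.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields.
                rows.add(Arrays.asList(line.split("\t", -1)));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-16 sample file (made-up data), then read it back.
        Path file = Files.createTempFile("daten", ".txt");
        Files.write(file, "M\u00fcller\tPraha\nNov\u00e1k\tBrno\n".getBytes(StandardCharsets.UTF_16));
        for (List<String> row : readRows(file)) {
            System.out.println(row);
        }
        Files.delete(file);
    }
}
```

The strings held in memory are ordinary Java strings regardless of the file encoding; how they look on the console depends on the console's own encoding, not on the file.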

Since I access my database through JSP, my problem is displaying the data
delivered by the DB as Unicode in HTML.
Is there a more elegant way to output a string than emitting an array of char
values in the form    &#x<%=FC%>;     (FC hex for "ü"), and why does Selfhtml
say nothing about 00FC versus FC00? Shouldn't the output be
   &#x<%=FC00%>;   instead?
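
On the 00FC versus FC00 question, a small sketch may help: an HTML numeric character reference carries the Unicode code point, while 00 FC and FC 00 are just the two byte orders of the UTF-16 encoding of that code point. The class name `CodePointVsBytes` is made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
public class CodePointVsBytes {

    /** Formats bytes as space-separated two-digit uppercase hex. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00fc"; // the letter u with diaeresis
        String codePoint = Integer.toHexString(s.codePointAt(0)).toUpperCase();
        System.out.println(codePoint);                                  // FC
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 FC
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // FC 00
        System.out.println("&#x" + codePoint + ";");                    // &#xFC;
    }
}
```

So the reference for "ü" is &#xFC; (leading zeros such as &#x00FC; are also accepted), and byte order never appears in the HTML.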

Thanks in advance

Manfred Nebel
Roedy Green - 29 May 2004 23:55 GMT
>I have a problem with the UTF-16 character set.
>Until now I have used:
[quoted text clipped - 29 lines]
>Since I access my database through JSP, my problem is displaying the data
>delivered by the DB as Unicode in HTML.
Here's the Babelfish translation:

I have a problem with the UTF-16 character set. Until now I have used
BufferedReader in = new BufferedReader(new FileReader("daten.txt"), 65535);
and a StringTokenizer to read lines from an ASCII file, split them, and
load them into the DB via ODBC. Now I get the data as a UTF-16 file.
Following the help in "comp.lang.java.databases" I took the
InputStreamReader and put together the following as a test.

BufferedReader in = new BufferedReader(new InputStreamReader(
        new FileInputStream("daten.txt"), "UTF-16"), 65535);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream("data.txt"), "UTF-16"), 65535);
String zeile;
while ((zeile = in.readLine()) != null) {
    System.out.println(zeile);
    out.write(zeile);
    StringTokenizer st = new StringTokenizer(zeile, "\u0009");
    while (st.hasMoreTokens()) {
        String tok = st.nextToken();
        System.out.println(tok);
        out.write(tok + "\n");
    }
}

It works. The output on the screen is in ASCII, as expected, but the
output to the file is in UTF-16, which suggests that the DB will also
be written in UTF-16 if I direct the output accordingly. Since I access
my database through JSP, I have the problem of displaying the data
delivered by the DB as Unicode in HTML. Is there a more elegant way
than emitting an array of char values in the form &#x<%=FC%>; (FC hex
for "ü") to output a string, and why does Selfhtml say nothing about
00FC versus FC00? Shouldn't the output read &#x<%=FC00%>;?
Thanks in advance, Manfred Nebel

Signature

Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.

Roedy Green - 29 May 2004 23:57 GMT
>Is there a more elegant way than emitting an array of char values in
>the form &#x<%=FC%>; (FC hex for "ü") to output a string, and why does
>Selfhtml say nothing about 00FC versus FC00? Shouldn't the output read
>&#x<%=FC00%>;?
>Thanks in advance, Manfred Nebel

I think you are asking how to translate Unicode into &xxx; entities
via table lookup.

See http://mindprod.com/products.html#ENTITIES

Manfred Nebel - 30 May 2004 06:48 GMT
I'm really sorry,

I chose the wrong address. I meant to post this to "comp.lang.java.de".

Manfred

> Hello,
>
[quoted text clipped - 39 lines]
>
> Manfred Nebel
Manfred Nebel - 30 May 2004 07:15 GMT
Hello,
I have a problem with UTF-16.
Until now, I have used
    BufferedReader in = new BufferedReader(new FileReader("daten.txt"), 65535);
and a
    StringTokenizer
to read lines from an ASCII file, split them, and store the tokens in a DB.
Now I receive the files in UTF-16.

I have now tested the following lines:
> BufferedReader in  = new BufferedReader (new InputStreamReader (new
> FileInputStream("daten.txt"), "UTF-16"),65535);
[quoted text clipped - 12 lines]
>
> it works

The screen shows ASCII in the DOS prompt, which is normal, but the
data.txt file is UTF-16.

With JSP, I have the following problem.

When I handle the result set like this:
...
    name   = columns.getString( 1);
...
<TD>Name: <%=name%></TD>
the result is in ASCII.

Is there a better way than a char array and a loop with
   &#x<%=FC%>;
or
   &#x<%=FC00%>;

?

Manfred
Roedy Green - 30 May 2004 07:41 GMT
>Is there a better way than a char array and a loop with
>    &#x<%=FC%>;

If you want to convert Unicode to ASCII with the fancy characters
converted to &xx; use the Entities class.  It goes both ways.

See http://mindprod.com/products.html#ENTITIES
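
For readers without that class at hand, here is a minimal sketch of the same idea using numeric character references only. It is not the Entities class mentioned above and makes no claim about how that class works; the name `NumericEntities` is invented for this example. Every non-ASCII code point becomes `&#x...;` and the HTML metacharacters are escaped as well.

```java
// Hypothetical helper, unrelated to the Entities class linked above.
public class NumericEntities {

    /** Returns text as pure-ASCII HTML, using numeric character references. */
    public static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Work on code points so surrogate pairs become a single reference.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&') {
                sb.append("&amp;");
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (cp == '>') {
                sb.append("&gt;");
            } else if (cp == '"') {
                sb.append("&quot;");
            } else if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("M\u00fcller <b>")); // M&#xFC;ller &lt;b&gt;
    }
}
```

In a JSP this could be called as `<%= NumericEntities.escape(name) %>`, assuming the class is on the web application's classpath.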

The other way to do it is to set the encoding of the entire document
to UTF-8 and just send the characters out as UTF-8 without entity
encoding and let the browser deal with it. Some old browsers may have trouble.
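
A rough sketch of that UTF-8 route in plain Java, outside any servlet container: the page is written through a writer that encodes to UTF-8, and the declared charset matches. The class name `Utf8Page` and the page layout are assumptions made for illustration, and the name is written unescaped on the assumption that it contains no markup characters. In a real JSP the equivalent would normally be the page directive `<%@ page contentType="text/html; charset=UTF-8" %>`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; a JSP container would do the encoding step itself.
public class Utf8Page {

    /** Renders a tiny HTML page as UTF-8 bytes, with a matching charset declaration. */
    public static byte[] render(String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write("<html><head>");
            out.write("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">");
            out.write("</head><body><table><tr>");
            // Assumes name holds no '<' or '&'; escape it first otherwise.
            out.write("<td>Name: " + name + "</td>");
            out.write("</tr></table></body></html>");
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = render("M\u00fcller");
        System.out.println(page.length + " bytes");
    }
}
```

The key point is that the declared charset and the actual byte encoding agree; the label alone does not convert anything.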

Manfred Nebel - 30 May 2004 08:23 GMT
Hi,

> If you want to convert Unicode to ASCII with the fancy characters
> converted to &xx; use the Entities class.  It goes both ways.

> See http://mindprod.com/products.html#ENTITIES

I don't want to translate it; I want to show it on the screen.

> the other way to do it is to set the encoding of the entire document
> to  UTF-8 and just send them out as UTF-8 without entity encoding and
> let the browser deal with it. Some old browsers may have trouble.

What I did is:

<%
if (land.equals("CZ")) metaLand = "2";
%>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8559-<%=metaLand%>">
...
It doesn't work.

Manfred
Roedy Green - 30 May 2004 08:58 GMT
><%
>if (land.equals("CZ")) metaLand = "2";
[quoted text clipped - 3 lines]
>...
>It doesn't work.

What if you tried generating the entire meta tag, rather than just the
last digit of it? Perhaps the tag parser can't deal with the nested <>.
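
One way to try that suggestion is to build the whole tag in Java and emit it with a single expression. The sketch below is only an assumption about how it could look: the class `MetaTag`, the country-to-charset table, and the ISO-8859-1 fallback are all invented here. Note that the registered charset label for Central European text is ISO-8859-2, with the digits 8859.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the mapping below is illustrative, not complete.
public class MetaTag {

    private static final Map<String, String> CHARSET_BY_COUNTRY = new HashMap<>();

    static {
        CHARSET_BY_COUNTRY.put("CZ", "ISO-8859-2");
    }

    /** Builds the complete Content-Type meta tag for a country code. */
    public static String metaFor(String land) {
        // Assumed fallback for countries missing from the table.
        String charset = CHARSET_BY_COUNTRY.getOrDefault(land, "ISO-8859-1");
        return "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
                + charset + "\">";
    }

    public static void main(String[] args) {
        System.out.println(metaFor("CZ"));
    }
}
```

In the JSP the tag would then be written as `<%= MetaTag.metaFor(land) %>`. The meta tag only labels the page, so the response itself still has to be sent in the same encoding.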
