Java Forum / Databases / January 2007

String exceeding length - Getting absolute string length

james.w.appleby@gmail.com - 09 Jan 2007 12:34 GMT
Hello,

I am having a problem when inputting very long strings into a database.
The application I am writing can use different databases (thanks to
the wonders of JDBC) so this issue has been causing problems on both
Oracle and SQL Server.

Because one of the design objectives was to support any JDBC-compatible
database, a concern was raised about text widths.  It was therefore
decided that the maximum column width for a VARCHAR would be a
configurable value.  We knew that, in theory, data could be longer than
a single row's column allows, so we introduced a sequence number to
allow multiple rows.  (Don't ask me why we didn't use CLOBs instead; this
is the schema I'm stuck with.)

We now need to store base64 data in the same fields.  The problem is
that, in one example, a string that is 4000 characters long according to
the Java String object has a physical size of approximately 4430.  This
seems to be because of the amount of markup involved, either in the
base64 data or possibly in the text between.

It occurs to me that while a non-ASCII value may be only a single
character in a Unicode string, it is 6 characters in UTF-8.  Therefore
I'm looking for a way of calculating the absolute length, rather than a
count of characters.

Is this possible or will I have to change the schema?
Hybris - 09 Jan 2007 13:36 GMT
On Tue, 09 Jan 2007 04:34:45 -0800, james.w.appleby wrote:

> I'm looking for a way of calculating the absolute length, rather than a
> count of characters.

See the String method getBytes.
Ian Wilson - 10 Jan 2007 10:38 GMT
> It occurs to me that while a non-ASCII value may be only a single
> character in a Unicode string,

I think you mean the opposite, that an ASCII (not non-ASCII) character
will be represented in UTF-8 using a single *byte*.

> it is 6 characters in UTF-8.  

No it isn't. UTF-8 uses a *variable* number of *bytes* for one Unicode
character.

> Therefore
> I'm looking for a way of calculating the absolute length, rather than a
> count of characters.

String has a getBytes() method for this purpose.
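
A minimal sketch of the getBytes() suggestion, assuming the database stores text as UTF-8 (the thread does not say which character set either database uses). The class and method names are made up for illustration. The charset is passed explicitly because the no-argument getBytes() uses the platform default encoding, which can differ between machines. StandardCharsets needs Java 7 or later; on older JVMs getBytes("UTF-8") does the same job.

```java
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String base64 = "SGVsbG8=";      // base64 text is pure ASCII
        String accented = "caf\u00e9";   // the last character is one char but two UTF-8 bytes

        // length() counts UTF-16 chars; utf8Length() counts encoded bytes.
        System.out.println(base64.length() + " chars, " + utf8Length(base64) + " bytes");
        System.out.println(accented.length() + " chars, " + utf8Length(accented) + " bytes");
    }
}
```

Pure base64 text is ASCII, so under this assumption it takes one byte per character; any extra bytes would have to come from non-ASCII characters elsewhere in the stored text.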
Oliver Wong - 10 Jan 2007 22:01 GMT
>> It occurs to me that while a non-ASCII value may be only a single
>> character in a Unicode string,
[quoted text clipped - 6 lines]
> No it isn't. UTF-8 uses a *variable* number of *bytes* for one Unicode
> character.

   And even then, UTF-8 only ranges from 1 to 4 octets per code point. The
code point values start at 0x000000 and go to 0x10FFFF.

   - Oliver
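
The 1-to-4-octet range can be checked with a short sketch like the one below. The class name, method name, and sample code points are chosen only for illustration; the method assumes it is given a valid, non-surrogate code point.

```java
import java.nio.charset.StandardCharsets;

class Utf8Widths {
    /** UTF-8 byte count for a single (non-surrogate) Unicode code point. */
    static int utf8Bytes(int codePoint) {
        // toChars yields one char for the BMP and a surrogate pair above U+FFFF.
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // One sample from each UTF-8 width class.
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                    + " -> " + utf8Bytes(cp) + " byte(s)");
        }
    }
}
```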
John W. Kennedy - 11 Jan 2007 00:07 GMT
>>> It occurs to me that while a non-ASCII value may be only a single
>>> character in a Unicode string,
[quoted text clipped - 7 lines]
>     And even then, UTF-8 only ranges from 1 to 4 octets per code point. The
> code point values start at 0x000000 and go to 0x10FFFF.

CESU-8 and Java's "Modified UTF-8" use as many as six, because they
first encode characters above U+FFFF as UTF-16, and then UTF-8 encode
the result. "UTF-8", albeit wrongly, is often taken to include one or
both of those schemes, so the incorrect figure of 6 is often encountered.
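
The six-byte case can be reproduced from Java itself, since DataOutputStream.writeUTF writes modified UTF-8. The sketch below is only an illustration with made-up class and method names; it subtracts the two-byte length prefix that writeUTF puts in front of the payload. Note that writeUTF rejects strings whose encoded form exceeds 65535 bytes, so this is a demonstration rather than a general length function.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ModifiedUtf8Demo {
    /** Payload size in modified UTF-8, excluding writeUTF's 2-byte length prefix. */
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size() - 2;
    }

    /** Size in standard UTF-8, for comparison. */
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F600 lies above U+FFFF, so Java holds it as a surrogate pair.
        String supplementary = new String(Character.toChars(0x1F600));
        System.out.println("standard UTF-8: " + standardUtf8Length(supplementary));
        System.out.println("modified UTF-8: " + modifiedUtf8Length(supplementary));
    }
}
```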

John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
  -- Charles Williams.  "Taliessin through Logres: Prelude"

Manfred Rosenboom - 10 Jan 2007 15:37 GMT
Hi James,

Maybe the following Sun Tech Tip is worth reading:

Tech Tip #1: How long is your String object?
http://java.sun.com/mailers/techtips/corejava/2006/tt0822.html#1

Best,
Manfred
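
Putting the byte-length suggestions in this thread together with the sequence-number schema from the original post, one possible approach is to split the text by encoded size instead of by character count. The sketch below is an assumption-heavy illustration, not a tested solution for either database: it assumes the column limit is measured in bytes, that the database character set is UTF-8, and that each returned chunk goes into one row in sequence order. ByteBudgetSplitter, split, and utf8Width are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ByteBudgetSplitter {
    /** Splits text into chunks whose UTF-8 encoding is at most maxBytes each. */
    static List<String> split(String text, int maxBytes) {
        if (maxBytes < 4) {
            // A single code point can need 4 bytes, so smaller budgets cannot work.
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;   // index of the first char of the current chunk
        int used = 0;    // UTF-8 bytes used by the current chunk so far
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int width = utf8Width(cp);
            if (used + width > maxBytes) {
                chunks.add(text.substring(start, i));
                start = i;
                used = 0;
            }
            used += width;
            i += Character.charCount(cp);   // never cuts a surrogate pair in half
        }
        if (start < text.length()) {
            chunks.add(text.substring(start));
        }
        return chunks;
    }

    /** UTF-8 width of a code point; unpaired surrogates are over-counted as 3. */
    static int utf8Width(int cp) {
        if (cp < 0x80) return 1;
        if (cp < 0x800) return 2;
        if (cp < 0x10000) return 3;
        return 4;
    }

    public static void main(String[] args) {
        // Each accented character takes 2 bytes, so a 4-byte budget holds two of them.
        List<String> rows = split("\u00e9\u00e9\u00e9", 4);
        for (int seq = 0; seq < rows.size(); seq++) {
            int bytes = rows.get(seq).getBytes(StandardCharsets.UTF_8).length;
            System.out.println("seq " + seq + ": " + bytes + " bytes");
        }
    }
}
```

Concatenating the chunks in sequence order reproduces the original string, so reading the rows back needs no change beyond what the multi-row schema already does.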


©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.