Java Forum / General / February 2006

string to ascii on line feed

donald - 20 Feb 2006 22:06 GMT
Hi there,

I'm doing some Java work and basically I get a String "/n" and I need to get
the ASCII value of it, which in this case is 10. What is the best way of
going about this?

Thanks

Donald
Oliver Wong - 20 Feb 2006 22:15 GMT
> Hi there,
>
> i doing some java work and basic i get a String "/n" and i need to get
> the ascii value of it which in this case is 10 what is the best way of
> going about this?

   The ASCII code of a string is not well defined, but the ASCII code of a
character is. First, convert your string to a single character (how you do
this depends on what assumptions you can make with respect to the string).
From there, in Java, you can cast a character to an integer, like this:

<code, not tested or compiled>
char myChar = '\n';
int myInt = (int)myChar;
</code, not tested or compiled>

   - Oliver
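
   A minimal sketch of the two steps above (string to single character, then
character to integer), assuming the string is known to hold exactly one char.
The class name CharCode and the helper codeOfSingleChar are hypothetical names
used only for illustration.

```java
final class CharCode {
    /** Hypothetical helper: returns the numeric code of the only char in s. */
    static int codeOfSingleChar(String s) {
        if (s == null || s.length() != 1) {
            throw new IllegalArgumentException("expected exactly one char");
        }
        return s.charAt(0); // a char widens to int without an explicit cast
    }

    public static void main(String[] args) {
        System.out.println(codeOfSingleChar("\n")); // prints 10 (line feed)
        System.out.println(codeOfSingleChar("A"));  // prints 65
    }
}
```

   The length check is there because, as noted, the "ASCII code of a string"
is only meaningful once the string has been reduced to one character.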
donald - 20 Feb 2006 22:30 GMT
I am trying to look at a string and determine whether it is a single
character and, if so, convert it to an integer.  I already understand how
to convert to the integer (using byte rather than int is also better!)
but it seems that escape characters are represented as two chars within
a Java string, i.e. "\n" has length two, whereas I need to find it by
length one.  A regular expression:
   Pattern.matches("\n|.", "\n")
does not return true as I would expect.

Any ideas?

donald
Jeffrey Schwab - 20 Feb 2006 22:48 GMT
> i am trying to look at a string and determine whether it is a single
> character and, as such convert to an integer.  I already understand how
[quoted text clipped - 4 lines]
>     Pattern.matches("\n|.", "\n")
> does not return true as I would expect.

"\n" has length 1.  Try printing the result of "\n".length().

You can access individual characters within a string using zero-based
indexes; e.g., to get the third character in string s, use s.charAt(2).
Jeffrey Schwab - 20 Feb 2006 22:50 GMT
> i am trying to look at a string and determine whether it is a single
> character and, as such convert to an integer.  I already understand how
[quoted text clipped - 4 lines]
>     Pattern.matches("\n|.", "\n")
> does not return true as I would expect.

"\n" has length 1.  Try printing the result of "\n".length().

Access individual characters within a string using zero-based indexes;
e.g., to get the first character in string s, use s.charAt(0).

comp.lang.java.help is probably a better place for this kind of question.
Oliver Wong - 21 Feb 2006 14:32 GMT
>i am trying to look at a string and determine whether it is a single
> character and, as such convert to an integer.  I already understand how
[quoted text clipped - 6 lines]
>
> Any ideas?

   First, read
http://groups.google.ca/group/comp.lang.java.programmer/msg/3fd11f7fb586e837

   Now after having read that, do you mean an in-memory string of length 2
of which the first character is the slash, and the second character is an
'n', or do you mean an in-memory string of length 1 of which the only
character is the newline character?

   - Oliver
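
   A small sketch of the distinction being asked about, assuming nothing
beyond the standard library. The class name EscapeDemo and its two constants
are hypothetical, chosen only for this illustration. In Java source, "\n" is
a one-char string holding the line feed, while "\\n" is a two-char string
holding a backslash followed by 'n'.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    static final String NEWLINE = "\n";       // one char: line feed (code 10)
    static final String BACKSLASH_N = "\\n";  // two chars: '\' then 'n'

    public static void main(String[] args) {
        System.out.println(NEWLINE.length());      // prints 1
        System.out.println(BACKSLASH_N.length());  // prints 2

        // The pattern below is "a line feed, or any one char".
        System.out.println(Pattern.matches("\n|.", NEWLINE));      // prints true
        System.out.println(Pattern.matches("\n|.", BACKSLASH_N));  // prints false
    }
}
```

   If the match comes back false, that suggests the string being tested is the
two-character form, for example because it was read from a file or typed by a
user rather than written as a literal in source code.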
tom fredriksen - 21 Feb 2006 00:02 GMT
> Hi there,
>
> i doing some java work and basic i get a String "/n" and i need to get
> the ascii value of it which in this case is 10 what is the best way of
> going about this?

RTMF!

Strings in Java are unicode, so they are 16 bits wide, ascii is 8 bit.
Meaning you have to use the String class's methods to retrieve the individual
character values correctly, by looping over the string and converting each
character to an integer.

/tom
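
A minimal sketch of the loop described above, assuming the values wanted are
the 16-bit char values that String.charAt returns (for characters in the ASCII
range these are the ASCII codes). CharValues and charValues are hypothetical
names used only for illustration.

```java
import java.util.Arrays;

final class CharValues {
    /** Hypothetical helper: the numeric value of every char in s, in order. */
    static int[] charValues(String s) {
        int[] values = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            values[i] = s.charAt(i); // char widens to int
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(charValues("Hi\n"))); // prints [72, 105, 10]
    }
}
```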
jeanlutrin@yahoo.fr - 21 Feb 2006 03:08 GMT
...
> Strings in Java are unicode, so they are 16 bits wide,

Strings in Java are Strings.  The primitive char type is based
on Unicode 3.0 and chars in Java are hence 16 bits wide, which
is unfortunate, since as of Unicode 3.1 this is not enough to
represent all Unicode code points.

> ascii is 8 bit.

No.

ASCII is a seven-bit code.
Chris Uppal - 21 Feb 2006 09:14 GMT
> Strings in Java are Strings.  The primitive char type is based
> on Unicode 3.0 and char in Java are hence 16 bits wide, which
> is unfortunate since Unicode 3.1

Small correction (just for historical interest): the Unicode standard abandoned
16-bitness no later than v 2.0.0 published in July '96.

   -- chris
tom fredriksen - 21 Feb 2006 14:50 GMT
> ....
>> Strings in Java are unicode, so they are 16 bits wide,
[quoted text clipped - 3 lines]
> is unfortunate since Unicode 3.1 this is not enough to
> represent all Unicode codepoints.

Not really the point is it.

Java strings are based on Unicode, which in Java is based on UTF-16, so
strings in Java are 16 bits wide. The fact that the underlying primitive
type is char, which is based on UTF-16, is irrelevant for this discussion.

>> ascii is 8 bit.
>
> No.
>
> ASCII is a seven-bit code.

No, US-ASCII is 7 bit, ASCII is 8 bit. The fact that you distinguish
between ascii from 1967 and the current definition of ascii is
interesting only if you are using Bell's teleprinter.

I hate language lawyers:(

/tom
jeanlutrin@yahoo.fr - 21 Feb 2006 21:19 GMT
> No, US-ASCII is 7 bit, ASCII is 8 bit.

That is plain wrong.  Even scarier: so many semi-knowledgeable
programmers get this wrong that it *is* definitely a very common
source of endless bugs and misconceptions.

ASCII is 7 bit. Get over it.

Now, you find me a string that gives me a byte over 127 by using
the following method, will you?

final String s = ...;
final byte[] missionImpossible = s.getBytes("ASCII");

Hint: "US-ASCII" and "ASCII" are the same for Sun, as they are for
anybody familiar with this concept. And ASCII *fscking* is 7 bit.
Which is why you will *not* give me an "ASCII byte" above 127.
There's no such thing. How do you want me to explain it?
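
A runnable sketch of that challenge, under two assumptions not in the snippet
above: it uses StandardCharsets.US_ASCII (available since Java 7) instead of
the charset name "ASCII", and the class name AsciiBytes and helper
maxUnsignedByte are hypothetical. String.getBytes replaces characters the
charset cannot represent with the charset's replacement byte, which for
US-ASCII is '?' (63).

```java
import java.nio.charset.StandardCharsets;

final class AsciiBytes {
    /** Hypothetical helper: largest byte, read as unsigned, in the ASCII encoding of s. */
    static int maxUnsignedByte(String s) {
        int max = 0;
        for (byte b : s.getBytes(StandardCharsets.US_ASCII)) {
            max = Math.max(max, b & 0xFF); // mask so bytes are compared as 0..255
        }
        return max;
    }

    public static void main(String[] args) {
        // "cafe" with e-acute, a space, then the euro sign: two non-ASCII chars.
        String s = "caf\u00e9 \u20ac";
        System.out.println(maxUnsignedByte(s)); // prints 102, the code of 'f'
    }
}
```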

Now it is *you* who should "RTMF!" (sic) as you wrote it in your
first (false) post.

You'll also be nice and explain how come both ASCII and ISO-8859-1
are what is called "code subsets" of Unicode.

You'll also explain to me how come ASCII is a code subset of
ISO-8859-1 if ASCII is 8 bits. That should be interesting to hear
because on one side it is an accepted *fact* that ASCII is a code
subset of ISO-Latin-1. It is also an accepted *fact* that
ISO-Latin-1 is an 8-bit code. And, to the best of my knowledge,
it is also an accepted *fact* that no matter how strong the
reality distortion field you have, ISO-Latin-1 (i.e. ISO-8859-1)
is *not* a synonym for ASCII.

Before coming with logical fallacies I urge you to use a search
engine and look for topics on this issue.

I am right and you're plain wrong. My assertions are based on
facts, so you'll have a very hard time arguing with me on this topic.

> The fact that you distinguish
> between ascii from 1967 and the current definition of ascii is
> interesting only if you are using Bell's teleprinter.

There's no such thing as "the current definition of ASCII".
You may be confusing ASCII (7 bit) with the much less common
"extended ASCII".  Extended ASCII is *definitely NOT* what most
people are referring to when they're referring to ASCII. While ASCII
is very common, extended ASCII is not. Most character sets are
ASCII supersets (ISO-8859-1 and Unicode to name two very
common ones). Most character sets are *not* "extended ASCII"
supersets.

That said, the fact that sadly some programmers, just like you,
think that there's such a thing as a "current definition of ascii" has
created numerous problems and incompatibilities in many
applications and countless misleading docs, and continues to help the
spread of that misconception by filling blogs and Usenet's archives
with such blatantly wrong claims.

Now, will you persist in insisting that, your words:

"ASCII is 8 bit" ?

> I hate language lawyers:(

I hate people making blatantly false claims, spreading
misconceptions, filling Usenet's archives with junk and, most
importantly, refusing to admit their errors in spite of
undeniable evidence.

:(
tom fredriksen - 21 Feb 2006 21:54 GMT
I do not need to respond to people behaving rudely, please learn some
manners and netiquette before talking online. This is a discussion group
not an abuse forum!

But, I will apologise for the "language lawyer" statement, I was
influenced by private matters which should not affect other people.

/tom

>> No, US-ASCII is 7 bit, ASCII is 8 bit.
>
[quoted text clipped - 68 lines]
>
> :(
Oliver Wong - 21 Feb 2006 22:13 GMT
   I'm not disagreeing with *most* of what you wrote; just two minor
nitpicks, and an open statement at the end.

> There's no such thing as "the current definition of ASCII".

   According to Wikipedia, http://en.wikipedia.org/wiki/Ascii

<quote>
The American Standards Association (ASA, later to become ANSI) first
published ASCII as a standard in 1963. ASCII-1963 lacked the lowercase
letters, and had an up-arrow instead of the caret and a left-arrow instead
of the underscore. The 1967 version added the lowercase letters, changed the
names of a few control characters and moved the two controls ACK and ESC
from the lowercase letters area into the control codes area.

ASCII was subsequently updated and published as ANSI X3.4-1968, ANSI
X3.4-1977, and finally, ANSI X3.4-1986
</quote>

   So while it may be pedantic, it would not be incorrect or meaningless to
ask, "Which version of ASCII do you mean?"

> While ASCII
> is very common, extended ASCII is not.

   I believe MS-DOS (I forget which versions) uses extended ASCII, so it
couldn't have been that uncommon (the MS-QBasic program, for example, made
heavy use of characters 176 to 218).

> Now, will you persist in insisting that, your words:
> "ASCII is 8 bit" ?

   The term "ASCII" in the sentence "ASCII is 8 bit" in this context might
refer to multiple things (even if we disregard all versions of ASCII prior
to the ANSI X3.4-1986 standard), one of which might be "The encoding Java
uses when we ask for the 'ASCII' encoding."

   Conceptually, we have a string in memory, and we wish to store that
string to disk, using a specific encoding. In our case, the 'ASCII'
encoding. Now when we say "Encoding FOO is n bits", what we usually mean is
either "the encoding uses n bits per character to represent a given string"
or the less restrictive "*on average*, the encoding uses n bits per
character to represent a given string". In this sense, UTF-16 can be said to
be "16 bits" even though certain characters take 32 bits to encode. It's
imprecise (arguably flat out wrong), but you "know what they mean" when they
say it.

   Now if we had an encoding which was said to be "7 bits", then the
encoding of a 16 character string should be 112 bits. An encoding which is
said to be "8 bits" would use 128 bits to encode that same 16 character
string.

   So when you encode a 16 character string in Java using the "ASCII"
encoding, does it result in a bitstream of length 112 or 128? I would guess
128.
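
   One way to check that guess, as a sketch: it assumes StandardCharsets.US_ASCII
(Java 7 and later) is the encoding meant by "ASCII", and the names EncodedSize
and encodedBits are hypothetical. It counts the bits in the byte array that
String.getBytes produces, one byte per character for a pure-ASCII string.

```java
import java.nio.charset.StandardCharsets;

final class EncodedSize {
    /** Hypothetical helper: number of bits in the US-ASCII encoding of s. */
    static int encodedBits(String s) {
        return s.getBytes(StandardCharsets.US_ASCII).length * 8;
    }

    public static void main(String[] args) {
        String sixteenChars = "0123456789abcdef";
        System.out.println(sixteenChars.length());     // prints 16
        System.out.println(encodedBits(sixteenChars)); // prints 128
    }
}
```

   Each encoded byte still has its top bit clear, so the values themselves
fit in 7 bits even though 8 bits of storage are used per character.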

   I think one problem here is that ASCII conflates the concept of
numbering characters and encoding them. There's a clear distinction between
those concepts with Unicode and, say, UTF-8. Unicode merely assigns numbers
to each character, and UTF-8 assigns a mapping between numbers and
bitstreams.

   When ASCII is used as a character-numbering scheme, there are 128
character-number mappings, and ASCII is a "closed" system, where no new
characters can be added to it, so it might make sense to actually say that
this character-number mapping is inherently 7 bits (contrast this with
Unicode, where more characters may be added in the future, and so the system
does not inherently have a bit size).

   When ASCII is used as an encoding, to convert to bitstream, it seems
most implementations use 8 bits per character. So in that sense, it would
seem that "ASCII", the number-to-bitstream mapping system, is 8 bits.

   - Oliver
tom fredriksen - 21 Feb 2006 22:41 GMT
>> While ASCII
>> is very common, extended ASCII is not.
>
>    I believe MS-DOS (I forget which versions) uses extended ASCII, so it
> couldn't have been that uncommon (the MS-QBasic program, for example,
> made heavy use of characters 176 to 218).

MS-DOS since 1989 (I think 2.x or 3.x) has been using ASCII and code
paging (also known as extended ASCII) to support national characters, at
least in Europe, where the code page maps onto the last 128 values of the
byte.

The thing is, most widely used character encodings today use US-ASCII as
their foundation and then extend it with either 1, 9 or 25 bits.

/tom
John O'Conner - 22 Feb 2006 08:19 GMT
> ...
>> Strings in Java are unicode, so they are 16 bits wide,
[quoted text clipped - 3 lines]
> is unfortunate since Unicode 3.1 this is not enough to
> represent all Unicode codepoints.

As of 1.5 (Tiger), Java supports the Unicode 4.0 standard. Also, several
classes, including String, have been updated to handle the fact that a
"character" can now be 1 or 2 char values. The char type now represents
a Unicode code unit in UTF-16. UTF-16 encodes Unicode code points
(0x0000 through 0x10FFFF) as one or two 16-bit code units.
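
A short sketch of the one-or-two code units point, using the String methods
added in 1.5 (codePointCount and codePointAt). The class name CodePoints and
the helper codePointLength are hypothetical, and U+1D11E (MUSICAL SYMBOL G
CLEF) is picked only as an example of a code point above 0xFFFF.

```java
final class CodePoints {
    /** Hypothetical helper: number of Unicode code points in s. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'A' followed by U+1D11E, written as its UTF-16 surrogate pair.
        String s = "A\uD834\uDD1E";
        System.out.println(s.length());          // prints 3 (char values)
        System.out.println(codePointLength(s));  // prints 2 (code points)
        System.out.println(Integer.toHexString(s.codePointAt(1))); // prints 1d11e
    }
}
```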

For slightly more information regarding Strings and their length, you
can read my blog entry on this topic:
http://weblogs.java.net/blog/joconner/archive/2005/08/how_long_is_you.html

Regards,
John O'Conner
jeanlutrin@yahoo.fr - 22 Feb 2006 15:37 GMT
...
> > Strings in Java are Strings.  The primitive char type is based
> > on Unicode 3.0 and char in Java are hence 16 bits wide, which
[quoted text clipped - 10 lines]
> can read my blog entry on this topic:
> http://weblogs.java.net/blog/joconner/archive/2005/08/how_long_is_you.html

Thanks John,

I'm fully aware of that.  It's exactly because a character can now
need more than one char value that I wrote that it was unfortunate :)

See you soon on c.l.j.p.,

 Jean
Jeffrey Schwab - 21 Feb 2006 14:10 GMT
> RTMF!

The Mucking Faneuil?
Oliver Wong - 21 Feb 2006 14:33 GMT
>> RTMF!
>
> The Mucking Faneuil?

   Read The Manual, Friend!

   - Oliver
tom fredriksen - 21 Feb 2006 14:51 GMT
>>> RTMF!
>>
>> The Mucking Faneuil?
>
>    Read The Manual, Friend!

I haven't heard of that one before, but it's as good as the other one.

/tom
tom fredriksen - 21 Feb 2006 14:51 GMT
>> RTMF!
>
> The Mucking Faneuil?

Read The Manual First! (It's a slightly more polite way of saying it.)
Roedy Green - 24 Feb 2006 14:12 GMT
>i doing some java work and basic i get a String "/n" and i need to get
>the ascii value of it which in this case is 10 what is the best way of
>going about this?

int w = '\n';  // note \n not /n

or http://mindprod.com/jgloss/ascii.html

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.