Java Forum / General / February 2006
string to ascii on line feed
donald - 20 Feb 2006 22:06 GMT Hi there,
I am doing some Java work, and basically I get a String "/n" and need to get its ASCII value, which in this case is 10. What is the best way of going about this?
Thanks
Donald
Oliver Wong - 20 Feb 2006 22:15 GMT > Hi there, > > i doing some java work and basic i get a String "/n" and i need to get > the ascii value of it which in this case is 10 what is the best way of > going about this? The ASCII code of a string is not well defined, but the ASCII code of a character is. First, convert your string to a single character (how you do this depends on what assumptions you can make with respect to the string). From there, in Java, you can cast a character to an integer, like this:
<code, not tested or compiled>
char myChar = '\n';
int myInt = (int)myChar;
</code, not tested or compiled>
- Oliver
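A minimal sketch of the two steps Oliver describes (string to character, then character to integer), assuming the string is expected to hold exactly one ASCII character. The class name AsciiValue, the helper asciiOf, and the -1 "not applicable" return value are hypothetical choices for illustration, not anything from the thread.

```java
final class AsciiValue {
    /**
     * Returns the ASCII code of a one-character string, or -1 if the
     * string is null, is not exactly one char long, or is outside ASCII.
     * (The -1 convention is an assumption made for this sketch.)
     */
    static int asciiOf(String s) {
        if (s == null || s.length() != 1) {
            return -1;
        }
        char c = s.charAt(0);      // step 1: string -> single char
        return c <= 127 ? c : -1;  // step 2: char widens to int; ASCII stops at 127
    }

    public static void main(String[] args) {
        System.out.println(asciiOf("\n")); // line feed
        System.out.println(asciiOf("A"));
        System.out.println(asciiOf("/n")); // two chars (slash, n), so rejected
    }
}
```

Note that the explicit (int) cast in Oliver's snippet is optional: Java widens char to int implicitly.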
donald - 20 Feb 2006 22:30 GMT I am trying to look at a string and determine whether it is a single character and, if so, convert it to an integer. I already understand how to convert to the integer (using byte rather than int is also better!), but it seems that escape characters are represented as two chars within a Java string, i.e. "\n" has length two, whereas I need to detect it as length one. A regular expression, Pattern.matches("\n|.", "\n"), does not return true as I would expect.
Any ideas?
donald
Jeffrey Schwab - 20 Feb 2006 22:48 GMT > i am trying to look at a string and determine whether it is a single > character and, as such convert to an integer. I already understand how [quoted text clipped - 4 lines] > Pattern.matches("\n|.", "\n") > does not return true as I would expect. "\n" has length 1. Try printing the result of "\n".length().
You can access individual characters within a string using zero-based indexes; e.g., to get the third character in string s, use s.charAt(2).
Jeffrey Schwab - 20 Feb 2006 22:50 GMT > i am trying to look at a string and determine whether it is a single > character and, as such convert to an integer. I already understand how [quoted text clipped - 4 lines] > Pattern.matches("\n|.", "\n") > does not return true as I would expect. "\n" has length 1. Try printing the result of "\n".length().
Access individual characters within a string using zero-based indexes; e.g., to get the first character in string s, use s.charAt(0).
comp.lang.java.help is probably a better place for this kind of question.
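The length claim is easy to check. The sketch below compares the Java literal "\n" (one line-feed character) with "\\n" (a backslash followed by the letter n) and runs donald's pattern against both. The class name NewlineLength and the helper matchesDonaldsPattern are made up for this example.

```java
import java.util.regex.Pattern;

final class NewlineLength {
    /** Applies the pattern from the question: a line feed, or any one non-line-terminator char. */
    static boolean matchesDonaldsPattern(String s) {
        return Pattern.matches("\n|.", s);
    }

    public static void main(String[] args) {
        String newline = "\n";      // one char: line feed
        String backslashN = "\\n";  // two chars: backslash, then 'n'

        System.out.println(newline.length());
        System.out.println(backslashN.length());
        System.out.println((int) newline.charAt(0));

        // matches() must consume the whole input, so a two-char string fails.
        System.out.println(matchesDonaldsPattern(newline));
        System.out.println(matchesDonaldsPattern(backslashN));
    }
}
```

If the pattern fails in practice, the likely explanation is that the string being tested really contains two characters (for example because it was read from a file or typed by a user) rather than the single character the literal "\n" produces.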
Oliver Wong - 21 Feb 2006 14:32 GMT >i am trying to look at a string and determine whether it is a single > character and, as such convert to an integer. I already understand how [quoted text clipped - 6 lines] > > Any ideas? First, read http://groups.google.ca/group/comp.lang.java.programmer/msg/3fd11f7fb586e837
Now after having read that, do you mean an in-memory string of length 2 of which the first character is the slash, and the second character is an 'n', or do you mean an in-memory string of length 1 of which the only character is the newline character?
- Oliver
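If the answer to Oliver's question is "length 2" (a backslash followed by a letter, as might arrive from user input or a file), the two characters have to be translated by hand. The following is only a sketch under that assumption; the class EscapeDecoder, the method codeOf, and the particular set of escapes handled are hypothetical.

```java
final class EscapeDecoder {
    /**
     * Returns the character code for either a one-char string or a
     * two-char backslash escape such as backslash + 'n'.
     * Only a few common escapes are handled in this sketch.
     */
    static int codeOf(String s) {
        if (s.length() == 1) {
            return s.charAt(0);            // already a single character
        }
        if (s.length() == 2 && s.charAt(0) == '\\') {
            switch (s.charAt(1)) {
                case 'n':  return '\n';    // line feed, 10
                case 't':  return '\t';    // tab, 9
                case 'r':  return '\r';    // carriage return, 13
                case '\\': return '\\';    // literal backslash, 92
                default:   break;
            }
        }
        throw new IllegalArgumentException(
                "not a single character or known escape: " + s);
    }

    public static void main(String[] args) {
        System.out.println(codeOf("\n"));   // in-memory length 1
        System.out.println(codeOf("\\n"));  // in-memory length 2
    }
}
```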
tom fredriksen - 21 Feb 2006 00:02 GMT > Hi there, > > i doing some java work and basic i get a String "/n" and i need to get > the ascii value of it which in this case is 10 what is the best way of > going about this? RTMF!
Strings in Java are unicode, so they are 16 bits wide, ascii is 8 bit. Meaning you have to use Strings class methods to retrieve the individual character values correctly by looping over the string converting each character to an integer.
/tom
jeanlutrin@yahoo.fr - 21 Feb 2006 03:08 GMT ...
> Strings in Java are unicode, so they are 16 bits wide, Strings in Java are Strings. The primitive char type is based on Unicode 3.0, and chars in Java are hence 16 bits wide, which is unfortunate because, since Unicode 3.1, this is not enough to represent all Unicode code points.
> ascii is 8 bit. No.
ASCII is a seven-bit code.
Chris Uppal - 21 Feb 2006 09:14 GMT > Strings in Java are Strings. The primitive char type is based > on Unicode 3.0 and char in Java are hence 16 bits wide, which > is unfortunate since since Unicode 3.1 Small correction (just for historical interest): the Unicode standard abandoned 16-bitness no later than v 2.0.0 published in July '96.
-- chris
tom fredriksen - 21 Feb 2006 14:50 GMT > .... >> Strings in Java are unicode, so they are 16 bits wide, [quoted text clipped - 3 lines] > is unfortunate since since Unicode 3.1 this is not enough to > represent all Unicode codepoints. Not really the point, is it?
Java strings are based on Unicode, which in Java is based on UTF-16, so strings in Java are 16 bits wide. The fact that the underlying primitive type is char, which is based on UTF-16, is irrelevant for this discussion.
>> ascii is 8 bit. > > No. > > ASCII is a seven-bit code. No, US-ASCII is 7 bit, ASCII is 8 bit. The fact that you distinguish between ASCII from 1967 and the current definition of ASCII is interesting only if you are using Bell's teleprinter.
I hate language lawyers:(
/tom
jeanlutrin@yahoo.fr - 21 Feb 2006 21:19 GMT > No, US-ASCII is 7 bit, ASCII is 8 bit. That is plain wrong. Even scarier: so many semi-knowledgeable programmers get this wrong that it *is* definitely a very common source of endless bugs and misconceptions.
ASCII is 7 bit. Get over it.
Now, you find me a string that gives me a byte over 127 by using the following method, will you?
final String s = ...; final byte[] missionImpossible = s.getBytes("ASCII");
Hint: "US-ASCII" and "ASCII" is the same for Sun, as it is for anybody familiar with this concept. And ASCII *fscking* is 7 bit. Which is why you will *not* give me an "ASCII byte" above 127. There's no such thing. How do you want me to explain it ?
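The challenge can be tried directly. This sketch uses getBytes(Charset) with StandardCharsets.US_ASCII, which only exists in Java versions later than the one current when this thread was written (an assumption about the reader's JDK). That overload is documented to substitute the charset's replacement byte for unmappable characters, whereas the behaviour of the getBytes("ASCII") form in that case is left unspecified. The class name AsciiBytes and the helper allSevenBit are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

final class AsciiBytes {
    /** True if every byte produced by the US-ASCII encoder fits in 7 bits (0..127). */
    static boolean allSevenBit(String s) {
        for (byte b : s.getBytes(StandardCharsets.US_ASCII)) {
            // Java bytes are signed, so values 128..255 would show up as negative.
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 'e' with acute accent and the euro sign are not ASCII characters.
        String tricky = "caf\u00e9 \u20ac";
        System.out.println(allSevenBit(tricky));
        for (byte b : tricky.getBytes(StandardCharsets.US_ASCII)) {
            System.out.print(b + " ");
        }
        System.out.println();
    }
}
```

Characters outside ASCII are replaced (with '?', code 63) rather than encoded as a byte above 127.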
Now it is *you* who should "RTMF!" (sic) as you wrote it in your first (false) post.
You'll also be nice and explain how come both ASCII and ISO-8859-1 are what is called "code subsets" of Unicode.
You'll also explain to me how come ASCII is a code subset of ISO-8859-1 if ASCII is 8 bits. That should be interesting to hear because on one side it is an accepted *fact* that ASCII is a code subset of ISO-Latin-1. It is also an accepted *fact* that ISO-Latin-1 is an 8-bit code. And, to the best of my knowledge, it is also an accepted *fact* that no matter how strong the reality distortion field you have, ISO-Latin-1 (i.e. ISO-8859-1) is *not* a synonym for ASCII.
Before coming up with logical fallacies, I urge you to use a search engine and look for topics on this issue.
I am right and you're plain wrong. My assertions are based on facts, so you'll have a very hard time arguing with me on this topic.
> The fact that you distinguish > between ascii from 1967 and the current definition of ascii is > interresting only if you are using Bells teleprinter. There's no such thing as "the current definition of ASCII". You may be confusing ASCII (7 bit) with the much less common "extended ASCII". Extended ASCII is *definitely NOT* what most people are referring to when they're referring to ASCII. While ASCII is very common, extended ASCII is not. Most character sets are ASCII supersets (ISO-8859-1 and Unicode, to name two very common ones). Most character sets are *not* "extended ASCII" supersets.
That said, the fact that sadly some programmers, just like you, think that there's such a thing as a "current definition of ascii" has created numerous problems and incompatibilities in many applications and countless misleading docs, and continues to help the spread of that misconception by filling blogs and Usenet's archives with such blatantly wrong claims.
Now, will you persist on insisting that, your words:
"ASCII is 8 bit" ?
> I hate language lawyers:( I hate people making blatantly false claims, spreading misconceptions, filling Usenet's archives with junk and, most importantly, refusing to admit their errors in spite of undeniable evidence.
:(
tom fredriksen - 21 Feb 2006 21:54 GMT I do not need to respond to people behaving rudely; please learn some manners and netiquette before talking online. This is a discussion group, not an abuse forum!
But, I will apologise for the "language lawyer" statement, I was influenced by private matters which should not affect other people.
/tom
>> No, US-ASCII is 7 bit, ASCII is 8 bit. > [quoted text clipped - 68 lines] > > :(
Oliver Wong - 21 Feb 2006 22:13 GMT I'm not disagreeing with *most* of what you wrote; just two minor nitpicks, and an open statement at the end.
> There's no such thing about "the current definition of ASCII". According to Wikipedia, http://en.wikipedia.org/wiki/Ascii
<quote> The American Standards Association (ASA, later to become ANSI) first published ASCII as a standard in 1963. ASCII-1963 lacked the lowercase letters, and had an up-arrow instead of the caret and a left-arrow instead of the underscore. The 1967 version added the lowercase letters, changed the names of a few control characters and moved the two controls ACK and ESC from the lowercase letters area into the control codes area.
ASCII was subsequently updated and published as ANSI X3.4-1968, ANSI X3.4-1977, and finally, ANSI X3.4-1986 </quote>
So while it may be pedantic, it would not be incorrect or meaningless to ask, "Which version of ASCII do you mean?"
> While ASCII > is very common, extended ASCII is not. I believe MS-DOS (I forget which versions) uses extended ASCII, so it couldn't have been that uncommon (the MS-QBasic program, for example, made heavy use of characters 176 to 218).
> Now, will you persist on insisting that, your words: > "ASCII is 8 bit" ? The term "ASCII" in the sentence "ASCII is 8 bit" in this context might refer to multiple things (even if we disregard all versions of ASCII prior to the ANSI X3.4-1986 standard), one of which might be "The encoding Java uses when we ask for the 'ASCII' encoding."
Conceptually, we have a string in memory, and we wish to store that string to disk, using a specific encoding. In our case, the 'ASCII' encoding. Now when we say "Encoding FOO is n bits", what we usually mean is either "the encoding uses n bits per character to represent a given string" or the less restrictive "*on average*, the encoding uses n bits per character to represent a given string". In this sense, UTF-16 can be said to be "16 bits" even though certain characters take 32 bits to encode. It's imprecise (arguably flat out wrong), but you "know what they mean" when they say it.
Now if we had an encoding which was said to be "7 bits", then the encoding of a 16 character string should be 112 bits. An encoding which is said to be "8 bits" would use 128 bits to encode that same 16 character string.
So when you encode a 16 character string in Java using the "ASCII" encoding, does it result in a bitstream of length 112 or 128? I would guess 128.
I think one problem here is that ASCII conflates the concepts of numbering characters and encoding them. There's a clear distinction between those concepts with Unicode and, say, UTF-8. Unicode merely assigns numbers to each character, and UTF-8 assigns a mapping between numbers and bitstreams.
When ASCII is used as a character-numbering scheme, there are 128 character-number mappings, and ASCII is a "closed" system, where no new characters can be added to it, so it might make sense to actually say that this character-number mapping is inherently 7 bits (contrast this with Unicode, where more characters may be added in the future, and so the system does not inherently have a bit size).
When ASCII is used as an encoding, to convert to bitstream, it seems most implementations use 8 bits per character. So in that sense, it would seem that "ASCII", the number-to-bitstream mapping system, is 8 bits.
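Oliver's guess about the 16-character string can be checked with a few lines. This is only a sketch: the class name AsciiEncodedSize and the helper encodedBits are hypothetical, and StandardCharsets assumes a newer JDK than the one in use at the time of this thread.

```java
import java.nio.charset.StandardCharsets;

final class AsciiEncodedSize {
    /** Number of bits in the US-ASCII encoding of s, counting 8 bits per output byte. */
    static int encodedBits(String s) {
        return s.getBytes(StandardCharsets.US_ASCII).length * 8;
    }

    public static void main(String[] args) {
        String sixteen = "0123456789abcdef"; // 16 ASCII characters
        System.out.println(sixteen.length());
        System.out.println(encodedBits(sixteen));
    }
}
```

Java's encoder emits one whole byte per character, so the stored form occupies 8 bits per character even though the top bit of every byte is always zero. That is consistent with both readings: the code itself is 7 bits, and its usual serialization is one octet per character.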
- Oliver
tom fredriksen - 21 Feb 2006 22:41 GMT >> While ASCII >> is very common, extended ASCII is not. > > I believe MS-DOS (I forget which versions) uses extended ASCII, so it > couldn't have been that uncommon (the MS-QBasic program, for example, > made heavy use of characters 176 to 218). MS-DOS since 1989 (I think 2.x or 3.x) has been using ASCII and code paging (also known as extended ASCII) to support national characters, at least in Europe, where the code page maps onto the last 128 values of the byte.
The thing is, most widely used character encodings today use US-ASCII as their foundation and then extend it with either 1, 9 or 25 bits.
/tom
John O'Conner - 22 Feb 2006 08:19 GMT > ... >> Strings in Java are unicode, so they are 16 bits wide, [quoted text clipped - 3 lines] > is unfortunate since since Unicode 3.1 this is not enough to > represent all Unicode codepoints. As of 1.5 (Tiger), Java supports the Unicode 4.0 standard. Also, several classes, including String, have been updated to handle the fact that a "character" can now be 1 or 2 char values. The char type now represents a Unicode code unit in UTF-16. UTF-16 encodes Unicode code points (0x0000 through 0x10FFFF) as one or two 16-bit code units.
For slightly more information regarding Strings and their length, you can read my blog entry on this topic: http://weblogs.java.net/blog/joconner/archive/2005/08/how_long_is_you.html
Regards, John O'Conner
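The difference between char values and code points that John describes can be seen with a character outside the 16-bit range. The sketch below uses U+1D11E (MUSICAL SYMBOL G CLEF) as an arbitrary example; the class name CodePointLength is made up. Character.toChars, codePointCount and codePointAt were all added in Java 1.5.

```java
final class CodePointLength {
    /** Builds "A" followed by one supplementary character (U+1D11E). */
    static String sample() {
        return "A" + new String(Character.toChars(0x1D11E));
    }

    public static void main(String[] args) {
        String s = sample();

        // length() counts UTF-16 code units (char values), not characters.
        System.out.println(s.length());

        // codePointCount() counts Unicode code points.
        System.out.println(s.codePointCount(0, s.length()));

        // The code point starting at index 1 spans two char values.
        System.out.println(Integer.toHexString(s.codePointAt(1)));
        System.out.println(Character.isHighSurrogate(s.charAt(1)));
    }
}
```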
jeanlutrin@yahoo.fr - 22 Feb 2006 15:37 GMT ...
> > Strings in Java are Strings. The primitive char type is based > > on Unicode 3.0 and char in Java are hence 16 bits wide, which [quoted text clipped - 10 lines] > can read my blog entry on this topic: > http://weblogs.java.net/blog/joconner/archive/2005/08/how_long_is_you.html Thanks John,
I'm fully aware of that. It's exactly because a character can now need more than one char value that I wrote that it was unfortunate :)
See you soon on c.l.j.p.,
Jean
Jeffrey Schwab - 21 Feb 2006 14:10 GMT > RTMF! The Mucking Faneuil?
Oliver Wong - 21 Feb 2006 14:33 GMT >> RTMF! > > The Mucking Faneuil? Read The Manual, Friend!
- Oliver
tom fredriksen - 21 Feb 2006 14:51 GMT >>> RTMF! >> >> The Mucking Faneuil? > > Read The Manual, Friend! I haven't heard of that one before, but it's as good as the other one.
/tom
tom fredriksen - 21 Feb 2006 14:51 GMT >> RTMF! > > The Mucking Faneuil? Read The Manual First! (It's a slightly more polite way of saying it.)
Roedy Green - 24 Feb 2006 14:12 GMT >i doing some java work and basic i get a String "/n" and i need to get >the ascii value of it which in this case is 10 what is the best way of >going about this?
int w = '\n'; // note \n not /n
or http://mindprod.com/jgloss/ascii.html
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.