Java Forum / General / June 2007

parsing away a character

kvram@passagen.se - 07 Jun 2007 10:57 GMT
Hello everybody & thanks for reading this.

I am in the process of decoding an XML dump of a big database with a
lot of characters and symbols. I have written a lot of rules to parse away
all the noise, and what keeps annoying me at the end is this character: ^Q.

If I open the result file in Notepad, I see it represented by white
space. Fine, I thought. But it really is not white space. When I look
at it in the console (I work with Cygwin), I see it is ^Q.
I tried to print out the ASCII table to get its number, but it is not
included there. So the question is how to get rid of this (control?)
character/string. While
writing this it crossed my mind that I can do a
readLine and String.contains("^Q"), which I will try after typing this
message. But the question is really: what is this ^Q anyway?

thanks for your time/thoughts/suggestions,
kave
Lew - 07 Jun 2007 13:46 GMT
> what keeps annoying me at the end is this character: ^Q.

Control-Q

> If I open the result file in Notepad, I see it represented by white
> space. Fine, I thought. But it really is not white space.

Just because it's a control character doesn't necessarily mean it's whitespace.

> When I look at it in the console (I work with Cygwin), I see it is ^Q.
> I tried to print out the ASCII table to get its number, but it is not

Of course it is.  Control-A is 1, Control-B is 2, Control-C is 3, ...
and so on up the alphabet, which makes Control-Q 17 (0x11).

> included there. So the question is how to get rid of this (control?)
> character/String. While

That depends on how you're processing the data.  For example, with a String
you could replace() characters, or if you're streaming through the characters
you could toss the unwanted ones and copy the rest.
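
Both approaches could be sketched roughly as follows. This is only an illustration, not the poster's actual code: the class and method names are made up for the example, and it assumes the unwanted character is exactly Control-Q (code 17).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, named for this example only.
class ControlQStripper {
    // Control-Q is code 17 (hex 11).
    static final char CTRL_Q = (char) 0x11;

    // Approach 1: the text is already in a String, so replace the character with nothing.
    static String stripFromString(String s) {
        return s.replace(String.valueOf(CTRL_Q), "");
    }

    // Approach 2: stream through the characters, toss the unwanted ones, copy the rest.
    static void stripFromStream(Reader in, Writer out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c != CTRL_Q) {
                out.write(c);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String dirty = "foo" + CTRL_Q + "bar";
        System.out.println(stripFromString(dirty));   // prints "foobar"

        StringWriter out = new StringWriter();
        stripFromStream(new StringReader(dirty), out);
        System.out.println(out);                      // prints "foobar"
    }
}
```

For a large dump the streaming version is probably the better fit, since it never holds the whole file in memory; wrapping the Reader in a BufferedReader would be the usual next step.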

> writing this it crossed my mind that I can do a
> readLine and String.contains("^Q"), which I will try after typing this
> message. But the question is really: what is this ^Q anyway?

Control-Q.

The use of the caret to indicate "Control-whatever" is standard notation,
particularly for consoles.
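
One consequence worth spelling out: in Java source, the literal "^Q" is just a caret followed by a capital Q, so String.contains("^Q") will not find the control character. The sketch below shows the usual caret-notation mapping (flip bit 0x40 of the letter) and the difference between the two searches. The class and method names are invented for the example.

```java
// Hypothetical helper, named for this example only.
class CaretNotation {
    // Caret notation: ^X stands for the code of X with bit 0x40 flipped,
    // so ^A is 1, ^B is 2, and ^Q is 17.
    static char fromCaret(char letter) {
        return (char) (Character.toUpperCase(letter) ^ 0x40);
    }

    public static void main(String[] args) {
        char ctrlQ = fromCaret('Q');
        System.out.println((int) ctrlQ);               // prints 17

        String line = "abc" + ctrlQ + "def";
        // Looks for the two characters '^' and 'Q', which are not in the data.
        System.out.println(line.contains("^Q"));       // prints false
        // Looks for the single control character itself.
        System.out.println(line.indexOf(ctrlQ) >= 0);  // prints true
    }
}
```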

Lew

Oliver Wong - 08 Jun 2007 15:39 GMT
> Hello everybody & thanks for reading this.
>
[quoted text clipped - 12 lines]
> readLine and String.contains("^Q"), which I will try after typing this
> message. But the question is really: what is this ^Q anyway?

   What I'd usually do in a situation like this is dump the data to a
file (which it sounds like you've done already, since you said you opened
something in notepad), and then open it with a hex editor to see the exact
sequence of bytes you're getting.
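
If no hex editor is at hand, a few lines of Java can stand in for one. This is a minimal sketch with made-up names; the file path is taken from the first command-line argument, which is an assumption of the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, named for this example only.
class HexDump {
    // Formats bytes as two-digit hex values, 16 per line.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(i % 16 == 0 ? '\n' : ' ');
            }
            // Mask with 0xff so negative bytes print as 80..ff rather than sign-extended.
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a placeholder for the path of the dumped file.
        // readAllBytes loads the whole file, so try this on a small excerpt first.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(toHex(data));
    }
}
```

In the output, a Control-Q stored as a single byte would show up as 11.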

   From there, the rest depends a lot on your code. Typically, when
you're doing XML parsing in Java, you're not working with bytes, but with
characters. Since you mention String.contains(), I assume you are working
with characters, and not bytes. There's a "translation" process occurring
at some point from bytes to characters, but exactly which byte-sequence
maps onto which character sequence depends on the encoding you select.
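
To make the translation step concrete, here is a small sketch that decodes the same bytes with two different encodings. The byte values and names are invented for the example and do not come from the original poster's data.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named for this example only.
class DecodeDemo {
    // Decodes raw bytes into characters using the given encoding.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // 'A', Control-Q, then the two bytes C3 A9.
        byte[] raw = { 0x41, 0x11, (byte) 0xC3, (byte) 0xA9 };

        String utf8 = decode(raw, StandardCharsets.UTF_8);
        String latin1 = decode(raw, StandardCharsets.ISO_8859_1);

        // UTF-8 reads C3 A9 as one character; ISO-8859-1 reads it as two.
        System.out.println(utf8.length());    // prints 3
        System.out.println(latin1.length());  // prints 4

        // The byte 11 decodes to the same control character in both encodings.
        System.out.println((int) utf8.charAt(1));    // prints 17
        System.out.println((int) latin1.charAt(1));  // prints 17
    }
}
```

So a stray byte 11 survives decoding as Control-Q under either of these encodings, while bytes above 7F are where the choice of encoding starts to matter.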

   - Oliver
Thomas Fritsch - 08 Jun 2007 15:59 GMT
kvram@passagen.se wrote:
> I am in the process of decoding an XML dump of a big database with a
> lot
[quoted text clipped - 10 lines]
> readLine and String.contains("^Q"), which I will try after typing this
> message. But the question is really: what is this ^Q anyway?
See <http://en.wikipedia.org/wiki/ASCII>
Quoted from there:
"The use of Control-S (XOFF, an abbreviation for "transmit off") as a
handshaking signal warning a sender to stop transmission because of
impending overflow, and Control-Q (XON, "transmit on") to resume
sending, persists to this day in many systems as a manual output control
technique."

Thomas


