Hello everybody & thanks for reading this.
I am in the process of decoding an XML dump of a big database with a lot
of characters and symbols. I have written a lot of rules to parse away
all the noise, and what keeps annoying me in the end is this character: ^Q.
If I open the result file in Notepad, I see it is represented by white
space. Fine, I thought. But it really is not white space. When I look
at it in the console (I work with Cygwin), I see it is ^Q.
I tried to print out the ASCII table to get its code, but it is not
included there. So the question is how to get rid of this (control?)
character/String. While
writing this it crossed my mind that I can do a
readLine and String.contains("^Q"), which I will try after typing this
message. But the question is really: what is this ^Q anyway?
Thanks for your time/thoughts/suggestions,
kave
Lew - 07 Jun 2007 13:46 GMT
> what keeps annoying me in the end is this character: ^Q.
Control-Q
> If I open the result file in Notepad, I see it is represented by white
> space. Fine, I thought. But it really is not white space.
Just because it's a control character doesn't necessarily mean it's whitespace.
> When I look at it in the console (I work with Cygwin), I see it is ^Q.
> I tried to print out the ASCII table to get its code, but it is not
Of course it is. Control-A is 1, Control-B is 2, Control-C is 3, ..., so
Control-Q is 17.
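To make that numbering concrete, here is a minimal sketch of the usual caret-notation rule: ^X stands for the code of X with the 0x40 bit flipped. The class and method names are made up for illustration; they are not from any library.

```java
// Hypothetical helper illustrating caret notation; names are illustrative only.
final class CaretNotation {

    // ^X is the character whose code is the code of X XOR 0x40.
    static int caretToCode(char letter) {
        return Character.toUpperCase(letter) ^ 0x40;
    }

    // Inverse direction: turn a control code back into its ^X spelling.
    static String codeToCaret(int code) {
        return "^" + (char) (code ^ 0x40);
    }

    public static void main(String[] args) {
        System.out.println(caretToCode('A'));   // 1
        System.out.println(caretToCode('Q'));   // 17 (0x11, DC1/XON in the ASCII table)
        System.out.println(codeToCaret(17));    // ^Q
    }
}
```

So in an ASCII table the character will be listed under 17 (hex 11), typically labelled DC1, rather than under "^Q".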
> included there. So the question is how to get rid of this (control?)
> character/String. While
That depends on how you're processing the data. For example, with a String
you could replace() characters, or if you're streaming through the characters
you could toss the unwanted ones and copy the rest.
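A rough sketch of both approaches, assuming the data is already decoded into Java characters. Note that the control character is written '\u0011' in Java source; the two-character string "^Q" is only how a console displays it, so String.contains("^Q") would look for a literal caret followed by Q. The class name, method names, and the choice to keep tab, LF and CR are my assumptions, not something from the original program.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper; names and the keep-list are illustrative assumptions.
final class ControlCharFilter {

    static final char CTRL_Q = '\u0011';

    // String approach: replace every Control-Q with nothing.
    static String removeCtrlQ(String s) {
        return s.replace(String.valueOf(CTRL_Q), "");
    }

    // Streaming approach: copy characters, tossing control characters
    // other than tab, line feed and carriage return.
    static void copyWithoutControls(Reader in, Writer out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            boolean keep = c == '\t' || c == '\n' || c == '\r'
                    || !Character.isISOControl(c);
            if (keep) {
                out.write(c);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(removeCtrlQ("foo" + CTRL_Q + "bar")); // foobar

        StringWriter out = new StringWriter();
        copyWithoutControls(new StringReader("x" + CTRL_Q + "y\n"), out);
        System.out.print(out); // xy followed by a newline
    }
}
```

The streaming variant is probably the better fit for a big dump, since it never holds more than one character at a time.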
> writing this it crossed my mind that I can do a
> readLine and String.contains("^Q"), which I will try after typing this
> message. But the question is really: what is this ^Q anyway?
Control-Q.
The use of the caret to indicate "Control-whatever" is standard notation,
particularly for consoles.

Lew
Oliver Wong - 08 Jun 2007 15:39 GMT
> Hello everybody & thanks for reading this.
>
[quoted text clipped - 12 lines]
> readLine and String.contains("^Q"), which I will try after typing this
> message. But the question is really: what is this ^Q anyway?
What I'd usually do in a situation like this is dump the data to a
file (which it sounds like you've done already, since you said you opened
something in notepad), and then open it with a hex editor to see the exact
sequence of bytes you're getting.
From there, the rest depends a lot on your code. Typically, when
you're doing XML parsing in Java, you're not working with bytes, but with
characters. Since you mention String.contains(), I assume you are working
with characters, and not bytes. There's a "translation" process occurring
at some point from bytes to characters, but exactly which byte-sequence
maps onto which character sequence depends on the encoding you select.
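If no hex editor is at hand, something like the following sketch can stand in for one, and it also shows the byte-to-character translation in action. The sample bytes are invented for the demonstration (they are not from the poster's file), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the sample bytes are made up for illustration.
final class HexPeek {

    // Render bytes as space-separated two-digit hex, like a hex editor would.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a', Control-Q, then the two-byte UTF-8 encoding of U+00E9
        byte[] raw = {0x61, 0x11, (byte) 0xC3, (byte) 0xA9};

        System.out.println(toHex(raw)); // 61 11 c3 a9

        // The same bytes decode to different character sequences
        // depending on the encoding chosen.
        System.out.println(new String(raw, StandardCharsets.UTF_8).length());      // 3
        System.out.println(new String(raw, StandardCharsets.ISO_8859_1).length()); // 4
    }
}
```

In both of these encodings the byte 11 stays a single Control-Q character, which matches what the console shows; it is the bytes above 7f that change meaning with the encoding.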
- Oliver
Thomas Fritsch - 08 Jun 2007 15:59 GMT
kvram@passagen.se wrote:
> I am in the process of decoding an XML dump of a big database with a lot
[quoted text clipped - 10 lines]
> readLine and String.contains("^Q"), which I will try after typing this
> message. But the question is really: what is this ^Q anyway?
See <http://en.wikipedia.org/wiki/ASCII>
Quoted from there:
"The use of Control-S (XOFF, an abbreviation for "transmit off") as a
handshaking signal warning a sender to stop transmission because of
impending overflow, and Control-Q (XON, "transmit on") to resume
sending, persists to this day in many systems as a manual output control
technique."

Thomas