Hi,
I want to tokenize a string like '/Device/Interface?NAME=Serial1/0' into
the following tokens.
Device
Interface?NAME=Serial1/0
If I use StringTokenizer with '/' as the separator character I get the
following, which is not what I want.
Device
Interface?NAME=Serial1
0
I tried using StreamTokenizer.
StreamTokenizer st = new StreamTokenizer(
    new StringReader( "/Device/Interface?NAME=Serial1/0" ) );  // takes a Reader, not a String
st.whitespaceChars( '/', '/' );
st.nextToken();
The result is
Device
INTERFACE
null
NAME
null
Serial1
null
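A note on the null entries in results like the one above: they are consistent with st.sval being printed for tokens that are not words, since sval is only set for word and quoted tokens. The sketch below assumes that is how the output was produced (the post does not show the printing loop). The class name TokenTypeDemo and the WORD/NUMBER/CHAR labels are made up for illustration. It uses the default syntax plus whitespaceChars('/', '/') and labels each token by st.ttype:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: labels every token by its type instead of printing sval.
class TokenTypeDemo {
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.whitespaceChars('/', '/');   // replaces the default "comment" role of '/'
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);      // sval is set for words
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + st.nval);    // sval is null, value is in nval
                        break;
                    default:
                        out.add("CHAR:" + (char) st.ttype); // ordinary char, sval is null
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        describe("/Device/Interface?NAME=Serial1/0").forEach(System.out::println);
    }
}
```

With the default syntax, '?' and '=' are ordinary characters and the trailing 0 is parsed as a number, which lines up with the three null entries.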
I tried to call resetSyntax
st.resetSyntax();
st.wordChars(0, 255);
st.whitespaceChars( '/', '/');
st.quoteChar('"');
st.quoteChar('\'');
st.parseNumbers();
The result I got was
Device
INTERFACE?NAME=Serial1
null
I tried quoting 'Serial1/0' like /Device/Interface?NAME='Serial1/0' The
result I got was
Device
INTERFACE?NAME=
Serial1/0
Is there any way with StringTokenizer, StreamTokenizer or any other
means (without actually having to write a tokenizer on my own) to get the
result I want, which is
Device
Interface?NAME=Serial1/0
Thanks
Anand
Virgil Green - 26 Apr 2005 20:07 GMT
> Hi,
>
[quoted text clipped - 59 lines]
> Thanks
> Anand
Not without defining rules regarding when a '/' should be treated as a
separator and when it should be treated as an included character.
--
Virgil
Anand Narasimhan - 26 Apr 2005 20:17 GMT
Thanks.
Setting whitespaceChars to '/' seems to work, except that when the
tokenizer sees a quote character, it tokenizes everything within the quotes
as a separate token.
E.g. /Device/Interface?NAME='Serial1/0' results in
Device
Interface?NAME=
Serial1/0
But I did not set the quote character as a whitespace character.
Anand
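For what it is worth, this looks like the documented effect of quoteChar() rather than anything to do with whitespace: a registered quote character starts a string token of its own, returned with ttype equal to the quote character and the body in sval. A minimal sketch, assuming only the single quote is registered; QuoteTokenDemo and the registerQuote flag are hypothetical names used for illustration:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: same syntax table with and without a registered quote char.
class QuoteTokenDemo {
    static List<String> tokenize(String input, boolean registerQuote) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();
        st.wordChars(0, 255);            // everything is a word character...
        st.whitespaceChars('/', '/');    // ...except '/', which separates tokens
        if (registerQuote) {
            st.quoteChar('\'');          // a quote now starts its own string token
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '\'') {
                    out.add("QUOTED:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else {
                    out.add("OTHER:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "/Device/Interface?NAME='Serial1/0'";
        System.out.println(tokenize(input, true));
        System.out.println(tokenize(input, false));
    }
}
```

Leaving the quote unregistered does not help either: the quote characters then become part of the words, but the '/' inside them still splits the token.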
>>Hi,
>>
[quoted text clipped - 65 lines]
> --
> Virgil
Virgil Green - 28 Apr 2005 18:20 GMT
> Thanks.
> Setting whitespaceChars to '/' seems to work, except that when the
[quoted text clipped - 9 lines]
>
> Anand
Still, no rules. What are the rules for when a '/' is considered a separator
and when it is considered a valid character?

--
Virgil
Oscar Kind - 26 Apr 2005 21:40 GMT
> I want to tokenize a string like '/Device/Interface?NAME=Serial1/0' into
> the following tokens.
[quoted text clipped - 8 lines]
> Interface?NAME=Serial1
> 0
[...]
> Is there any way with StringTokenizer, StreamTokenizer or any other
> means (without actually having to write a tokenizer on my own) to get the
> result I want which is
>
> Device
> Interface?NAME=Serial1/0
How is "/Device/Interface?NAME=Serial1/0".split("/", 3) insufficient?
I get {"", "Device", "Interface?NAME=Serial1/0"}, which is not exactly what
you want (there is a leading empty string), but quite close.
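A small sketch of the split() approach, for reference. With a positive limit, String.split stops splitting once it has limit - 1 pieces and puts the rest of the input, further '/' characters included, into the last element. The class and method names are made up, and the code assumes the input always starts with '/', so the first element is an empty string that can be dropped:

```java
import java.util.Arrays;

// Hypothetical helper around String.split(regex, limit).
class SplitWithLimit {
    // Assumes path starts with '/', so parts[0] is the empty string.
    static String[] tokens(String path) {
        String[] parts = path.split("/", 3);   // at most 3 pieces, last one keeps the rest
        return Arrays.copyOfRange(parts, 1, parts.length);
    }

    public static void main(String[] args) {
        String[] t = tokens("/Device/Interface?NAME=Serial1/0");
        System.out.println(Arrays.toString(t));
        // prints [Device, Interface?NAME=Serial1/0]
    }
}
```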

--
Oscar Kind http://home.hccnet.nl/okind/
Software Developer for contact information, see website
PGP Key fingerprint: 91F3 6C72 F465 5E98 C246 61D9 2C32 8E24 097B B4E2
Tor Iver Wilhelmsen - 27 Apr 2005 14:01 GMT
> I want to tokenize a string like '/Device/Interface?NAME=Serial1/0'
> into the following tokens.
[quoted text clipped - 8 lines]
> Interface?NAME=Serial1
> 0
You want to look into using regular expressions instead (present in
1.4 or later, separate install prior to that).
E.g.
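import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The backslash must be doubled inside a Java string literal.
Pattern p = Pattern.compile("/(\\w+)/(.*)");
Matcher m = p.matcher("/Device/Interface?NAME=Serial1/0");
String[] tokens = null;
if (m.matches()) {
    tokens = new String[] { m.group(1), m.group(2) };
}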
> I tried using StreamTokenizer.
StreamTokenizer is a very basic C lexer. It, like StringTokenizer,
should be discarded in modern code in favor of regular
expressions or a lexer/parser (google for them, there are quite a few
variants).
Ross Bamford - 29 Apr 2005 14:07 GMT
> > I want to tokenize a string like '/Device/Interface?NAME=Serial1/0'
> > into the following tokens.
[quoted text clipped - 26 lines]
> expressions or a lexer/parser (google for them, there are quite a few
> variants).
Although I'd stick with the tokenizer for simple use cases or tight
code, like splitting into words - regexps are more expensive.
Ross
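If the regex cost is a concern, plain indexOf/substring can do the job for this particular case. This is only a sketch under an assumed rule that the thread never actually settled: the string starts with '/', and only the first two '/' characters are separators. IndexOfSplit and splitAtSecondSlash are hypothetical names, and nothing here has been measured against the regex version:

```java
// Hypothetical helper: splits "/head/rest" at the second '/' without regex.
class IndexOfSplit {
    static String[] splitAtSecondSlash(String path) {
        // Look for the second '/', starting the search after the leading one.
        int second = path.indexOf('/', 1);
        if (!path.startsWith("/") || second < 0) {
            throw new IllegalArgumentException("expected /head/rest but got: " + path);
        }
        // Everything after the second '/' is kept as-is, later slashes included.
        return new String[] { path.substring(1, second), path.substring(second + 1) };
    }

    public static void main(String[] args) {
        String[] t = splitAtSecondSlash("/Device/Interface?NAME=Serial1/0");
        System.out.println(t[0]);   // Device
        System.out.println(t[1]);   // Interface?NAME=Serial1/0
    }
}
```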

--
[Ross A. Bamford] [ross AT the.website.domain]
Roscopeco Open Tech ++ Open Source + Java + Apache + CMF
http://www.roscopec0.f9.co.uk/ + info@the.website.domain