I wrote a class that reads a pipe-delimited text file using
StringTokenizer. However, it seems that if a field is not populated with at
least a space or null, the tokenizer skips it. I'm still new at this and was
wondering if I was doing something wrong.
For example, 1|2|3||5 yields:
1
2
3
5
If I'm not, how can I overcome this limitation? I'm reading the file via
BufferedReader/FileReader and parsing it accordingly, but the above is not
the result I was expecting.
Please help.
TIA
parv - 25 Mar 2004 05:06 GMT
> StringTokenizer... it seems that if a record is not populated with
> at least a space or null, the ST disregards it.
...
> ie.. 1|2|3||5 will yield:
> 1
> 2
> 3
> 5
...
> how can I overcome this limitation. I'm reading a file via
> BufferedReader/FileReader
You were expecting 1, 2, 3, , 5 instead, right?
It seems that the StringTokenizer class works this way ...
  result, initially empty
  while ( stream is not exhausted )
  {
    while ( current character is a delimiter )
      move (the pointer) to the next character
    now that the current character is not a delimiter,
    append the characters up to the next delimiter to result as one token
  }
  result at this point has the parsed/tokenized stream
You could (easily) use LinkedList & (List)Iterator classes to build
your own version of tokenized stream w/ the desired behaviour. The
only thing that would require more testing & planning than anything in
the project is dealing w/ the delimiter.
The change for your desired behaviour will be just before the end of
the main loop...
  result, initially empty
  while ( stream is not exhausted )
  {
    ...
    // CHANGE
    if current character is again the same as the delimiter
      append empty placeholder to result
  }
  result at this point has the parsed/tokenized stream
...The above presentation is quite simplistic; it will fail miserably on
escaped delimiters. It does not handle multiple delimiters or delimiters
of length greater than 1. (Personally, I would implement the second part
before I handle multiple delimiters.)
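The modified loop described above could be sketched in Java roughly as follows. This is only a minimal sketch under the same simplifications: one single-character delimiter and no escaping. The class and method names (EmptyFieldSplitter, split) are made up for illustration, and it uses String.indexOf with an ArrayList rather than LinkedList/ListIterator. The generic syntax needs a newer JDK than the 1.3/1.4 discussed in this thread (Java 7 or later for the diamond operator).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits on one single-character delimiter and
// keeps empty fields. No support for escaped or multi-character delimiters.
class EmptyFieldSplitter {

    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int end;
        // Every delimiter closes a field, even when the field is empty.
        while ((end = line.indexOf(delimiter, start)) >= 0) {
            fields.add(line.substring(start, end));
            start = end + 1;
        }
        // Whatever follows the last delimiter is the final field.
        fields.add(line.substring(start));
        return fields;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, , 5]
        System.out.println(split("1|2|3||5", '|'));
    }
}
```

Two delimiters in a row produce a zero-length substring, which is the "empty placeholder" from the pseudocode.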
Oh, do not forget to look up StreamTokenizer too, mentioned in
StringTokenizer's API reference (JDK 1.3).
- parv

Signature
As nice it is to receive personal mail, too much sweetness causes
tooth decay. Unless you have burning desire to contact me, do not do
away w/ WhereElse in the address for private communication.
Stephan Wehner - 25 Mar 2004 23:28 GMT
There is a constructor for StringTokenizer with a boolean argument
called "returnDelims". You might use that with returnDelims=true.
See the following code and output below:
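import java.util.StringTokenizer;

public class TokenizerDemo {
    public static void main(String[] args) {
        String s = "1,2,,3,,,4";
        // Third argument (returnDelims) = true: delimiters are returned as tokens too.
        StringTokenizer st = new StringTokenizer(s, ",", true);
        while (st.hasMoreTokens()) {
            System.out.println("next token: " + st.nextToken());
        }
    }
}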
next token: 1
next token: ,
next token: 2
next token: ,
next token: ,
next token: 3
next token: ,
next token: ,
next token: ,
next token: 4
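One possible way to turn the token stream above into fields, empty ones included, is to remember whether the previous token was a delimiter. The sketch below assumes a single-character delimiter (StringTokenizer treats every character of its delimiter string as a separate delimiter), and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper built on StringTokenizer with returnDelims = true.
class ReturnDelimsSplitter {

    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringTokenizer st =
                new StringTokenizer(line, String.valueOf(delimiter), true);
        // The start of the line counts as a field boundary.
        boolean lastWasDelimiter = true;
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            boolean isDelimiter =
                    token.length() == 1 && token.charAt(0) == delimiter;
            if (isDelimiter) {
                // Two boundaries in a row mean an empty field between them.
                if (lastWasDelimiter) {
                    fields.add("");
                }
            } else {
                fields.add(token);
            }
            lastWasDelimiter = isDelimiter;
        }
        // A trailing delimiter (or an empty line) ends with an empty field.
        if (lastWasDelimiter) {
            fields.add("");
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, , 5, 6]
        System.out.println(split("1|2|3||5|6", '|'));
    }
}
```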
Stephan
__________________________________________
Stephan Wehner
Editor, Traffic Life: Passionate Tales and Exit Strategies
An anthology about our car culture and alternatives with
short stories, poems, cartoons and lots of other art
www.trafficlife.com
Scaramouche - 26 Mar 2004 04:58 GMT
Thank you both for your attempt to help out.
The file I'm reading in contains the following: 1|2|3||5|6
With returnDelims set to 'true', I get the following output:
1
2
3
5
6
Not exactly what I needed but interesting output nevertheless. Like I said,
I'm kind of new to this, I'll have to figure a work-around that's not too
'out there' for my current skills.
I was expecting this output, with an empty line for the empty field:
1
2
3

5
6
Thanks again for all your help.
Peter Kronenberg - 26 Mar 2004 13:27 GMT
Get the version of StringTokenizer at http://ostermiller.org/utils/.
It does exactly what you want, plus more. I've been using it for
years.
Peter
Stephen Ostermiller - 26 Mar 2004 12:02 GMT
There are three solutions to this problem:
1) Use a replacement StringTokenizer that allows empty tokens to be returned:
http://ostermiller.org/utils/StringTokenizer.html
2) Use the regular expression String.split method that comes with Java 1.4 and up:
String[] tokens = "1|2|3||5".split("\\|");
3) Use a third party split library like the one here:
http://ostermiller.org/utils/StringHelper.java:
String[] tokens = StringHelper.split("1|2|3||5","|");
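For option 2, here is a minimal sketch of how the split call might fit into a BufferedReader loop like the one the original poster describes. The class and method names (PipeFileReader, splitRecord, readRecords) are made up for illustration, and a StringReader stands in for the FileReader so the example is self-contained. Pattern.quote needs Java 5 or later; on 1.4 the escaped form "\\|" does the same job.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical reader for pipe-delimited records that keeps empty fields.
class PipeFileReader {

    // "|" is a regex metacharacter, so it has to be quoted (or escaped).
    private static final Pattern PIPE = Pattern.compile(Pattern.quote("|"));

    static String[] splitRecord(String line) {
        // A negative limit keeps trailing empty fields as well.
        return PIPE.split(line, -1);
    }

    static List<String[]> readRecords(BufferedReader reader) throws IOException {
        List<String[]> records = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            records.add(splitRecord(line));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new BufferedReader(new FileReader(...)).
        BufferedReader reader =
                new BufferedReader(new StringReader("1|2|3||5|6\na||c|\n"));
        for (String[] record : readRecords(reader)) {
            // prints [1, 2, 3, , 5, 6] and then [a, , c, ]
            System.out.println(Arrays.toString(record));
        }
    }
}
```

One thing to watch: the one-argument split("\\|") keeps empty fields in the middle of a line but drops trailing ones, so "1|2|3||" would come back with three elements. Passing -1 as the limit keeps them.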
Stephen
parv - 27 Mar 2004 01:49 GMT
> 3) Use a third party split library like the one here:
> http://ostermiller.org/utils/StringHelper.java:
That results in a 404; backtracking, I get...
http://ostermiller.org/utils/StringHelper.html
http://ostermiller.org/utils/StringHelper.java.html
- parv
