Hi,
I want to create a character parser in Java. I basically want to parse
a text file, removing extra spaces and carriage returns. I've used
stream tokenizers before, but what if I want the token to be every
character rather than a delimiter?
Thanks for your time and help
:)
Daniel Pitts - 16 Feb 2007 00:09 GMT
> Hi,
>
[quoted text clipped - 5 lines]
> Thanks for your time and help
> :)
In that case, you don't want tokenizing.
You don't even want parsing!
You want to read the data one character at a time.
<http://java.sun.com/j2se/1.5.0/docs/api/java/io/Reader.html>
Look at the method called read(char[])
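A minimal sketch of that approach, in case it helps. The class and
method names (WhitespaceCollapser, collapse) are made up for
illustration, and it assumes "extra spaces and carriage returns" means
that every run of whitespace, line breaks included, should become a
single space. Adjust the test inside the loop if you want to keep
line breaks.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of the JDK.
final class WhitespaceCollapser {

    /** Reads the whole Reader in chunks and collapses each whitespace run to one space. */
    static String collapse(Reader in) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[4096];
        boolean pendingSpace = false;
        int n;
        // read(char[]) returns how many chars were filled in, or -1 at end of stream
        while ((n = in.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = chunk[i];
                if (Character.isWhitespace(c)) {
                    // Remember the gap, but never emit leading whitespace.
                    pendingSpace = out.length() > 0;
                } else {
                    if (pendingSpace) {
                        out.append(' ');
                        pendingSpace = false;
                    }
                    out.append(c);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // prints "a b c"
        System.out.println(collapse(new StringReader("a  b\r\n\r\nc ")));
    }
}
```

Because the pending-space flag lives outside the chunk loop, a run of
whitespace that straddles two chunks is still collapsed correctly.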
Alex Hunsley - 18 Feb 2007 10:13 GMT
>> Hi,
>>
[quoted text clipped - 13 lines]
>
> Look at the method called read(char[])
For efficiency, I suggest using BufferedReader, which is the same deal
(but it buffers chunks of data behind the scenes - fewer disk accesses,
so faster!)
lex
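Roughly like this, as a sketch only. BufferedCharRead and
dropCarriageReturns are invented names, and the filter (dropping '\r')
is just a stand-in for whatever per-character decision you need. With
the BufferedReader wrapped around the source, calling read() once per
character is cheap because most calls are served from the in-memory
buffer.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical example class, not part of the JDK.
final class BufferedCharRead {

    /** Copies the source one character at a time, leaving out carriage returns. */
    static String dropCarriageReturns(Reader source) throws IOException {
        StringBuilder out = new StringBuilder();
        // Closing the BufferedReader also closes the wrapped source.
        try (BufferedReader in = new BufferedReader(source)) {
            int c;
            // read() returns the next char as an int, or -1 at end of stream
            while ((c = in.read()) != -1) {
                if (c != '\r') {
                    out.append((char) c);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of the text file to clean up.
        // FileReader decodes with the platform default charset.
        System.out.print(dropCarriageReturns(new FileReader(args[0])));
    }
}
```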
richliu2005@gmail.com - 17 Feb 2007 17:03 GMT
For best performance, you may want to use a java.nio.ByteBuffer. I've
had to read in a 2GB file, and using a BufferedInputStream and a
ByteBuffer was the only viable solution. Other APIs could not handle
such a large file.
If your file is small (so a BufferedInputStream/ByteBuffer would not
offer significant gains) and simplicity outweighs performance, then
you can always use one of the replace methods in the String class.
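For the small-file case, something along these lines might do. It is
only a sketch: ReplaceCleanup and clean are made-up names, and it
assumes the whole text is already in a String and that line feeds
should be kept while carriage returns and repeated spaces/tabs go.

```java
// Hypothetical example class, not part of the JDK.
final class ReplaceCleanup {

    /** Drops carriage returns and squeezes runs of spaces/tabs to one space. */
    static String clean(String text) {
        return text
                .replace("\r", "")            // literal replacement, no regex involved
                .replaceAll("[ \\t]+", " ");  // regex: one or more spaces or tabs
    }

    public static void main(String[] args) {
        // prints "a b" on one line and "c d" on the next
        System.out.println(clean("a   b\r\nc\t\td"));
    }
}
```

Note that replace takes literal text while replaceAll takes a regular
expression, so the two are not interchangeable.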
> Hi,
>
[quoted text clipped - 5 lines]
> Thanks for your time and help
> :)
Alex Hunsley - 18 Feb 2007 10:20 GMT
> For best performance, you may want to use a java.nio.ByteBuffer. I've
> had to read in a 2GB file and using a BufferedInputStream and a
> ByteBuffer was the only viable solution. Other APIs could not handle
> such a large file.
Which other APIs do you mean?
Shouldn't the OP be using a Reader or BufferedReader (designed
for char data) rather than something that reads bytes?
The end effect may be the same, of course...
lex
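To illustrate the bytes-versus-chars point, here is a small sketch.
BytesVersusChars and countChars are invented names, and UTF-8 is an
assumed encoding picked for the demo. A Reader decodes bytes into
chars, so the two counts differ as soon as the text contains anything
outside ASCII.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of the JDK.
final class BytesVersusChars {

    /** Counts the chars obtained by decoding the given bytes as UTF-8. */
    static int countChars(byte[] data) throws IOException {
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // "café": the é takes two bytes in UTF-8 but is a single char
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // prints "5 bytes, 4 chars"
        System.out.println(data.length + " bytes, " + countChars(data) + " chars");
    }
}
```

Code that filters raw bytes would have to handle such multi-byte
sequences itself, which is the work a Reader already does.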
Boaz.Jan@gmail.com - 18 Feb 2007 11:23 GMT
> richliu2...@gmail.com wrote:
> > For best performance, you may want to use a java.nio.ByteBuffer. I've
[quoted text clipped - 8 lines]
>
> lex
I had a similar task to do some time ago:
I needed to compare two enormous files lexicographically, simultaneously.
You can use a CharArrayReader to read a char[] (you might want to add
a method for reading a complete line instead of a portion
of the text).
Now you can break the whole text file into chars.
If you do need buffering, I recommend you study the source code behind
BufferedReader and make your own Reader class that can return a char[]
(I couldn't find one in the JSE API ... I haven't invested a lot of time
in it).
For holding your already parsed text you can create a StringBuffer and,
simply by iterating over the char[], decide whether to append the
given char to the StringBuffer or not.
http://java.sun.com/j2se/1.5.0/docs/api/java/io/CharArrayReader.html
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuffer.html
A faster mutable sequence of characters for non-synchronized tasks (just
like StringBuffer, but faster):
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/StringBuilder.html
And maybe you can find something here:
http://java.sun.com/docs/books/tutorial/essential/io/scanning.html
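A rough sketch of the char[] plus StringBuilder idea, under a few
assumptions: CharArrayFilter and filter are made-up names, the text is
already in a char[], and the keep/skip rule (skip '\r', skip a space
that follows a kept space) is only one possible choice. StringBuilder
is used instead of StringBuffer because nothing here is shared between
threads.

```java
import java.io.CharArrayReader;
import java.io.IOException;

// Hypothetical example class, not part of the JDK.
final class CharArrayFilter {

    /** Walks the char[] through a CharArrayReader and keeps only the wanted chars. */
    static String filter(char[] text) throws IOException {
        StringBuilder kept = new StringBuilder(text.length);
        try (CharArrayReader in = new CharArrayReader(text)) {
            int last = -1; // last char actually appended, -1 if none yet
            int c;
            while ((c = in.read()) != -1) {
                boolean repeatedSpace = (c == ' ' && last == ' ');
                if (c == '\r' || repeatedSpace) {
                    continue; // decide NOT to append this char
                }
                kept.append((char) c);
                last = c;
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) throws IOException {
        // prints "a b" on one line and "c" on the next
        System.out.println(filter("a  b\r\nc".toCharArray()));
    }
}
```

Tracking the last appended char, rather than the last char read, keeps
"a \r b" from ending up with two spaces after the '\r' is dropped.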