
Java Forum / General / April 2006


Parse a text file with quoted delimiters?

flarosa - 11 Apr 2006 18:32 GMT
Hi,

Is there an easy way to parse a line of text which may contain quoted
instances of the delimiting character?

For example,
1200,Bob's Ties,400 Atwood Avenue
1201,"Mary, Jane and Associates",250 Washington St.

In the 2nd line, I'd want the whole string "Mary, Jane and Associates"
to parse as one token.

The two simple ways I know of parsing text in Java, StringTokenizer
and String.split(), would end up parsing the 2nd line into four tokens
instead of three.
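
A minimal sketch of the problem, assuming a plain comma is used as the delimiter (the class and method names here are made up for illustration):

```java
import java.util.Arrays;
import java.util.StringTokenizer;

// Hypothetical demo class: shows that neither String.split() nor
// StringTokenizer knows anything about quotes.
final class NaiveSplitDemo {

    // Splits on every comma, including commas inside quotes.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // Counts tokens the way StringTokenizer sees them.
    static int countTokens(String line) {
        return new StringTokenizer(line, ",").countTokens();
    }

    public static void main(String[] args) {
        String line = "1201,\"Mary, Jane and Associates\",250 Washington St.";
        System.out.println(Arrays.toString(naiveSplit(line)));
        System.out.println(naiveSplit(line).length); // 4, not the 3 fields wanted
        System.out.println(countTokens(line));       // 4 as well
    }
}
```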

Thanks,
Frank
Ben - 11 Apr 2006 18:35 GMT
> Hi,
>
[quoted text clipped - 14 lines]
> Thanks,
> Frank

Look at the regex API (java.util.regex); it has everything you need.
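
One way the regex route could look, as a sketch only: split on commas that are followed by an even number of double quotes, then strip the surrounding quotes. The class name and pattern constant are made up for this example, and it assumes quotes are balanced and never escaped inside a field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: regex-based split that ignores commas inside quotes.
final class RegexCsvSplit {

    // A comma counts as a delimiter only if the rest of the line contains
    // an even number of double quotes, i.e. the comma is outside any quotes.
    private static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        // Limit -1 keeps trailing empty fields.
        for (String raw : line.split(COMMA_OUTSIDE_QUOTES, -1)) {
            String field = raw;
            // Drop one pair of enclosing quotes, if present.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("1200,Bob's Ties,400 Atwood Avenue"));
        System.out.println(split("1201,\"Mary, Jane and Associates\",250 Washington St."));
    }
}
```

The lookahead rescans the rest of the line for every comma, so this is probably fine for short lines but not a good fit for very long ones.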
Roedy Green - 11 Apr 2006 19:01 GMT
>In the 2nd line, I'd want the whole string "Mary, Jane and Associates"
>to parse as one token.

See http://mindprod.com/jgloss/csv.html
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Thomas Fritsch - 11 Apr 2006 19:17 GMT
> Is there an easy way to parse a line of text which may contain quoted
> instances of the delimiting character?
[quoted text clipped - 9 lines]
> and String.split() - would end up parsing the 2nd line into four tokens
> instead of 3.
See <http://java.sun.com/j2se/1.4.2/docs/api/java/io/StreamTokenizer.html>.
With that you can do things like:

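 import java.io.StreamTokenizer;
 import java.io.StringReader;

 String line = ...;
 StreamTokenizer tok = new StreamTokenizer(new StringReader(line));
 tok.resetSyntax();                   // clear the default syntax table
 tok.wordChars('\u0000','\uFFFF');    // treat every character as part of a word...
 tok.whitespaceChars(',', ',');       // ...except the comma, which separates tokens
 tok.quoteChar('\"');                 // text inside double quotes is one token
 // nextToken() can throw IOException, so the caller must declare or handle it
 while (tok.nextToken() != StreamTokenizer.TT_EOF) {
   String word = tok.sval;            // text of the word or quoted string
   System.out.println(word);
 }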
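
A self-contained version of the same idea, run against the two sample lines from the question. The class and method names are made up for this sketch. One caveat worth knowing: because the comma is configured as whitespace, runs of commas are skipped together, so empty fields (as in "a,,b") are silently dropped.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the StreamTokenizer setup shown above.
final class StreamTokenizerCsv {

    static List<String> parseLine(String line) {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(line));
        tok.resetSyntax();                       // clear the default syntax table
        tok.wordChars(0, Character.MAX_VALUE);   // every character is a word character...
        tok.whitespaceChars(',', ',');           // ...except the comma separator
        tok.quoteChar('"');                      // quoted text becomes a single token

        List<String> tokens = new ArrayList<>();
        try {
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                tokens.add(tok.sval);            // set for both words and quoted strings
            }
        } catch (IOException e) {
            // Not expected when reading from a StringReader.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("1200,Bob's Ties,400 Atwood Avenue"));
        System.out.println(parseLine("1201,\"Mary, Jane and Associates\",250 Washington St."));
        System.out.println(parseLine("a,,b").size()); // 2: the empty field is lost
    }
}
```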
Signature

"Thomas:Fritsch$ops.de".replace(':', '.').replace('$', '@')

flarosa - 11 Apr 2006 23:17 GMT
That worked well. Thanks.
Oliver Wong - 11 Apr 2006 19:42 GMT
> Hi,
>
[quoted text clipped - 11 lines]
> and String.split() - would end up parsing the 2nd line into four tokens
> instead of 3.

   It looks like you're dealing with CSV (Comma-Separated Values) files. If
so, you might want to look for a CSV parsing library rather than reinventing
the wheel by trying to implement your own version.

   - Oliver
Jubz - 11 Apr 2006 20:58 GMT
Yes, indeed.
Try Ostermiller's utilities. I've used their CSV library, and it works
great.
http://ostermiller.org/utils/
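
To give a feel for what a CSV library has to deal with beyond the simple cases in this thread, here is a rough character-by-character sketch. It is not taken from any of the libraries mentioned; the class name is made up, and it assumes the common conventions that a doubled quote inside a quoted field means a literal quote and that empty fields are kept. It handles a single line only, so quoted fields that span lines are out of scope.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-line CSV parser, for illustration only.
final class SimpleCsvLine {

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"');   // doubled quote: literal quote character
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    field.append(c);         // commas are ordinary text in here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(field.toString()); // end of field, possibly empty
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());        // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("1201,\"Mary, Jane and Associates\",250 Washington St."));
        System.out.println(parse("a,,\"say \"\"hi\"\"\","));
    }
}
```

Even this leaves out things like multi-line fields, other delimiters and unbalanced quotes, which is a fair argument for using an existing library.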

