Java Forum / General / June 2006

regular expressions -- how to ignore whitespace

cmills28@yahoo.com - 27 Jun 2006 00:56 GMT
I need to extract some data from a web page that I'm parsing, using
regular expressions to find matches I'm interested in.  In particular,
I'm trying to extract the text from between <A> and </A>.  The <A> tags
have several attributes that I don't care about.  The reg. exp. I'm
trying to use is:

<a class=((.)+?) id=((.)+?)</a>

I'm after anchors that have class and id as their first two attributes.
This pattern works for most of the page I'm parsing, but it fails when
there are extra spaces in the tag, for instance if the HTML starts like:

<a       class=t id= .....

or

<a class=t      id= .....

the pattern does NOT match because of the extra spaces between the
attributes, but it is in fact an anchor that I do want to extract.  How
can I ignore those extra spaces?

Thanks.
Chris Smith - 27 Jun 2006 01:48 GMT
> <a class=t      id= .....
>
> the pattern does NOT match because of the extra spaces between the
> attributes, but it is in fact an anchor that I do want to extract.  How
> can I ignore those extra spaces?

Use \s+ (written \\s+ inside a Java string literal) instead of each space
in the regular expression.
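A minimal Java sketch of that suggestion using java.util.regex. The class name AnchorSpaces, the helper classValue and the sample anchors are hypothetical; the two patterns are the one from the question and the same pattern with each space replaced by \s+.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorSpaces {
    // Pattern from the question: single literal spaces between the parts of the tag.
    static final Pattern LITERAL_SPACES =
            Pattern.compile("<a class=((.)+?) id=((.)+?)</a>");

    // Same pattern with each space replaced by \s+ ("\\s+" inside a Java string literal).
    static final Pattern FLEXIBLE_SPACES =
            Pattern.compile("<a\\s+class=((.)+?)\\s+id=((.)+?)</a>");

    // Hypothetical helper: returns the captured class value (group 1), or null if no anchor is found.
    static String classValue(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample anchors modelled on the ones in the question.
        String[] samples = {
            "<a class=t id=x1>first</a>",
            "<a       class=t id=x2>second</a>",
            "<a class=t      id=x3>third</a>"
        };
        for (String html : samples) {
            System.out.println(html);
            System.out.println("  literal:  [" + classValue(LITERAL_SPACES, html) + "]");
            System.out.println("  flexible: [" + classValue(FLEXIBLE_SPACES, html) + "]");
        }
    }
}
```

With the literal pattern, the second sample gives no match at all. The third sample is still found, because `.` also matches spaces and `(.)+?` simply absorbs the extra ones, but the class capture then carries trailing spaces; with `\s+` every sample yields `t`.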

Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation

cmills28@yahoo.com - 27 Jun 2006 02:26 GMT
> > <a class=t      id= .....
> >
[quoted text clipped - 8 lines]
> Chris Smith - Lead Software Developer / Technical Trainer
> MindIQ Corporation

Thanks Chris!!  That did it!
i30817@gmail.com - 27 Jun 2006 11:33 GMT
If you want to ignore \n (i.e. let the match span line breaks), you can do something like this:
<tag>(([^<]*\n)*[^<]*)</tag>
and use the $1 captured group for whatever you want. I think it's
correct. You can test regular expressions quickly in jEdit.
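A small Java check of the pattern above, assuming `<tag>` stands for whatever element you are after, as in the post. The class MultiLineTag, the helper inner and the input string are hypothetical. The sketch also compares two alternatives: in java.util.regex a negated class such as `[^<]` already matches newlines, so `<tag>([^<]*)</tag>` captures the same text here, whereas `.` only crosses line breaks when `Pattern.DOTALL` is set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiLineTag {
    // The pattern suggested in the thread, as a Java string literal ("\\n" for \n).
    static final Pattern SUGGESTED = Pattern.compile("<tag>(([^<]*\\n)*[^<]*)</tag>");

    // [^<] already matches line terminators, so this shorter form captures the same text.
    static final Pattern SIMPLER = Pattern.compile("<tag>([^<]*)</tag>");

    // By default '.' does not match line terminators; DOTALL changes that.
    static final Pattern DOT_DEFAULT = Pattern.compile("<tag>(.*?)</tag>");
    static final Pattern DOT_ALL = Pattern.compile("<tag>(.*?)</tag>", Pattern.DOTALL);

    // Hypothetical helper: returns group 1 (the text between the tags), or null if there is no match.
    static String inner(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: element content spread over two lines.
        String html = "<tag>line one\nline two</tag>";
        System.out.println("suggested: " + inner(SUGGESTED, html));
        System.out.println("simpler:   " + inner(SIMPLER, html));
        System.out.println("dot:       " + inner(DOT_DEFAULT, html));
        System.out.println("dotall:    " + inner(DOT_ALL, html));
    }
}
```

The suggested, simpler and DOTALL patterns all return both lines; the plain `.*?` version finds nothing because it cannot step over the newline.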
