Java Forum / General / November 2006


Regex pattern problem

Ted Hopp - 13 Nov 2006 04:06 GMT
I was writing a quick-and-dirty regex to search html text and pull out the
source url from IMG tags. I first tried:

Pattern p = Pattern.compile("<img (?:[^>]* )?src=\"([^\"]*)\"");

(I know that this pattern makes all kinds of unwarranted assumptions about
the html, but that's another topic.) The problem I was having was that
although this pattern matches, it only results in one capture group--group
0. I was expecting the parens after src= to give me the url in capture group
1, but no such luck. It's only when I double the parens:

Pattern p = Pattern.compile("<img (?:[^>]* )?src=\"(([^\"]*))\"");

that the src value is captured.

So my question is: why do I need to double the parens?

Thanks,

Ted Hopp
Jussi Piitulainen - 13 Nov 2006 11:10 GMT
> I was writing a quick-and-dirty regex to search html text and
> pull out the source url from IMG tags. I first tried:
[quoted text clipped - 14 lines]
>
> So my question is: why do I need to double the parens?

You don't need to double the parens. You need to provide a
short program that demonstrates the problem. The following is
longer than needed, but it fails to fail in the way that you
describe: it has single parens in the pattern, accesses group
1, and prints here.be.it/1 and here.be.it/2 as expected:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Roska {
   public static void main(String [] args) {
       // Two sample IMG tags, one with an extra attribute before src
       String t1 = "left <img stuff src=\"here.be.it/1\" etc.>";
       String t2 = " then left <img src=\"here.be.it/2\" etc.>";
       // Single parens around the src value: this is capture group 1
       Pattern p = Pattern
           .compile("<img (?:[^>]* )?src=\"([^\"]*)\"");
       Matcher m = p.matcher(t1 + t2);
       // Print group 1 of every match found in the combined text
       while (m.find()) {
           System.out.println(m.group(1));
       }
   }
}
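
One thing that may be worth checking, though nothing in the thread confirms it is what happened here: Matcher.groupCount() does not include group 0 in its count, so the single-paren pattern reports 1 and the doubled one reports 2, even though group(1) holds the url in both. The sketch below compares the two patterns from the original post side by side. The class name GroupCountDemo and its helper methods are made up for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // The two patterns from the original post (single and doubled parens).
    static final String SINGLE = "<img (?:[^>]* )?src=\"([^\"]*)\"";
    static final String DOUBLED = "<img (?:[^>]* )?src=\"(([^\"]*))\"";

    // Number of capturing groups in the pattern; group 0 is not counted.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns group 1 of the first match, or null if nothing matches.
    static String firstSrc(String regex, String html) {
        Matcher m = Pattern.compile(regex).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "left <img stuff src=\"here.be.it/1\" etc.>";
        System.out.println("single:  groupCount=" + groupCount(SINGLE)
            + " group(1)=" + firstSrc(SINGLE, html));
        System.out.println("doubled: groupCount=" + groupCount(DOUBLED)
            + " group(1)=" + firstSrc(DOUBLED, html));
    }
}
```

If a loop runs from 0 up to but not including groupCount(), it will only ever look at group 0 with the single-paren pattern, which would look exactly like "only one capture group".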

