
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
> > I understand why this matches the entire text (because of the outer .*'s),
> > however I just want to match the text inside the td tags. I tried:
[quoted text clipped - 4 lines]
> it's much more possible to help you. How about posting the code where
> you define a group with parens and try to retrieve that group's value?
I expected ".*<td>\\(.*\\)</td>.*" to work, matching the text inside the
group, which is surrounded by td tags, and I called m.group() to obtain a
matching group. m.groupCount() returns 1 (the whole thing only is matched).
I also tried: ".*<td>(.*)</td>.*" which matches everything, with
m.groupCount() returning 1 again. Yet shouldn't it be matching 2 parts: The
whole and the group?
// actually, has whole HTML page
StringBuffer page = new StringBuffer("<tr>\n <td>blah</td>\n </tr>");
Pattern p = Pattern.compile(".*<td>(.*)</td>.*", Pattern.DOTALL);
Matcher m = p.matcher(page.toString());
System.out.println("Match finder: " + m.matches());
System.out.println("Match groups: " + m.groupCount());
Cheers,
Mike
Collin VanDyck - 27 Feb 2004 19:35 GMT
> I expected ".*<td>\\(.*\\)</td>.*" to work, matching the text inside the
> group, which is surrounded by td tags, and I called m.group() to obtain a
> matching group. m.groupCount() returns 1 (the whole thing only is matched).
> I also tried: ".*<td>(.*)</td>.*" which matches everything, with
> m.groupCount() returning 1 again. Yet shouldn't it be matching 2 parts: The
> whole and the group?
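m.groupCount() simply returns the number of capturing groups in your regular
expression pattern. Group 0 (the whole match) is not included in that count,
and the value does not depend on whether the input actually matched.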
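As a minimal sketch of that distinction (the class name GroupCountDemo and both helper methods are made up for illustration, and the sample input is the one from the original post), the following shows that groupCount() reflects the pattern alone, while group(1) holds the captured text after a successful match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Hypothetical helper: number of capturing groups declared in the pattern.
    // The input is passed only to show that it has no influence on the result.
    static int groupCount(String regex, String input) {
        return Pattern.compile(regex).matcher(input).groupCount();
    }

    // Hypothetical helper: text captured by group 1, or null if the input does not match.
    // DOTALL is assumed here so that '.' also matches the newlines in the sample page.
    static String insideTd(String html) {
        Matcher m = Pattern.compile(".*<td>(.*)</td>.*", Pattern.DOTALL).matcher(html);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "<tr>\n <td>blah</td>\n </tr>";

        // One pair of unescaped parentheses: one capturing group, whatever the input is.
        System.out.println(groupCount(".*<td>(.*)</td>.*", page));            // 1
        System.out.println(groupCount(".*<td>(.*)</td>.*", "no cells here")); // 1

        // Escaped parentheses are literal characters, so no capturing group is declared.
        System.out.println(groupCount(".*<td>\\(.*\\)</td>.*", page));        // 0

        // The captured text itself comes from group(1) after a successful match.
        System.out.println(insideTd(page));                                   // blah
    }
}
```

The count says how many groups you can ask for; the group(n) calls say what they captured.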
If you are trying to find out what matched, use this paradigm:
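// DOTALL lets '.' match line terminators, as in your multi-line page
Pattern p = Pattern.compile(".*<td>(.*)</td>.*", Pattern.DOTALL);
Matcher m = p.matcher(someinputstring);
if (m.matches()) {
    String insideMatch = m.group(1); // text captured by the parentheses
    String entireMatch = m.group();  // group 0: the entire match
}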
Your insideMatch would then be whatever was in between the TDs.
Note, though, that RE quantifiers such as * are greedy by default, meaning
that if you have a start <TD> and then, after many other start and end TDs
and tables, a </TD>, the group can contain everything in the middle,
including other markup possibly beyond what you are intending.
If you want the group to stop at the nearest close TD, make its quantifier
reluctant (non-greedy) by adding '?' after it, as such:
.*<td>(.*?)</td>.*
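To make the difference visible, here is a small sketch. It deliberately departs from the pattern above: the surrounding .* is dropped and find() is used instead of matches(), so that only the quantifier inside the group affects the result. The class name GreedyVsReluctant, the helper firstCapture, and the two-cell sample row are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: group 1 of the first match located by find(), or null if none.
    static String firstCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String row = "<tr><td>a</td><td>b</td></tr>";

        // Greedy: (.*) runs on to the last </td>, swallowing the markup in between.
        System.out.println(firstCapture("<td>(.*)</td>", row));  // a</td><td>b

        // Reluctant: (.*?) stops at the first </td> it can reach.
        System.out.println(firstCapture("<td>(.*?)</td>", row)); // a
    }
}
```

With a greedy leading .* in front of <td>, as in the full-page pattern, the match tends to settle on the last <td> in the input, so it is worth testing against a page with several cells.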
-CV
E.C. - 27 Feb 2004 19:51 GMT
> > I expected ".*<td>\\(.*\\)</td>.*" to work, matching the text inside the
> > group, which is surrounded by td tags, and I called m.group() to obtain a
[quoted text clipped - 18 lines]
>
> your insideMatch would then be whatever was in between the TDs.
Ah, I see what you mean. That works great, cheers :)
Mike