>> > you have a sentence and you need to find whether it contains a word;
>> > for example "he is great ".contains("is") returns true, but "his
[quoted text clipped - 11 lines]
>
> but I also want it to match things such as ,is, is.
Then maybe the \W construct can help. This will match any non-word
character.
".*\\W+" + searchString + "\\W+.*"
Non-word characters are defined as anything other than an alphanumeric
character or an underscore. So this would return true for "what is, once
was." but not for "his word".
There may be a problem if the search string is at the end or beginning
(or both) of the line you're searching, but you can check for that with
String.startsWith and String.endsWith.
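A minimal sketch of the \W approach above, assuming it is used with String.matches (which must match the entire text, hence the leading and trailing .*). The class and method names are hypothetical, and searchString is spliced in unquoted, exactly as in the expression above.

```java
public class NonWordDemo {

    // True when searchString has at least one non-word character (\W) on
    // each side. String.matches must match the entire text, hence the .*.
    // Note: searchString is used as a raw regex here, as in the post above.
    static boolean containsWord(String text, String searchString) {
        return text.matches(".*\\W+" + searchString + "\\W+.*");
    }

    public static void main(String[] args) {
        System.out.println(containsWord("what is, once was.", "is")); // true
        System.out.println(containsWord("his word", "is"));           // false
        // The limitation mentioned above: nothing precedes "is" here.
        System.out.println(containsWord("is it?", "is"));             // false
    }
}
```

The last call shows the start-of-line problem: \W+ needs at least one character before the word, so a word at the very beginning (or end) is missed.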

Signature
Beware the False Authority Syndrome
Chris Smith - 04 Jan 2006 03:46 GMT
> Then maybe the \W construct can help. This will match any non-word
> character.
>
> ".*\\W+" + searchString + "\\W+.*"
Just to be paranoid, make that:
".*\\W+" + Pattern.quote(searchString) + "\\W+.*"
Note that Pattern.quote is only available in Java 1.5 and later. Prior to
Java 1.5, it's exceedingly difficult to search for arbitrary substrings using
regular expressions, and you'd be better off using String.indexOf(String)
and checking the surrounding characters on your own.
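A rough sketch of the indexOf route, with no regular expressions involved. The names IndexOfWordSearch, containsWord and isWordChar are hypothetical, and the definition of a word character is an assumption: Character.isLetterOrDigit plus '_' loosely mirrors \w, but it also accepts non-ASCII letters, so it is broader than the regex \w.

```java
public class IndexOfWordSearch {

    // Hypothetical helper: letters, digits and '_' count as word characters.
    // Unlike regex \w, Character.isLetterOrDigit also accepts non-ASCII letters.
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // True if word occurs in text with no word character immediately before
    // or after it; the start and end of the text count as boundaries.
    static boolean containsWord(String text, String word) {
        if (word.length() == 0) {
            return false;
        }
        int from = 0;
        int idx;
        while ((idx = text.indexOf(word, from)) >= 0) {
            int end = idx + word.length();
            boolean startOk = idx == 0 || !isWordChar(text.charAt(idx - 1));
            boolean endOk = end == text.length() || !isWordChar(text.charAt(end));
            if (startOk && endOk) {
                return true;
            }
            from = idx + 1; // a later occurrence may still be a whole word
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsWord("he is great", "is")); // true
        System.out.println(containsWord("his word", "is"));    // false
        System.out.println(containsWord("this, is.", "is"));   // true
        System.out.println(containsWord("is", "is"));          // true
    }
}
```

Restarting the scan at idx + 1 matters for inputs like "this, is.", where the first hit (inside "this") is rejected but a later one is a whole word.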
> There may be a problem if the search string is at the end or beginning
> (or both) of the line you're searching, but you can check for that with
> String.startsWith and String.endsWith
Or, since you've got a Java regular expression anyway:
"(^|.*\\W+)" + Pattern.quote(searchString) + "($|\\W+.*)"
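A sketch of the anchored expression, again assuming whole-string String.matches semantics; the class and method names are made up for illustration. The last two calls show why Pattern.quote matters: without it, the '.' in "a.b" would match any character.

```java
import java.util.regex.Pattern;

public class AnchoredWordSearch {

    // Start of input, or anything ending in non-word characters; then the
    // literal search string; then end of input, or non-word characters
    // followed by anything.
    static boolean containsWord(String text, String searchString) {
        String regex = "(^|.*\\W+)" + Pattern.quote(searchString) + "($|\\W+.*)";
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(containsWord("is it?", "is"));   // true: at the start
        System.out.println(containsWord("so it is", "is")); // true: at the end
        System.out.println(containsWord("his word", "is")); // false
        System.out.println(containsWord("x a.b y", "a.b")); // true
        System.out.println(containsWord("x aXb y", "a.b")); // false: '.' is literal
    }
}
```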

Signature
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
puzzlecracker - 04 Jan 2006 05:21 GMT
> > Then maybe the \W construct can help. This will match any non-word
> > character.
[quoted text clipped - 15 lines]
>
> Or, since you've got a Java regular expression anyway
That is slow with JFC... any old-fashioned ways to accomplish this task?
Thanks
> "(^|.*\\W+)" + Pattern.quote(searchString) + "($|\\W+.*)"
>
[quoted text clipped - 4 lines]
puzzlecracker - 04 Jan 2006 06:01 GMT
> > Then maybe the \W construct can help. This will match any non-word
> > character.
[quoted text clipped - 24 lines]
Regular expressions are quite slow with jfc... can anyone suggest a
quick indexOf variant or legacy variant?
Chris Smith - 04 Jan 2006 06:24 GMT
> Regular expressions are quite slow with jfc... can anyone suggest a
> quick indexOf variant or legacy variant?
Huh? Is this some "jfc" that I'm not familiar with? Regular
expressions are no slower or faster than normal with the Java Foundation
Classes (that is, Swing and some related APIs). In fact, the two have
little to do with each other.
In any case, Noodles Jefferson already gave you a solution without using
regular expressions. You seemed no happier with that, because it didn't
work precisely the way you wanted. If you're that unable to write your
own code, perhaps it's time to think about why you're involved in
programming. An if statement probably won't kill you.

Jaakko Kangasharju - 04 Jan 2006 08:15 GMT
> Then maybe the \W construct can help. This will match any non-word
> character.
[quoted text clipped - 8 lines]
> (or both) of the line you're searching, but you can check for that with
> String.startsWith and String.endsWith
You can use boundary matchers to overcome this problem. The \b
construct matches a word boundary, so changing your expression to
".*\\b" + searchString + "\\b.*" matches searchString between word
boundaries, including at the beginning or the end.

Signature
Jaakko Kangasharju, Helsinki Institute for Information Technology
Will you be my friend, please?
Chris Smith - 04 Jan 2006 08:28 GMT
> You can use boundary matches to overcome this problem. The \b
> construct matches a word boundary, so modifying your expression to
> ".*\\b" + searchString + "\\b.*" matches searchString between word
> boundaries, including at the beginning or the end.
So if that works, then the correct version can be written as:
".*\\b" + Pattern.quote(searchString) + "\\b.*"
The Pattern.quote could technically be omitted if searchString were
guaranteed to contain only word characters... but it would need to be
accompanied with copious documentation to explain that fact.
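A sketch of the \b form, plus an assumed variant that uses Matcher.find() so the surrounding .* is unnecessary. Names are hypothetical. One side note on the speed complaint earlier in the thread: String.matches compiles a fresh Pattern on every call, so when the same word is searched repeatedly, compiling the pattern once and reusing it avoids that cost.

```java
import java.util.regex.Pattern;

public class BoundaryWordSearch {

    // As in the thread: \b is a zero-width word boundary, so it also matches
    // at the very start and end of the text.
    static boolean containsWord(String text, String searchString) {
        return text.matches(".*\\b" + Pattern.quote(searchString) + "\\b.*");
    }

    // Variant using find(): the pattern may occur anywhere, so no .* is
    // needed, and it also works across line terminators (which '.' does not
    // match by default). The Pattern could be compiled once and cached.
    static boolean containsWordFind(String text, String searchString) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(searchString) + "\\b");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("is it?", "is"));                 // true
        System.out.println(containsWord("so it is", "is"));               // true
        System.out.println(containsWord("his word", "is"));               // false
        System.out.println(containsWordFind("what is, once was.", "is")); // true
        System.out.println(containsWordFind("line one\nis two", "is"));   // true
    }
}
```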

zero - 04 Jan 2006 13:11 GMT
>> You can use boundary matches to overcome this problem. The \b
>> construct matches a word boundary, so modifying your expression to
[quoted text clipped - 8 lines]
> guaranteed to contain only word characters... but it would need to be
> accompanied with copious documentation to explain that fact.
It seems patterns are indeed a complex subject, with lots of near-identical
alternatives. By the way, does anyone know how this works with strings with
international content? The Pattern Javadoc states that a word character is
[a-zA-Z_0-9], so accented characters won't work here, and I'm not even
talking about non-Latin scripts.
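One possible answer, assuming Java 7 or later (newer than this thread): the Pattern.UNICODE_CHARACTER_CLASS flag makes \w and \W follow Unicode character properties, so accented letters count as word characters. The sketch reuses the anchored \W expression from earlier in the thread; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class UnicodeWordSearch {

    // The anchored \W expression from earlier, with the word quoted.
    static String regexFor(String word) {
        return "(^|.*\\W+)" + Pattern.quote(word) + "($|\\W+.*)";
    }

    // Default semantics: \w is [a-zA-Z_0-9], so an accented letter is \W.
    static boolean containsWordAscii(String text, String word) {
        return Pattern.compile(regexFor(word)).matcher(text).matches();
    }

    // Java 7+: UNICODE_CHARACTER_CLASS makes \w and \W Unicode-aware.
    static boolean containsWordUnicode(String text, String word) {
        return Pattern.compile(regexFor(word), Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "na\u00EFve is fine"; // "naïve is fine"
        // ASCII mode: 'ï' counts as a non-word character, a false positive.
        System.out.println(containsWordAscii(text, "ve"));              // true
        // Unicode mode: 'ï' is a letter, so "ve" is not a separate word.
        System.out.println(containsWordUnicode(text, "ve"));            // false
        System.out.println(containsWordUnicode(text, "na\u00EFve"));    // true
    }
}
```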
