Java Forum / General / January 2007

Tricky regex question

Juan Singh - 31 Jan 2007 18:54 GMT
Hi,

I need to extract the words that are more than 2 letters long from a
sentence. An example sentence is:

This is "very" 'tricky'. I won't be able see it.

I came up with the following regex, but it breaks the word [won't] and I
only get the part that is before the single quote.

\b[a-zA-Z]{2,}+\b

My objective is to extract the following words from the example sentence
above.

This, very, tricky, won't, able

Thanks.
Oliver Wong - 31 Jan 2007 19:08 GMT
> Hi,
>
[quoted text clipped - 12 lines]
>
> This, very, tricky, won't, able

   I'd split the string on whitespace (to get {This, is, "very", 'tricky'.,
I, won't, be, able, see, it.}) and then write a function which acts like
trim(), but rather than removing leading or trailing whitespace, it removes
leading or trailing punctuation. That'll give {This, is, very, tricky, I,
won't, be, able, see, it} which you can then walk through and eliminate all
words of 2 characters or fewer.

   - Oliver
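
A minimal sketch of the split-then-trim approach described above. The names trimPunctuation and longWords are hypothetical, and "punctuation" is assumed here to mean any character that is not a letter or digit; apostrophes inside a word are kept because only the ends of each token are trimmed.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitAndTrim {
    // Hypothetical helper: behaves like trim(), but strips leading and
    // trailing characters that are not letters or digits.
    static String trimPunctuation(String token) {
        int start = 0;
        int end = token.length();
        while (start < end && !Character.isLetterOrDigit(token.charAt(start))) {
            start++;
        }
        while (end > start && !Character.isLetterOrDigit(token.charAt(end - 1))) {
            end--;
        }
        return token.substring(start, end);
    }

    // Splits on whitespace, trims punctuation from each token, and keeps
    // only the words with at least minLength characters.
    static List<String> longWords(String sentence, int minLength) {
        List<String> result = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            String word = trimPunctuation(token);
            if (word.length() >= minLength) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "This is \"very\" 'tricky'. I won't be able see it.";
        // prints [This, very, tricky, won't, able, see]
        System.out.println(longWords(input, 3));
    }
}
```

Note that "see" has three letters, so it passes a "more than 2 letters" filter even though it is missing from the list in the original question.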
Daniel Pitts - 31 Jan 2007 19:17 GMT
> > Hi,
>
[quoted text clipped - 21 lines]
>
>     - Oliver
I was going to suggest this, then I realized that there IS a regex that
can do it.
\w+('?\w){2,}
I even posted an SSCCE to show how it works.
Daniel Pitts - 31 Jan 2007 19:14 GMT
> Hi,
>
[quoted text clipped - 14 lines]
>
> Thanks.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
   public static void main(String[] args) {
       Pattern longword = Pattern.compile("\\w+('?\\w){2,}");
       String input = "This is \"very\" 'tricky'. I won't be able see it.";
       Matcher matcher = longword.matcher(input);
       while (matcher.find()) {
           System.out.println(matcher.group());
       }
   }
}
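
For comparison, here is a small sketch that runs the original pattern and the posted one side by side on the example sentence. The findAll helper and the letters-only variant are assumptions added for illustration, not part of the thread. The variant is included because \w also matches digits and underscores, which may or may not be wanted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompareRegex {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "This is \"very\" 'tricky'. I won't be able see it.";

        // Original pattern: the apostrophe is a word boundary, so "won't"
        // is cut at "won", and 2-letter words are accepted as well.
        // prints [This, is, very, tricky, won, be, able, see, it]
        System.out.println(findAll("\\b[a-zA-Z]{2,}+\\b", input));

        // Posted pattern: one or more word characters, then at least two
        // more word characters, each optionally preceded by an apostrophe.
        // prints [This, very, tricky, won't, able, see]
        System.out.println(findAll("\\w+('?\\w){2,}", input));

        // Assumed letters-only variant of the posted pattern.
        // prints [This, very, tricky, won't, able, see]
        System.out.println(findAll("[a-zA-Z]+(?:'?[a-zA-Z]){2,}", input));
    }
}
```

Both the posted pattern and the variant require at least three characters per match, so "see" is included in the output.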
Juan Singh - 31 Jan 2007 19:29 GMT
Daniel.

PERFECT! Thank you. This is exactly what I was looking for.

Juan.

>> Hi,
>>
[quoted text clipped - 28 lines]
>     }
> }
Daniel Pitts - 31 Jan 2007 19:32 GMT
> Daniel.
>
[quoted text clipped - 34 lines]
> >     }
> > }

As Andrew Thompson says:
"Future lack of top-posting will be thanks enough".

On this group, we reply AFTER the quote or mixed-in with the quote!

