
Java Forum / General / January 2006


Help for Regular Expression in split function

YattaMaX - 16 Jan 2006 16:29 GMT
Hi all,
(First of all: sorry for my bad English.)

I need a regular expression that extracts only the words (> 3
chars), without digits or special characters.

Example:

With this string:

String str1 = "jump:  qwe  donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 lk@maxx0i.-oi  hkj";

str1.toLowerCase().trim().split( REGEX )

should return:

jump
donaldduck
maxx

Please help, I have not been able to find this regular expression :(

Bye
    MaX
Hendrik Maryns - 16 Jan 2006 17:51 GMT
> Hi All
> ( first of all: sorry for my bad english)
[quoted text clipped - 16 lines]
> donaldduck
> maxx

How about REGEX = "\\W*"?  (Bad naming, make that regex = "\\W*")

HTH, H.

Signature

Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org

YattaMaX - 16 Jan 2006 18:38 GMT
Hendrik Maryns wrote:

> How about REGEX = "\\W*"?  (Bad naming, make that regex = "\\W*")

Thanks.

Just a few questions:

- With \\W, are digits included?
(I don't want the digits: jump3 -> jump)

- Are words with fewer than three characters excluded?

(Sorry for my bad English)

bye
    MaX
YattaMaX - 16 Jan 2006 18:43 GMT
YattaMaX wrote:
> Hendrik Maryns wrote:
>>
>> How about REGEX = "\\W*"?  (Bad naming, make that regex = "\\W*")

No, this does not work correctly :(

With the string:
String str1 = "jump:  qwe  donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 lk@maxx0i.-oi  hkj";

And the call:
str1.toLowerCase().trim().split( "[\\W*]" ) ;

This returns:

jump

qwe
donaldduck

bye2xyz
zkj
ooo
iuy

uix
f4
lk
maxx0i

oi

hkj

Thanks
    MaX
opalpa@gmail.com opalinski from opalpaweb - 16 Jan 2006 18:52 GMT
"\\W\\W\\W\\W+"

oughta drop one, two, three character stuff and return stuff that is at
least four characters long.

Opalinski
opalpa@gmail.com
http://www.geocities.com/opalpaweb/
YattaMaX - 16 Jan 2006 19:02 GMT
opalpa@gmail.com opalinski from opalpaweb wrote:
> "\\W\\W\\W\\W+"
>
[quoted text clipped - 4 lines]
> opalpa@gmail.com
> http://www.geocities.com/opalpaweb/

\\W is "A non-word character".

I want only the words, without digits, that have at least 3 characters.

Thanks for your contribution.

Bye
    MaX
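On the question of whether \\W handles digits: in java.util.regex, \\w is the class [a-zA-Z_0-9], so digits count as word characters and \\W never splits on them. A small sketch comparing the two separators follows; the class and method names are hypothetical and only serve the illustration.

```java
import java.util.Arrays;

class WordCharDemo {
    // Splits on runs of non-word characters; digits and '_' stay inside the tokens.
    static String[] splitOnNonWord(String s) {
        return s.split("\\W+");
    }

    // Splits on runs of anything that is not an ASCII letter; digits become separators.
    static String[] splitOnNonLetter(String s) {
        return s.split("[^a-zA-Z]+");
    }

    public static void main(String[] args) {
        String sample = "jump3 bye2xyz";
        System.out.println(Arrays.toString(splitOnNonWord(sample)));   // [jump3, bye2xyz]
        System.out.println(Arrays.toString(splitOnNonLetter(sample))); // [jump, bye, xyz]
    }
}
```

So to get "jump" out of "jump3", the separator has to be written in terms of letters, not in terms of \\W.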
opalpa@gmail.com opalinski from opalpaweb - 16 Jan 2006 19:47 GMT
ok so

"[a-zA-Z][a-zA-Z][a-zA-Z][a-zA-Z]+"

Opalinski
opalpa@gmail.com
http://www.geocities.com/opalpaweb/
opalpa@gmail.com opalinski from opalpaweb - 16 Jan 2006 20:01 GMT
Sorry, didn't pay enough attention to the split being used instead of
patterns and matchers.

package experiment;

public class Split {
  public static void main(String[] args) {
    String str1 = "jump:  qwe  donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 l...@maxx0i.-oi  hkj";
    System.out.println(str1);
    // Every character that is not a lowercase letter acts as a separator.
    String[] w = str1.toLowerCase().trim().split("[^a-z]");
    for (String s : w) {
      // Keep only tokens longer than three characters (this also drops empty tokens).
      if (s.length() > 3) {
        System.out.println(s);
      }
    }
  }
}

outputs:

jump:  qwe  donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 l...@maxx0i.-oi  hkj
jump
donaldduck
maxx

Opalinski
opalpa@gmail.com
http://www.geocities.com/opalpaweb/
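Since the original request was for an array rather than printed lines, the same split-then-filter idea can be wrapped so that it returns a String[]. This is only a sketch: the class name SplitAndFilter and the helper longWords are made up for the example, and "[^a-z]+" (with a plus) is used so that a run of separators produces no empty tokens in the middle of the result.

```java
import java.util.ArrayList;
import java.util.List;

class SplitAndFilter {
    // Hypothetical helper: returns the lowercase letter runs longer than minLength characters.
    static String[] longWords(String text, int minLength) {
        List<String> kept = new ArrayList<String>();
        // Each run of non-letters is treated as one separator.
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            // A leading separator still yields one empty token; the length check drops it.
            if (token.length() > minLength) {
                kept.add(token);
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String str1 = "jump:  qwe  donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 lk@maxx0i.-oi  hkj";
        for (String word : longWords(str1, 3)) {
            System.out.println(word); // jump, donaldduck, maxx (one per line)
        }
    }
}
```

The length filter cannot be folded into the split() pattern itself, which is why it remains a separate step here.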
Hendrik Maryns - 16 Jan 2006 19:06 GMT
> "\\W\\W\\W\\W+"
>
> oughta drop one, two, three character stuff and return stuff that is at
> least four characters long.

That could be written nicer as "\\W{3,}", if I recall the syntax correctly.

But Yatta: why don’t you RTFM:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html

H.

Signature

Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org

YattaMaX - 16 Jan 2006 19:40 GMT
Hendrik Maryns wrote:
>> "\\W\\W\\W\\W+"
>>
>> oughta drop one, two, three character stuff and return stuff that is at
>> least four characters long.
>
> That could be written nicer as "\\W{3,}", if I recall the syntax correctly.

Sorry, but isn't "\\W" a "non-word character"?

Why use \\W when what I need is to get only the words with at least
three characters (not the opposite)?

> But Yatta: why don’t you RTFM:
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html

I know this page; I have been reading it for 7 days, but without a good result.

Thanks for your contribution.

Bye
    MaX
Oliver Wong - 16 Jan 2006 21:38 GMT
> Hi All
> ( first of all: sorry for my bad english)
[quoted text clipped - 18 lines]
>
> Help me please, I don't be able to find this regular expression :(

   split() is not what you want, as the regular expression you provide to
split() describes the separators, not the acceptable strings.

   Why don't you try building a Matcher, and using it to find subsequences
which match your requirement of at least 3 alphabetic characters in a row?

http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Matcher.html

   - Oliver
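A minimal sketch of the Matcher approach, assuming ASCII letters only. The class name MatcherWords is hypothetical. The quantifier {4,} matches the "> 3 chars" of the original question; {3,} would correspond to "at least 3" instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherWords {
    // Four or more consecutive lowercase ASCII letters.
    private static final Pattern LONG_WORD = Pattern.compile("[a-z]{4,}");

    // Collects every match of the pattern, i.e. the tokens themselves, not the separators.
    static List<String> find(String text) {
        List<String> words = new ArrayList<String>();
        Matcher m = LONG_WORD.matcher(text.toLowerCase());
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String str1 = "jump:  qwe  donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 lk@maxx0i.-oi  hkj";
        System.out.println(find(str1)); // [jump, donaldduck, maxx]
    }
}
```

If an array is needed, the list can be converted with words.toArray(new String[0]).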
YattaMaX - 16 Jan 2006 23:07 GMT
Oliver Wong wrote:

>     split() is not what you want, as the regular expression you provide to
> split() describes the separators, not the acceptable strings.

I use split because I want an array containing only the words longer than 3 characters.

Thanks

Bye
    MaX
Oliver Wong - 16 Jan 2006 23:25 GMT
> Oliver Wong wrote:
>
>>     split() is not what you want, as the regular expression you provide
>> to split() describes the separators, not the acceptable strings.
>
> I use split because I want an array containing only the words longer than 3 characters.

   Split will not do what you want. The arguments to split describe to it
the separators. You have no information about the separators. You have
information about the tokens you want. It's not that you want the separators
to be 3 alphabetic characters long; you want the tokens to be 3 alphabetic
characters long. Split will not let you specify that.

   Therefore, I recommend you try a different approach. I mentioned Matcher
in my previous post, but personally I would avoid Regular Expressions
altogether for this problem and just use a DFA that keeps track of how many
alphabetic characters it has seen so far, and if that number exceeds 3, to
accept the given substring.

   - Oliver
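One way the regex-free idea could look is a single pass over the characters that remembers where the current run of letters started and accepts the run when it ends, if it is long enough. This is a sketch under assumptions (ASCII letters only, hypothetical class name LetterRunScanner), not necessarily the exact automaton Oliver had in mind.

```java
import java.util.ArrayList;
import java.util.List;

class LetterRunScanner {
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Returns, in lowercase, every run of letters longer than minLength characters.
    static List<String> scan(String text, int minLength) {
        List<String> words = new ArrayList<String>();
        int start = -1; // index where the current letter run began, -1 when outside a run
        // Iterate one position past the end so a run touching the end of the text is closed too.
        for (int i = 0; i <= text.length(); i++) {
            boolean letter = i < text.length() && isAsciiLetter(text.charAt(i));
            if (letter) {
                if (start < 0) {
                    start = i; // entering a run
                }
            } else if (start >= 0) {
                if (i - start > minLength) {
                    words.add(text.substring(start, i).toLowerCase());
                }
                start = -1; // leaving the run
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String str1 = "jump:  qwe  donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 lk@maxx0i.-oi  hkj";
        System.out.println(scan(str1, 3)); // [jump, donaldduck, maxx]
    }
}
```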

