
Java Forum / General / March 2006

Simple regex in java to extract the domain name

Berlin  Brown - 21 Mar 2006 00:05 GMT
I am trying to convert a regex expression that I have in ruby to do the
same in java, but the dialects are different.

I am trying to parse a URL such that I get the domain name (possibly
with the www).

http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf

For example, above would return:

www.yahoo.com

My Ruby regex is:

/^(?:[^\/]+:\/\/)?([^\/:]+)/

And I was working on the Java one but haven't made much progress:

p = Pattern.compile("^http://([a-z0-9]*\.)*")

m = p.matcher( String("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf")
)
Roedy Green - 21 Mar 2006 00:13 GMT
On 20 Mar 2006 15:05:23 -0800, "Berlin  Brown"
<berlin.brown@gmail.com> wrote, quoted or indirectly quoted someone
who said :

>And I was working on the Java one but haven't made much progress:
>
> p = Pattern.compile("^http://([a-z0-9]*\.)*")

The key is to match one character at a time; once that works, extend
your pattern by one more character. The problem is that working with
regexes is like working blindfolded: you can't see why they are
failing to give the expected results.
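
One way to take the blindfold off is to grow the pattern in small steps and print what each step actually matched. This is only a sketch: the class name `RegexSteps`, the helper `matchedPrefix`, and the intermediate patterns are made up for illustration, using the sample URL from the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSteps {
    // Returns the text the pattern matches at the start of the input,
    // or null if it does not match there at all.
    static String matchedPrefix(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf";
        // Each pattern extends the previous one by a small piece.
        String[] steps = {
            "http",
            "http://",
            "http://[a-z0-9]+",
            "http://([a-z0-9]+\\.)*",
            "http://([a-z0-9]+\\.)*[a-z0-9]+"
        };
        for (String regex : steps) {
            System.out.println(regex + " -> " + matchedPrefix(regex, input));
        }
    }
}
```

Printing the matched text after every extension shows exactly where a pattern stops consuming input, which is usually enough to see which piece needs changing.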

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Berlin  Brown - 21 Mar 2006 01:05 GMT
Ok, without the spoon feeding I did what you said.  Thanks.  It is a
start, but this is what I ended up with.  (and for those complete
regex-java newbies)

"http://(.*?)\\/(.*)"

My thought process:
1. The 'http://' matches the literal text 'http://', which is at the
start of the string here.

2. I wanted the host (I will leave the www for now), so I wanted any
characters between the 'http://' and the first '/'.  The 'dot' matches
any character and the '*' means zero or more times.  On its own the '*'
is greedy; the trailing '?' makes it lazy, so it stops at the first
point where the rest of the pattern can match.
So, I ended up with (.*?), where the '(' and ')' form a group.

3. Next, I needed to acknowledge the '/', a literal, so I also added
the '\\' for a literal.

4. Add the rest of the URL string in another group.
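
To make the lazy/greedy distinction in step 2 concrete, here is a small sketch that runs the pattern above and a greedy variant against the same URL. The class name `LazyVersusGreedy` and the helper `firstGroup` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVersusGreedy {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf";
        // Lazy: (.*?) stops at the first '/' after the scheme.
        System.out.println(firstGroup("http://(.*?)\\/(.*)", input));
        // Greedy: (.*) runs on to the last '/' in the input.
        System.out.println(firstGroup("http://(.*)\\/(.*)", input));
    }
}
```

With the lazy quantifier group 1 is just the host; with the greedy one it also swallows the leading path segments, which is why the '?' matters here.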
Roedy Green - 21 Mar 2006 01:10 GMT
On 20 Mar 2006 16:05:54 -0800, "Berlin  Brown"
<berlin.brown@gmail.com> wrote, quoted or indirectly quoted someone
who said :

>3. Next, I needed to acknowledge the '/', a literal, so I also added
>the '\\' for a literal.

A literal \ in a regex, written as a Java string literal, is \\\\

see http://mindprod.com/jgloss/regex.html#QUOTING
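
A short sketch of the two layers of escaping, assuming nothing beyond `java.util.regex`; the class name `BackslashQuoting` and both helper methods are made up for illustration.

```java
import java.util.regex.Pattern;

class BackslashQuoting {
    // True if the input contains at least one literal backslash.
    static boolean containsBackslash(String input) {
        // Java source "\\\\" is the two-character regex \\, which matches one '\'.
        return Pattern.compile("\\\\").matcher(input).find();
    }

    // True if the input contains a literal dot.
    static boolean containsDot(String input) {
        // Java source "\\." is the regex \. ; a bare "." would match any character.
        return Pattern.compile("\\.").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsBackslash("C:\\temp"));
        System.out.println(containsBackslash("www.yahoo.com"));
        System.out.println(containsDot("www.yahoo.com"));
        System.out.println(containsDot("localhost"));
    }
}
```

The Java compiler halves the backslashes first, and the regex engine then interprets what is left, so every backslash the regex needs has to be doubled in source.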
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Jussi Piitulainen - 21 Mar 2006 09:12 GMT
> "http://(.*?)\\/(.*)"

Very nice, and much simpler than your first attempt.

> 3. Next, I needed to acknowledge the '/', a literal, so I also added
> the '\\' for a literal.

There is nothing special about '/' in a regexp. You know this, as
evidenced by the simple "http://" earlier in your regexp.
Dave Mandelin - 21 Mar 2006 00:42 GMT
What is the problem exactly? Does your regexp not match the string, or
does the matched group not extract the part you want?
Kurt M Peters - 21 Mar 2006 00:44 GMT
Don't forget that, because of the way Java handles string literals, "\" needs to be written as "\\"
K
>I am trying to convert a regex expression that I have in ruby to do the
> same in java, but the dialects are different.
[quoted text clipped - 18 lines]
> m = p.matcher( String("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf")
> )
Bart Cremers - 21 Mar 2006 09:30 GMT
Your Ruby regex works as-is when used correctly in Java. I removed
the escapes to simplify it a bit, but removing them is not required:

String pattern = "^(?:[^/]+://)?([^/:]+)";
String input = "http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf";

Matcher matcher = Pattern.compile(pattern).matcher(input);
if (matcher.find()) {
   int start = matcher.start(1);
   int end = matcher.end(1);

   System.out.println(input.substring(start, end));
}
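
The same approach as a self-contained class, for anyone who wants to paste and run it. This is a sketch: the class name `HostByRegex` and the method `extractHost` are made up for illustration, it uses `group(1)` instead of `start`/`end` plus `substring`, and the extra inputs are assumptions added to show the optional scheme and a port.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostByRegex {
    // Optional scheme ("http://"), then capture everything up to the next '/' or ':'.
    private static final Pattern HOST = Pattern.compile("^(?:[^/]+://)?([^/:]+)");

    // Returns the captured host, or null if the input does not match.
    static String extractHost(String input) {
        Matcher matcher = HOST.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractHost("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf"));
        System.out.println(extractHost("www.yahoo.com/suckit"));
        System.out.println(extractHost("http://www.yahoo.com:8080/suckit"));
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call.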

Regards,

Bart
Greg R. Broderick - 21 Mar 2006 16:29 GMT
> I am trying to parse a URL such that I get the domain name (possibly
> with the www).
[quoted text clipped - 4 lines]
>
> www.yahoo.com

Why waste your time re-inventing the wheel?  Java has the built-in
java.net.URL class that will do this for you, via its getHost() method.

Cheers
GRB

Signature

---------------------------------------------------------------------
Greg R. Broderick                 [rot13] terto@oynpxubyvb.qlaqaf.bet

A. Top posters.
Q. What is the most annoying thing on Usenet?
---------------------------------------------------------------------

Nigel Wade - 21 Mar 2006 18:11 GMT
> I am trying to convert a regex expression that I have in ruby to do the
> same in java, but the dialects are different.
[quoted text clipped - 18 lines]
> m = p.matcher( String("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf")
> )

There's no point in re-inventing wheels.
If you are working with URIs, why not use the URI tools available to you?

URI uri = new URI("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf");
String domainName = uri.getHost();
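
As a runnable sketch of the two lines above: `new URI(String)` throws the checked `URISyntaxException`, so it has to be caught or declared. The class name `HostByUri`, the method `extractHost`, and the choice to return null on bad input are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class HostByUri {
    // Returns the host component, or null if the string is not a valid URI
    // or has no host (for example, when the scheme is missing).
    static String extractHost(String input) {
        try {
            return new URI(input).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(extractHost("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf"));
        System.out.println(extractHost("http://www.yahoo.com:8080/suckit"));
        System.out.println(extractHost("www.yahoo.com/suckit"));
    }
}
```

Unlike the regex, `getHost()` returns null for an input without a scheme, because such a string is parsed as a relative path rather than as an authority.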

Signature

Nigel Wade, System Administrator, Space Plasma Physics Group,
           University of Leicester, Leicester, LE1 7RH, UK
E-mail :    nmw@ion.le.ac.uk
Phone :     +44 (0)116 2523548, Fax : +44 (0)116 2523555


