I am trying to convert a regex that I have in Ruby to do the
same in Java, but the dialects are different.
I am trying to parse a URL so that I get the domain name (possibly
with the www).
http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf
For example, the URL above would return:
www.yahoo.com
My Ruby regex is:
/^(?:[^\/]+:\/\/)?([^\/:]+)/
And I was working on the Java one, but haven't made much progress:
p = Pattern.compile("^http://([a-z0-9]*\.)*")
m = p.matcher( String("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf")
)
Roedy Green - 21 Mar 2006 00:13 GMT
On 20 Mar 2006 15:05:23 -0800, "Berlin Brown"
<berlin.brown@gmail.com> wrote, quoted or indirectly quoted someone
who said :
>And I was working on the java one, havent made much progress:
>
> p = Pattern.compile("^http://([a-z0-9]*\.)*")
The key is to match one character at a time; then, when you have that
working, extend your pattern by one more character. The problem is
that working with regexes is like working with a blindfold on. You can't
see why they are failing to give the expected results.

Signature
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
Berlin Brown - 21 Mar 2006 01:05 GMT
OK, without the spoon-feeding, I did what you said. Thanks. It is a
start, and this is what I ended up with (for those who are complete
regex-in-Java newbies):
"http://(.*?)\\/(.*)"
My thought process:
1. Clearly the 'http://' means: find 'http://' at the start of the
string.
2. I wanted the host (I will leave the www for now), so I wanted any
characters between the http:// and the first '/'. The 'dot' means
match any character, and the '*' means match zero or more times. On its
own '*' is greedy; adding '?' makes it lazy, so it stops at the first
point where the rest of the pattern can match.
So, I ended up with (.*?), where the '(' and ')' represent a group.
3. Next, I needed to acknowledge the '/', a literal, so I also added
the '\\' for a literal.
4. Add the rest of the URL string in another group.
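To make the greedy versus lazy distinction in step 2 concrete, here is a minimal sketch that runs the pattern above and a greedy variant against the same URL. The class name LazyVersusGreedy and the helper firstGroup are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LazyVersusGreedy {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf";
        // Lazy (.*?) stops at the first '/' after the scheme.
        System.out.println(firstGroup("http://(.*?)\\/(.*)", url));
        // Greedy (.*) grabs as much as it can and backs off to the last '/'.
        System.out.println(firstGroup("http://(.*)\\/(.*)", url));
    }
}
```

The lazy version should print www.yahoo.com, while the greedy one should print www.yahoo.com/suckit/kjlaflaj, which is why the '?' matters here.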
Roedy Green - 21 Mar 2006 01:10 GMT
On 20 Mar 2006 16:05:54 -0800, "Berlin Brown"
<berlin.brown@gmail.com> wrote, quoted or indirectly quoted someone
who said :
>3. Next, I needed to acknowledge the '/', a literal, so I also added
>the '\\' for a literal.
A literal \ in a regex, written as a Java string literal, is \\\\.
See http://mindprod.com/jgloss/regex.html#QUOTING
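As a rough illustration of the quoting rule, the sketch below searches for a literal backslash. The class and method names are invented for the example; the point is only that the Java compiler halves the backslashes once and the regex engine halves them again.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class BackslashQuoting {
    // True if the input contains at least one literal backslash.
    static boolean containsBackslash(String input) {
        // Java source "\\\\" is the two-character string \\ ,
        // which the regex engine reads as one literal backslash.
        return Pattern.compile("\\\\").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsBackslash("C:\\temp")); // the string C:\temp
        System.out.println(containsBackslash("http://www.yahoo.com/"));
    }
}
```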
Jussi Piitulainen - 21 Mar 2006 09:12 GMT
> "http://(.*?)\\/(.*)"
Very nice, and much simpler than your first attempt.
> 3. Next, I needed to acknowledge the '/', a literal, so I also added
> the '\\' for a literal.
There is nothing special about '/' in a regexp. You know this, as
evidenced by the simple "http://" earlier in your regexp.
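A quick way to check this is to run the pattern with and without the escape and compare the results. This is only a sketch; SlashIsOrdinary and hostWith are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class SlashIsOrdinary {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String hostWith(String regex, String url) {
        Matcher m = Pattern.compile(regex).matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf";
        String escaped = hostWith("http://(.*?)\\/(.*)", url); // regex \/
        String plain = hostWith("http://(.*?)/(.*)", url);     // regex /
        // Both forms capture the same host, so the escape is redundant.
        System.out.println(escaped.equals(plain));
    }
}
```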
Dave Mandelin - 21 Mar 2006 00:42 GMT
What is the problem exactly? Does your regexp not match the string, or
does the matched group not extract the part you want?
Kurt M Peters - 21 Mar 2006 00:44 GMT
Don't forget that, because of the way Java handles string literals, "\" needs to be written as "\\".
K
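Applied to the pattern from the original post, that would look roughly like the sketch below (EscapedDot and wholeMatch are invented names). Note that the doubled backslash only makes the pattern compile; it still stops short of the full host, because every repetition of the group has to end in a dot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class EscapedDot {
    // Returns the text matched by the original pattern, or null if it does not match.
    static String wholeMatch(String input) {
        // "\\." in Java source is the regex \. (a literal dot).
        // Writing "\." in Java source is rejected by the compiler.
        Matcher m = Pattern.compile("^http://([a-z0-9]*\\.)*").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The match ends after "yahoo." because "com" is not followed by a dot.
        System.out.println(wholeMatch("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf"));
    }
}
```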
>I am trying to convert a regex expression that I have in ruby to do the
> same in java, but the dialects are different.
[quoted text clipped - 18 lines]
> m = p.matcher( String("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf")
> )
Bart Cremers - 21 Mar 2006 09:30 GMT
Your Ruby regex works as-is if used correctly in Java. I removed
the escapes to simplify it a bit, but removing them is not required:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String pattern = "^(?:[^/]+://)?([^/:]+)";
String input = "http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf";
Matcher matcher = Pattern.compile(pattern).matcher(input);
// find() searches the input; the ^ anchors the match to the start
if (matcher.find()) {
    // group 1 is the host, after the optional scheme
    int start = matcher.start(1);
    int end = matcher.end(1);
    System.out.println(input.substring(start, end));
}
Regards,
Bart
Greg R. Broderick - 21 Mar 2006 16:29 GMT
> I am trying to parse a URL such that I get the domain name (possibly
> with the www).
[quoted text clipped - 4 lines]
>
> www.yahoo.com
Why waste your time re-inventing the wheel? Java has the built-in
java.net.URL class that will do this for you, via its getHost() method.
Cheers
GRB
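A minimal sketch of that suggestion follows. The class UrlHostDemo and the helper hostOf are made up for the example, and returning null on a malformed URL is just one possible way to handle the checked exception. Recent JDKs mark the URL(String) constructor as deprecated, hence the annotation.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class; the names are illustrative only.
class UrlHostDemo {
    // Returns the host of the URL, or null if the string cannot be parsed as a URL.
    @SuppressWarnings("deprecation") // URL(String) is deprecated in newer JDKs
    static String hostOf(String spec) {
        try {
            return new URL(spec).getHost();
        } catch (MalformedURLException e) {
            return null; // assumption: callers treat null as "no host"
        }
    }

    public static void main(String[] args) {
        System.out.println(hostOf("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf"));
    }
}
```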

Signature
---------------------------------------------------------------------
Greg R. Broderick [rot13] terto@oynpxubyvb.qlaqaf.bet
A. Top posters.
Q. What is the most annoying thing on Usenet?
---------------------------------------------------------------------
Nigel Wade - 21 Mar 2006 18:11 GMT
> I am trying to convert a regex expression that I have in ruby to do the
> same in java, but the dialects are different.
[quoted text clipped - 18 lines]
> m = p.matcher( String("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf")
> )
There's no point in re-inventing wheels.
If you are working with URIs, why not use the URI tools available to you?
URI uri = new URI("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf");
String domainName = uri.getHost();
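For anyone pasting the two lines above into a program: the URI constructor throws the checked URISyntaxException, and getHost() returns null when the input has no scheme, a case the Ruby regex accepted. The sketch below covers both; UriHostDemo and hostOf are made-up names, and mapping a syntax error to null is an assumption, not the only reasonable choice.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; the names are illustrative only.
class UriHostDemo {
    // Returns the host component, or null if there is none or the input is not a valid URI.
    static String hostOf(String spec) {
        try {
            return new URI(spec).getHost();
        } catch (URISyntaxException e) {
            return null; // assumption: treat unparsable input as "no host"
        }
    }

    public static void main(String[] args) {
        System.out.println(hostOf("http://www.yahoo.com/suckit/kjlaflaj/ljl?lsklf"));
        // Without a scheme the whole string is parsed as a path, so the host is null.
        System.out.println(hostOf("www.yahoo.com/suckit"));
    }
}
```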

Signature
Nigel Wade, System Administrator, Space Plasma Physics Group,
University of Leicester, Leicester, LE1 7RH, UK
E-mail : nmw@ion.le.ac.uk
Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555