I am working on a simple method that appends a specific extension
(e.g. ".jsp", ".php", ".cfm", etc.) to the end of a URL if the URL
does not already carry a valid extension. If an extension is found,
nothing should be added.
Consider my code:
<code>
<pre>
import java.util.regex.Pattern;
...
public static final String urlEndSlashPattern = "/?";
public static final String urlQSPattern = "\\??([a-zA-Z0-9\\-_\\.]+=[^&]*&?)*";
public static final String urlAnchorPattern = "#[^#]*$";
...
public static void addExtToUrl(String url, String myExt, String[] exts) {
    StringBuffer sb = new StringBuffer();
    boolean hasExt = false;
    for (int i = 0; i < exts.length; i++) {
        sb.append(".").append(exts[i]).append(urlEndSlashPattern).append(urlQSPattern).append(urlAnchorPattern);
        if (Pattern.matches(sb.toString(), url)) {
            hasExt = true;
        }
        sb = new StringBuffer();
    }
    if (!hasExt) {
        url += "." + myExt;
    }
}
</pre>
</code>
The problem is that the regular expression pattern I'm using appears
to fail. I want to check whether the URL I provide ends with a valid
extension, optionally followed by a "/", a query string, an anchor, or
any combination of these.
For example, if I have
http://www.blah.com/index.html
Then don't add the ".jsp" extension
But if I have
http://www.blah.com/registration/
Then I *want* to add the ".jsp" extension:
http://www.blah.com/registration.jsp
Or if I have:
http://www.blah.com/registration/?foo=bar#baz
Then it needs to change to
http://www.blah.com/registration.jsp?foo=bar#baz
But if I have
http://www.blah.com/registration/index.php?foo=bar#baz
Then I do *not* add the ".jsp" extension.
Hope that makes sense now. Bottom line is that the pattern above
doesn't seem to work. Ideas?
Thanks
Roedy Green - 20 Jun 2008 15:48 GMT
On Fri, 20 Jun 2008 06:25:31 -0700 (PDT), "phillip.s.powell@gmail.com"
<phillip.s.powell@gmail.com> wrote, quoted or indirectly quoted
someone who said :
>public static final String urlQSPattern = "\\??([a-zA-Z0-9\\-_\\.]+=[^&]*&?)*";
It might help if you provided a few strings you INTENDED this to
match.
I find the way I solve these problems is to chop my regex to the bone
and get it matching. Then I add characters to the end just a few at a
time. That way you know precisely where the trouble is when it fails.
Has anyone looked to see if you can find out how far into the regex a
match progressed before it failed? This would be very helpful in
debugging.
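As far as I know, java.util.regex has no call that reports how far
into the pattern a match got, so the "add a few characters at a time"
approach above can be approximated by hand. The following is only a
sketch: the class RegexProbe and the method firstFailingPiece are
made-up names, and it assumes the regex has been split into pieces
such that every cumulative concatenation is itself a valid regex.

```java
import java.util.regex.Pattern;

final class RegexProbe {
    /**
     * Grows the regex one piece at a time and returns the index of the
     * first piece at which the regex built so far no longer matches a
     * prefix of the input, or -1 if every cumulative regex matches.
     */
    static int firstFailingPiece(String[] pieces, String input) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            regex.append(pieces[i]);
            // lookingAt() anchors at the start of the input but does not
            // require the whole input to be consumed.
            if (!Pattern.compile(regex.toString()).matcher(input).lookingAt()) {
                return i;
            }
        }
        return -1;
    }
}
```

Called with the pieces "http://", "[^/]+", "/registration", "\\.jsp"
it reports which piece first stops matching a given URL, which is
where to start looking.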
The following is an aside to your problem.
Normally using a StringBuilder will make concatenation faster. But in
this case, it probably slows it down since the compiler won't be able
to track what is happening and glue the bits together at compile time.
I always like to use
private static final Pattern p = Pattern.compile("xxx");
since compiling the regex string is a heavy duty operation and need
only be done once.
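A minimal sketch of that compile-once idea applied to the extension
check, assuming the list of extensions is known when the class loads
(in the original post it arrives as a method argument, so this is a
simplification). KnownExtensions, EXTS, buildRegex and hasKnownExt are
illustrative names, and the query-string and anchor sub-patterns are
my own, not the ones from the original post.

```java
import java.util.regex.Pattern;

final class KnownExtensions {
    // Hypothetical list of extensions that count as "valid".
    private static final String[] EXTS = {"jsp", "php", "cfm", "html"};

    // Compiled once when the class loads, then reused for every URL.
    private static final Pattern HAS_KNOWN_EXT = Pattern.compile(buildRegex(EXTS));

    static String buildRegex(String[] exts) {
        // Everything before the extension must stay clear of '?' and '#',
        // so a dot inside a query value is not mistaken for an extension.
        StringBuilder sb = new StringBuilder("[^?#]*\\.(?:");
        for (int i = 0; i < exts.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append(Pattern.quote(exts[i])); // treat the extension literally
        }
        // Optional trailing slash, optional query string, optional anchor.
        sb.append(")/?(?:\\?[^#]*)?(?:#.*)?");
        return sb.toString();
    }

    static boolean hasKnownExt(String url) {
        // matches() requires the whole URL to fit the pattern.
        return HAS_KNOWN_EXT.matcher(url).matches();
    }
}
```

All the extensions go into one alternation, so there is a single
Pattern instead of one regex string rebuilt per extension per call.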

Signature
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
phillip.s.powell@gmail.com - 20 Jun 2008 17:00 GMT
> It might help if you provided a few strings you INTENDED this to
> match.
I can't; the strings come out of a database that is populated by user
input. In short, the URLs can literally be anything on earth.
But this might help you to understand.
Suppose that THIS is your URL: http://www.blah.com/foo/
Then it must become: http://www.blah.com/foo.jsp
But if it is this: http://www.blah.com/foo.php
Then it must remain: http://www.blah.com/foo.php
But if it is this:
http://www.blah.com/foo/?bar=baz
Then it must become:
http://www.blah.com/foo.jsp?bar=baz
But if it is
http://www.blah.com/foo/?myjavaclass=java.util.HashMap
Then it becomes
http://www.blah.com/foo.jsp?myjavaclass=java.util.HashMap
Hope that helps
Roedy Green - 20 Jun 2008 20:16 GMT
On Fri, 20 Jun 2008 09:00:08 -0700 (PDT), "phillip.s.powell@gmail.com"
<phillip.s.powell@gmail.com> wrote, quoted or indirectly quoted
someone who said :
>http://www.blah.com/foo/?bar=baz
>
>Then it must become:
>
>http://www.blah.com/foo.jsp?bar=baz
I have a list of general regex debugging tips at:
http://mindprod.com/jgloss/regex.html#TIPS
The key one is to use several small regexes rather than one big one.

Mark Space - 20 Jun 2008 20:57 GMT
> I am working on a simple method that will assign a specific extension
> (e.g. ".jsp", ".php", ".cfm", etc.) to the end of a URL if it doesn't
> find anything marking a valid extension, however, I do not want to add
> an extension if one is found.
You might try the URI class. Make a URI, get the path, and then just
check that one string for a valid extension. This will be much easier
than trying to parse a URI yourself.
package uritest;
import java.net.URI;
import java.net.URISyntaxException;
public class Main {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("http://www.blah.com/registration/");
        // getPath() returns only the path, without query string or anchor.
        String path = uri.getPath();
        // Drop a trailing slash so the extension can be appended directly.
        if (path.endsWith("/"))
            path = path.substring(0, path.length() - 1);
        // No dot after the first character means there is no extension yet.
        if (!path.matches(".+\\..*"))
            path += ".jsp";
        System.out.println(path);
    }
}
phillip.s.powell@gmail.com - 24 Jun 2008 15:27 GMT
> You might try the URI class. Make a URI, get the path, and then just
> check that one string for a valid extension. This will be much easier
> than trying to parse a URI yourself.
Going forward that is perhaps a really good idea for checking the
validity of a URL. However, what I was trying to do was check whether
the URL had an extension and add ".jsp" only if it did not.
What I wound up doing, though, was a lot simpler: after chopping off
any optional query string and anchor, I checked the newly created end
of the URL for an extension via Pattern.matches("xxx") and added
".jsp" accordingly.
But the URI class looks interesting as does the URL class. Thanks!
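The chop-then-check approach described above might look roughly like
this. It is a sketch only: the actual pattern used was not posted, so
ENDS_WITH_EXT (a dot followed by letters or digits at the end of the
last path segment) is an assumption, and ChopAndCheck and addExt are
made-up names.

```java
import java.util.regex.Pattern;

final class ChopAndCheck {
    // Assumed rule: an extension is a dot followed by letters or digits
    // at the very end of the last path segment.
    private static final Pattern ENDS_WITH_EXT = Pattern.compile(".*\\.[A-Za-z0-9]+");

    static String addExt(String url, String ext) {
        // Find where the query string or anchor starts, whichever is first.
        int cut = url.length();
        int q = url.indexOf('?');
        int h = url.indexOf('#');
        if (q >= 0) {
            cut = q;
        }
        if (h >= 0 && h < cut) {
            cut = h;
        }
        String head = url.substring(0, cut); // everything before ? or #
        String tail = url.substring(cut);    // query string and/or anchor
        String trimmed = head.endsWith("/") ? head.substring(0, head.length() - 1) : head;
        // Only the text after the last slash is checked for an extension.
        String last = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        if (ENDS_WITH_EXT.matcher(last).matches()) {
            return url; // already has an extension, leave untouched
        }
        return trimmed + "." + ext + tail;
    }
}
```

One limitation of this sketch: a host-only URL such as
http://www.blah.com/ is left unchanged, because the text after the
last slash is then the host name, which contains dots.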
Mark Space - 24 Jun 2008 22:06 GMT
> Going forward that is perhaps a really good idea for checking the
> validity of a URL, however, what i was trying to do was to check to
My idea is that the URL you have could require some pretty sophisticated
parsing, which would be very hard with just one single regex pattern (if
not impossible). The URI class provides an already debugged parser
which makes it child's play to extract the path from any URI.
If your URLs can't be validated by the URI class, then yeah, it's not
going to be able to parse them, but I assumed that you had valid URLs
and not bits and pieces. I'm not sure why you had to remove the query
strings and anchors. Did the URI class not handle them?
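For what it's worth, here is a sketch of how the URI class can carry
the query string and anchor through untouched. It assumes absolute
http-style URLs that URI.create accepts, and it treats any dot in the
last path segment as an extension, which may be too loose for real
data. UriExtension and addExt are illustrative names.

```java
import java.net.URI;

final class UriExtension {
    static String addExt(String url, String ext) {
        // URI.create throws IllegalArgumentException on malformed input.
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        // This sketch only handles URLs with a scheme, a host and a real path.
        if (uri.getScheme() == null || uri.getRawAuthority() == null
                || path == null || path.isEmpty() || path.equals("/")) {
            return url;
        }
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        String last = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        if (last.contains(".")) {
            return url; // last segment already has an extension
        }
        // Rebuild the URL from its parts around the modified path.
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(uri.getRawAuthority());
        sb.append(trimmed).append('.').append(ext);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

The raw getters are used so that any percent-encoding in the original
URL is passed through as it was rather than decoded and re-encoded.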