
Java Forum / First Aid / June 2008

Regular expression pattern for matching end of a URL

phillip.s.powell@gmail.com - 20 Jun 2008 14:25 GMT
I am working on a simple method that will assign a specific extension
(e.g. ".jsp", ".php", ".cfm", etc.) to the end of a URL if it doesn't
find anything marking a valid extension; however, I do not want to add
an extension if one is found.

Consider my code:

import java.util.regex.Pattern;
...
public static final String urlEndSlashPattern = "/?";

public static final String urlQSPattern = "\\??([a-zA-Z0-9\\-_\\.]+=[^&]*&?)*";

public static final String urlAnchorPattern = "#[^#]*$";

...

public static void addExtToUrl(String url, String myExt, String[] exts) {

  StringBuffer sb = new StringBuffer();
  boolean hasExt = false;
  for (int i = 0; i < exts.length; i++) {
    sb.append(".").append(exts[i]).append(urlEndSlashPattern).append(urlQSPattern).append(urlAnchorPattern);
    if (Pattern.matches(sb.toString(), url)) {
      hasExt = true;
    }
    sb = new StringBuffer();
  }
  if (!hasExt) {
    url += "." + myExt;
  }
}

The issue I want to bring up is the regular expression pattern I'm
using appears to fail.  I want to check and see if the URL I provide
ends with a valid extension, followed by optional "/" or a query
string or an anchor or any combination of these.

Like say if I have

http://www.blah.com/index.html

Then don't add the ".jsp" extension

But if I have

http://www.blah.com/registration/

Then I *want* to add the ".jsp" extension:

http://www.blah.com/registration.jsp

Or if I have:

http://www.blah.com/registration/?foo=bar#baz

Then it needs to change to

http://www.blah.com/registration.jsp?foo=bar#baz

But if I have

http://www.blah.com/registration/index.php?foo=bar#baz

Then I do *not* add the ".jsp" extension.

Hope that makes sense now.  Bottom line is that the pattern above
doesn't seem to work.  Ideas?

Thanks
Roedy Green - 20 Jun 2008 15:48 GMT
On Fri, 20 Jun 2008 06:25:31 -0700 (PDT), "phillip.s.powell@gmail.com"
<phillip.s.powell@gmail.com> wrote, quoted or indirectly quoted
someone who said :

>public static final String urlQSPattern = "\\??([a-zA-Z0-9\\-_\\.]+=[^&]*&?)*";

It might help if you provided a few strings you INTENDED this to
match.

I find the way I solve these problems is to chop my regex to the bone
and get it matching. Then I add characters to the end just a few at a
time. That way you know precisely where the trouble is when it fails.

Has anyone looked to see if you can get the offset of the furthest point a
match progressed through the regex?  This would be very
helpful in debugging.
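
The build-it-up approach described above can be roughly automated. The sketch below is not a feature of java.util.regex: it simply recompiles ever-longer prefixes of a pattern, supplied as a list of pieces, and reports the first piece at which lookingAt() stops matching. The class name, the method name, and the pieces themselves (loosely modelled on the patterns earlier in this thread) are illustrative assumptions.

```java
import java.util.regex.Pattern;

class RegexPrefixDebug {

    /**
     * Returns the index of the first piece whose addition makes the
     * accumulated pattern stop matching at the start of the input, or -1
     * if the full pattern matches. Every accumulated prefix must itself
     * be a valid regex.
     */
    static int firstFailingPiece(String[] pieces, String input) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            regex.append(pieces[i]);
            // lookingAt() anchors at the start of the input but, unlike
            // matches(), does not require the pattern to consume all of it.
            if (!Pattern.compile(regex.toString()).matcher(input).lookingAt()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Illustrative pieces: extension, optional slash, optional query, anchor.
        String[] pieces = { ".*\\.html", "/?", "(\\?[^#]*)?", "#[^#]*$" };
        System.out.println(firstFailingPiece(pieces, "http://www.blah.com/index.html"));
        System.out.println(firstFailingPiece(pieces, "http://www.blah.com/index.html#top"));
    }
}
```

For the first input the sketch reports index 3: the anchor piece is mandatory in this pattern, so a URL without a "#" can never match it. The second input matches every prefix, so the result is -1.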

The following is an aside to your problem.

Normally using a StringBuilder will make concatenation faster. But in
this case, it probably slows it down since the compiler won't be able
to track what is happening and glue the bits together at compile time.

I always like to use

private static final Pattern p = Pattern.compile("xxx");

since compiling the regex string is a heavy duty operation and need
only be done once.
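
As a minimal sketch of the precompiled-pattern idea, the class below compiles one alternation of known extensions once and reuses it. The class name, method name, and extension list are assumptions made for illustration, not taken from the original code. Note that Pattern.matches(regex, input) recompiles the regex on every call and only succeeds if the entire input matches, whereas find() on a pattern ending in $ checks just the tail.

```java
import java.util.regex.Pattern;

class ExtensionCheck {

    // Compiled once when the class is loaded. The extension list is illustrative.
    private static final Pattern KNOWN_EXT =
            Pattern.compile("\\.(?:jsp|php|cfm|html)$", Pattern.CASE_INSENSITIVE);

    /** Returns true if the given path ends with one of the known extensions. */
    static boolean endsWithKnownExtension(String path) {
        // find() searches anywhere in the input; the trailing $ pins the
        // match to the end of the string.
        return KNOWN_EXT.matcher(path).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithKnownExtension("/registration/index.php"));
        System.out.println(endsWithKnownExtension("/registration"));
    }
}
```
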
Signature


Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

phillip.s.powell@gmail.com - 20 Jun 2008 17:00 GMT
> On Fri, 20 Jun 2008 06:25:31 -0700 (PDT), "phillip.s.pow...@gmail.com"
> <phillip.s.pow...@gmail.com> wrote, quoted or indirectly quoted
[quoted text clipped - 5 lines]
> It might help if you provided a few strings you INTENDED this to
> match.

I can't; the strings are included dynamically from a database populated by
user input.  In short, the URLs can literally be anything on earth.

But this might help you to understand.

Suppose that THIS is your URL: http://www.blah.com/foo/

Then it must become: http://www.blah.com/foo.jsp

But if it is this: http://www.blah.com/foo.php

Then it must remain: http://www.blah.com/foo.php

But if it is this:

http://www.blah.com/foo/?bar=baz

Then it must become:

http://www.blah.com/foo.jsp?bar=baz

But if it is

http://www.blah.com/foo/?myjavaclass=java.util.HashMap

Then it becomes

http://www.blah.com/foo.jsp?myjavaclass=java.util.HashMap

Hope that helps

Roedy Green - 20 Jun 2008 20:16 GMT
On Fri, 20 Jun 2008 09:00:08 -0700 (PDT), "phillip.s.powell@gmail.com"
<phillip.s.powell@gmail.com> wrote, quoted or indirectly quoted
someone who said :

>http://www.blah.com/foo/?bar=baz
>
>Then it must become:
>
>http://www.blah.com/foo.jsp?bar=baz

I have a list of general regex debugging tips at:

http://mindprod.com/jgloss/regex.html#TIPS

The key one is to use several small regexes rather than one big one.


Mark Space - 20 Jun 2008 20:57 GMT
> I am working on a simple method that will assign a specific extension
> (e.g. ".jsp", ".php", ".cfm", etc.) to the end of a URL if it doesn't
> find anything marking a valid extension, however, I do not want to add
> an extension if one is found.

You might try the URI class.  Make a URI, get the path, and then just
check that one string for a valid extension.  This will be much easier
than trying to parse a URI yourself.

package uritest;

import java.net.URI;
import java.net.URISyntaxException;

public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws URISyntaxException {
        // Take the path, drop a trailing slash, and append ".jsp"
        // if the path contains no dot.

        URI uri = new URI( "http://www.blah.com/registration/" );
        String path = uri.getPath();
        if( path.endsWith("/") )
            path = path.substring( 0, path.length()-1 );
        if( !path.matches( ".+\\..*") )
            path += ".jsp";
        System.out.println( path );
    }

}
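
The example above prints only the path. A possible extension, sketched here under the assumption that the inputs are absolute, syntactically valid URLs, reassembles the full URL so that the query string and anchor survive. The class and method names are hypothetical, and the extension check looks only at the last path segment, which is a slightly stricter test than the regex above.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriExtensionAdder {

    /** Appends "." + ext to the path unless its last segment already contains a dot. */
    static String withDefaultExtension(String url, String ext) throws URISyntaxException {
        URI uri = new URI(url);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty() || path.equals("/")) {
            return url; // opaque URI, or nothing to attach an extension to
        }
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        if (lastSegment.contains(".")) {
            return url; // an extension is already present
        }
        // Reassemble from the raw components so existing escapes are left untouched.
        StringBuilder out = new StringBuilder();
        if (uri.getScheme() != null) out.append(uri.getScheme()).append(':');
        if (uri.getRawAuthority() != null) out.append("//").append(uri.getRawAuthority());
        out.append(path).append('.').append(ext);
        if (uri.getRawQuery() != null) out.append('?').append(uri.getRawQuery());
        if (uri.getRawFragment() != null) out.append('#').append(uri.getRawFragment());
        return out.toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(withDefaultExtension("http://www.blah.com/registration/?foo=bar#baz", "jsp"));
        System.out.println(withDefaultExtension("http://www.blah.com/registration/index.php?foo=bar#baz", "jsp"));
    }
}
```
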
phillip.s.powell@gmail.com - 24 Jun 2008 15:27 GMT
> phillip.s.pow...@gmail.com wrote:
> > I am working on a simple method that will assign a specific extension
[quoted text clipped - 5 lines]
> check that one string for a valid extension.  This will be much easier
> than trying to parse a URI yourself.

Going forward that is perhaps a really good idea for checking the
validity of a URL, however, what I was trying to do was to check to
see if the URL had an extension and add ".jsp" if it did not; otherwise,
add nothing.

What I wound up doing, though, was a lot simpler: after chopping off any
optional query string and anchor, I checked the newly created
end of the URL for an extension via Pattern.matches("xxx") and added
".jsp" accordingly.

But the URI class looks interesting as does the URL class.  Thanks!
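
The actual code for this chop-then-check approach was not posted, so the following is only a guess at its shape, assuming absolute URLs. It splits the URL at the first "?" or "#", tests the last path segment for an extension, and glues the tail back on. The class name, method name, and the extension regex are illustrative.

```java
import java.util.regex.Pattern;

class ChopThenCheck {

    // A segment "has an extension" if it ends in a dot followed by letters or digits.
    private static final Pattern HAS_EXT = Pattern.compile(".*\\.[A-Za-z0-9]+");

    static String addExt(String url, String ext) {
        // Everything from the first '?' or '#' onward is set aside untouched.
        int cut = url.length();
        int q = url.indexOf('?');
        int h = url.indexOf('#');
        if (q >= 0) cut = Math.min(cut, q);
        if (h >= 0) cut = Math.min(cut, h);
        String head = url.substring(0, cut);
        String tail = url.substring(cut);

        String trimmed = head.endsWith("/") ? head.substring(0, head.length() - 1) : head;

        // Leave URLs with no path beyond "/" alone, so the host is never
        // mistaken for a file name.
        int schemeEnd = head.indexOf("://");
        int pathStart = head.indexOf('/', schemeEnd < 0 ? 0 : schemeEnd + 3);
        if (pathStart < 0 || trimmed.length() <= pathStart) return url;

        String lastSegment = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        if (HAS_EXT.matcher(lastSegment).matches()) return url;
        return trimmed + "." + ext + tail;
    }

    public static void main(String[] args) {
        System.out.println(addExt("http://www.blah.com/foo/?bar=baz", "jsp"));
        System.out.println(addExt("http://www.blah.com/foo.php", "jsp"));
    }
}
```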

Mark Space - 24 Jun 2008 22:06 GMT
> Going forward that is perhaps a really good idea for checking the
> validity of a URL, however, what i was trying to do was to check to

My idea is that the URL you have could require some pretty sophisticated
parsing, which would be very hard with just one single regex pattern (if
not impossible).  The URI class provides an already debugged parser
which makes it child's play to extract the path from any URI.

If your URLs can't be validated by the URI class, then yeah, it's not
going to be able to parse them, but I assumed that you had valid URLs
and not bits and pieces.  I'm not sure why you had to remove the query
strings and anchors, did the URI class not handle them?
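
For reference, a short sketch of how java.net.URI separates those parts for a valid URL. The sample URL is one from this thread with a "#top" anchor added as an assumption, and the class and method names are hypothetical.

```java
import java.net.URI;

class UriParts {

    /** Returns { path, query, fragment } for a syntactically valid URL. */
    static String[] split(String url) {
        URI uri = URI.create(url);
        return new String[] { uri.getPath(), uri.getQuery(), uri.getFragment() };
    }

    public static void main(String[] args) {
        String[] parts = split("http://www.blah.com/foo/?myjavaclass=java.util.HashMap#top");
        System.out.println(parts[0]); // /foo/
        System.out.println(parts[1]); // myjavaclass=java.util.HashMap
        System.out.println(parts[2]); // top
    }
}
```

The dots in the query string never reach the path, so an extension check on getPath() alone is not confused by them.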

