Java Forum / General / May 2006

URL's and Set's do not mix - DNS lookups are compulsory

Rogan Dawes - 17 May 2006 07:12 GMT
Hi folks,

I noticed a weird problem in a bit of code that I was writing. I wanted
to display a hierarchy of URLs in a TreeModel. In doing so, I was
adding URLs to a HashSet. While I was testing, I was using invalid
URLs, like "http://abcd/", etc.

What happened was that the calls to set.contains(url) and set.add(url)
were taking up to 4 seconds each to execute. The delay changed
depending on whether I was using a HashSet or a TreeSet with a
custom Comparator.

The problem turned out to be that the URL.hashCode() method was
actually trying to resolve the address of the hostname that was
specified, via the protocol-specific handler. And obviously, the
hostnames ("abcd") I was using did not exist, and the DNS resolution was
taking some time to time out.

Am I the only person to think that this is a completely STUPID design?

I can think of many scenarios where one may want to keep even valid
URLs in a Set, without being able to resolve them to an IP address. For
example, a web scanner that works in a private environment (using a
corporate DNS), where the results may be reviewed on a machine outside
of the environment, with no access to the internal DNS servers.

This problem makes it almost impossible to use this kind of data
structure in an offline environment.

Does anyone have any suggestions on how to get around this issue? At
this point, I am thinking of simply copying the URL class into my own
code, and removing all traces of this idiocy.

Regards,

Rogan
Bart Cremers - 17 May 2006 07:37 GMT
Although I agree that it's total idiocy from the creators of the URL
class, a simple work-around is to use a URI instead of a URL. If you
need a URL from the URI simply use URI.toURL().
It might be that it's meant to be used this way, but I don't see why
URL requires a DNS lookup for hashing.
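
For illustration, a minimal sketch of that workaround. The class name UriSetSketch and the collect() helper are made up for this example; the point is only that URI.equals() and URI.hashCode() work on the parsed components of the URI and do not resolve the hostname, so the set operations stay local.

```java
import java.net.URI;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of any library.
final class UriSetSketch {

    // Parse each string into a URI and collect the distinct ones.
    // Hashing and comparing URIs does not trigger a hostname lookup.
    static Set<URI> collect(String... specs) {
        Set<URI> seen = new HashSet<>();
        for (String spec : specs) {
            seen.add(URI.create(spec));
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        Set<URI> seen = collect("http://abcd/", "http://abcd/", "http://abcd/x");
        System.out.println(seen.size() + " distinct URIs");

        // Convert to a URL only at the point where one is actually needed.
        for (URI uri : seen) {
            URL url = uri.toURL();
            System.out.println(url.toExternalForm());
        }
    }
}
```

Note that toURL() only works for absolute URIs with a protocol the JVM has a handler for, so it is best called late, on entries you actually intend to open.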

Regards,

Bart
Chris Uppal - 17 May 2006 10:25 GMT
> a simple work-around is to use a URI instead of a URL.

That's a big part of why class URI exists at all.  The class JavaDoc for one or
the other class explains this.

   -- chris
Rogan Dawes - 17 May 2006 11:10 GMT
> Although I agree that it's total idiocy from the creators of the URL
> class, a simple work-around is to use a URI instead of a URL. If you
[quoted text clipped - 5 lines]
>
> Bart

Many thanks to all for the responses. I have changed my code to use a
URI instead of a URL, and it is working perfectly.

Regards,

Rogan
Mike Schilling - 17 May 2006 07:41 GMT
> Hi folks,
>
[quoted text clipped - 15 lines]
>
> Am I the only person to think that this is a completely STUPID design?

It's part and parcel of the attempt to make URL.equals() independent of how
the hostname is specified.  This is actually hopeless, given the existence
of proxies, VPN, NAT, etc.  There's no guarantee that two different IP
addresses don't reference the same host, or that two identical IP addresses,
used at different times or in different contexts, name the same host.  I'm
not sure why Sun is even making the attempt.

> I can think of many scenarios where one may want to keep even valid URL's
> in a Set, without being able to resolve them to an IP address. For
[quoted text clipped - 8 lines]
> point, I am thinking of simply copying the URL class into my own code, and
> removing all traces of this idiocy.

Two suggestions, at least.

1. Store the string version of the URL instead of the URL itself.  It's cheap
enough to reparse it when necessary.
2. Wrap the URL with another class that bases equals() and hashCode() on the
URL's string value.
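
A rough sketch of suggestion 2, assuming the string form returned by URL.toExternalForm() is an acceptable identity for the URL. The class name UrlKey is hypothetical. Two keys are equal only if their strings match exactly, so no hostname resolution is involved; the flip side is that URLs differing only in spelling are treated as different entries.

```java
import java.net.URI;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper: equality and hashing use the URL's string form,
// never URL.equals() or URL.hashCode().
final class UrlKey {
    private final URL url;
    private final String text;

    UrlKey(URL url) {
        this.url = url;
        this.text = url.toExternalForm(); // plain string building, no lookup
    }

    URL url() {
        return url;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof UrlKey && text.equals(((UrlKey) other).text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }

    @Override
    public String toString() {
        return text;
    }

    public static void main(String[] args) throws Exception {
        Set<UrlKey> seen = new HashSet<>();
        seen.add(new UrlKey(URI.create("http://abcd/").toURL()));
        // Same string form, so this is found without resolving "abcd".
        System.out.println(seen.contains(new UrlKey(URI.create("http://abcd/").toURL())));
    }
}
```

Suggestion 1 is the same idea without the wrapper: keep a Set<String> of the string forms and build a URL from an entry only when it is needed.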
