Hi
Just a quick question: would it be ill-advised to normalise an object
model to a certain extent? E.g. say I am crawling many websites and
retrieving documents and URLs to keep in a memory cache. So I create a
URL object for each URL. When parsed, many of the URLs would have the
same components, such as the domain name part etc. So this would mean a
lot of the memory is taken up by identical strings. I know I can use
intern(), but my question is, would it be bad design to extract the
domain name part of the URL into a separate domain name object (along
with protocol type (secure or non-secure), username, password)? Sort of
like in SQL, when one normalises the data model.
This can be done in two ways, either the SQL way
DomainName (1) -> (*) FileSelector
or the more OO way using composition
URL -> DomainName
I suppose the latter alternative is the more correct way to do it?
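A minimal sketch of the composition alternative, assuming Java 8 or later. The class names DomainName and CrawledUrl and the factory method of() are hypothetical, chosen only for illustration. Each distinct protocol/user/host combination is created once and shared by every URL object that refers to it:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical shared value object: protocol, user info and host of a URL. */
final class DomainName {
    // One canonical instance per distinct text form. The pool is unbounded
    // in this sketch; a long-running crawler may need eviction.
    private static final Map<String, DomainName> POOL = new ConcurrentHashMap<>();

    private final String text; // e.g. "http://user:pw@example.com"

    private DomainName(String text) {
        this.text = text;
    }

    /** Returns the shared instance for these components, creating it on first use. */
    static DomainName of(String protocol, String userInfo, String host) {
        String key = protocol + "://" + (userInfo == null ? "" : userInfo + "@") + host;
        return POOL.computeIfAbsent(key, DomainName::new);
    }

    @Override
    public String toString() {
        return text;
    }
}

/** Hypothetical per-URL object: a reference to the shared part plus its own path. */
final class CrawledUrl {
    private final DomainName domain;
    private final String path;

    CrawledUrl(DomainName domain, String path) {
        this.domain = domain;
        this.path = path;
    }

    DomainName domain() {
        return domain;
    }

    @Override
    public String toString() {
        return domain + path;
    }
}
```

With this layout each CrawledUrl holds one reference to the shared part instead of its own copy of the strings, at the price of a map lookup when the URL object is created.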
regards
tom
Sabine Dinis Blochberger - 26 Jun 2008 11:50 GMT
> Hi
>
[quoted text clipped - 8 lines]
> with protocol type (non-/-secure), username, password)? Sort of like in
> sql, when one normalises the datamodel.
I would say the impact of the domain and protocol on memory is
negligible. You would only worry about this when your application
actually runs into memory-related problems.
> This can be done in two ways, either the sql way
>
[quoted text clipped - 5 lines]
>
> I suppose the last alternative is the more correct way to do it?
Sure you could. But you will have to weigh the cost of retrieving the
values on each access against the cost of holding a (short) string.
Then there's Java's URL class[1], which helps you access the parts of a
URL.
[1] <http://java.sun.com/javase/6/docs/api/java/net/URL.html>
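
As a small illustration of the accessors on java.net.URL, here is a sketch that splits a URL into the parts discussed above. The class name UrlParts, the method describe() and the example address are made up for this example:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class UrlParts {
    /** Joins protocol, user info, host, port and path of the given URL with " | ". */
    static String describe(String spec) {
        try {
            URL url = URI.create(spec).toURL();
            return url.getProtocol() + " | "
                    + url.getUserInfo() + " | "  // null if the URL has no user info
                    + url.getHost() + " | "
                    + url.getPort() + " | "      // -1 if no port is given
                    + url.getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a usable URL: " + spec, e);
        }
    }
}
```

For "https://tom:secret@example.com:8443/docs/index.html" this returns "https | tom:secret | example.com | 8443 | /docs/index.html".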

Signature
Sabine Dinis Blochberger
Op3racional
www.op3racional.eu
Roedy Green - 26 Jun 2008 17:48 GMT
>just a quick question, would it be ill-advised to, to a certain extent,
>normalise an object model? E.g. say I am crawling many websites and
[quoted text clipped - 6 lines]
>with protocol type (non-/-secure), username, password)? Sort of like in
>sql, when one normalises the datamodel.
You might have a look at the code for the Replicator. It has huge
lists of local files. It normalises legs of directory names to avoid
the repetition and to make comparison faster.
http://mindprod.com/products1.html#REPLICATOR
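
The following is not the Replicator's code, only a rough sketch of the general idea of sharing repeated path legs, assuming Java 8 or later. The class name LegPool and the method legsOf() are hypothetical. Each leg is looked up in a map, so equal legs from different paths end up as the same String instance:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool that keeps a single String instance per distinct path leg. */
final class LegPool {
    private final Map<String, String> pool = new HashMap<>();

    /** Splits a '/'-separated path and replaces each leg with its pooled instance. */
    String[] legsOf(String path) {
        String[] legs = path.split("/");
        for (int i = 0; i < legs.length; i++) {
            // putIfAbsent returns the leg stored earlier, or null if this one is new.
            String existing = pool.putIfAbsent(legs[i], legs[i]);
            if (existing != null) {
                legs[i] = existing;
            }
        }
        return legs;
    }
}
```

Two paths under the same directory then share the String objects for their common legs, much as intern() would, but in a pool the application owns and can discard.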

Signature
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com