still no comments on the actual code and its optimization ... cmon
gurus, I know you are hiding in there :)
no editing, huh? hehe ok.. so it's like the rm command in Unix :)
> still no comments on the actual code and its optimization ... cmon
> gurus, I know you are hiding in there :)
I haven't analysed your code in detail, but a few things I noticed:
Firstly, trying to match "<a href=\"" won't work in all cases. You can
have whitespace around the '=' character, you can have more than one
whitespace character between the 'a' and "href" and they don't have to be
spaces (they could be tabs or new lines).
Secondly, if you are using Java 5, use StringBuilder instead of
StringBuffer since your code is not multi-threaded and doesn't need to
synchronise. That said, performance gains will probably not be noticeable.
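The swap is mechanical because StringBuilder offers the same append/toString methods as StringBuffer. The sketch below uses a hypothetical LinkReport.format helper that joins extracted links one per line. It is not taken from the original code.

```java
import java.util.List;

/** Hypothetical helper, for illustration only. */
final class LinkReport {
    // StringBuilder has the same append/toString API as StringBuffer, but its
    // methods are not synchronized, so it avoids locking overhead when only
    // one thread ever touches the builder.
    static String format(List<String> links) {
        StringBuilder sb = new StringBuilder();
        for (String link : links) {
            sb.append(link).append('\n'); // one link per line
        }
        return sb.toString();
    }
}
```

Because the builder here is a local variable, it stays confined to a single thread even inside a multi-threaded GUI. The unsynchronized class is therefore still safe in that setting.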
Thirdly, I would use the readLine method on BufferedReader, rather than
reading one character at a time.
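Here is a minimal sketch of the line-at-a-time approach. It assumes the whole page is read into memory before any link matching happens, and LineReader.readAll is a hypothetical helper name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper, for illustration only. */
final class LineReader {
    // Reads all text a line at a time instead of one char at a time.
    // readLine() strips the terminator (\n, \r or \r\n), so '\n' is appended
    // to keep a tag split across lines (e.g. "<a" / "href=...") separated
    // by whitespace. Closing the BufferedReader also closes 'source'.
    static String readAll(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

For a local file this could be called as `LineReader.readAll(new FileReader("page.html"))`. Note that FileReader uses the default charset.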
Finally, why not change your code so that it accepts a URL rather than a
file system path (you can use a file:// URL if you need to access local
files)? That way you can point your program at a page on the web to
extract links (or even recursively extract the links from the files that are linked
to from the first file).
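A rough sketch of a URL-based entry point built on java.net.URL.openStream(). The PageSource class name and the hard-coded UTF-8 charset are assumptions for illustration:

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, for illustration only. */
final class PageSource {
    // Opens any URL the JDK has a protocol handler for, including http://
    // and file://, so the same link-extraction code works for local files
    // and web pages. Assumption: the page is UTF-8. Real code would check
    // the Content-Type header or a <meta> charset instead.
    static Reader open(URL url) throws IOException {
        return new InputStreamReader(url.openStream(), StandardCharsets.UTF_8);
    }
}
```

A local file can then be passed as `new File("page.html").toURI().toURL()`, and a web page as `URI.create("http://www.dandyer.co.uk").toURL()`.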
Dan.

Daniel Dyer
http://www.dandyer.co.uk
chingooo3k@yahoo.ca - 03 Nov 2005 20:54 GMT
Thanks Dan.
<quote>
Firstly, trying to match "<a href=\"" won't work in all cases. You can
have whitespace around the '=' character, you can have more than one
whitespace character between the 'a' and "href" and they don't have to be
spaces (they could be tabs or new lines).
</quote>
Yup.. this is just a prototype, so I didn't think about that, but
it's pretty easy to allow optional whitespace using regular
expressions....
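A sketch of such a pattern in Java. It handles double-quoted href values only; single-quoted or unquoted values would need more alternatives. HrefRegex is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, for illustration only. */
final class HrefRegex {
    // \s+ after "<a" accepts one or more spaces, tabs or newlines;
    // (?:[^>]*?\s)? lets other attributes (e.g. class="x") come before href
    // without crossing the end of the tag; \s* around '=' makes that
    // whitespace optional. Assumption: href values are double-quoted.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+(?:[^>]*?\\s)?href\\s*=\\s*\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    static List<String> extract(CharSequence html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the quoted URL
        }
        return links;
    }
}
```

A regex like this still has blind spots. For example, it will also match links inside HTML comments or script strings, which is one argument for using a real parser.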
<quote>
Secondly, if you are using Java 5, use StringBuilder instead of
StringBuffer since your code is not multi-threaded and doesn't need to
synchronise. That said, performance gains will probably not be noticeable.
</quote>
hmm cool... well, I just started this so there is no threading yet, but
what I have in mind (assuming I can at least get this to work) is a
moderately complex GUI, which will probably need threading.. I still
don't know for sure.
<quote>
Thirdly, I would use the readLine method on BufferedReader, rather than
reading one character at a time.
</quote>
ah nice .. I was afraid to use readLine because I read somewhere it has
some bugs/issues... but ok I'll use this instead :)
<quote>
Finally, why not change your code so that it accepts a URL rather than a
file system path (you can use a file:// URL if you need to access local
files)? That way you can point your program at a page on the web to
extract links (or even recursively extract the links from the files that are linked
to from the first file).
</quote>
hmmm, eventually yes, but right now I'm just trying to get it up and
running and doing something useful for me. I'm still unsure how fast
and how accurate it is, which is why I was worried about major blunders
in my approach.
Or maybe there are already kick a.s parsers that can leech HTML links??
The ones I googled all have some crap dependency issues, and some
don't even ship with the proper source files (HTMLSchema in the tagsoup
HTML parser).... need direction!! :)
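One dependency-free option worth trying is the HTML parser bundled with the JDK's Swing text package (javax.swing.text.html.parser.ParserDelegator). It is an old parser built around the HTML 3.2 DTD, so it may stumble on newer markup. A callback that collects href attributes from <a> tags looks roughly like this (JdkLinkExtractor is a hypothetical name):

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper built on the JDK's bundled (HTML 3.2-era) parser. */
final class JdkLinkExtractor {
    static List<String> extract(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Only <a> tags that actually carry an href (skips <a name=...>).
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // 'true' ignores charset declarations in the page, since the Reader
        // has already decoded the characters.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }
}
```

Because the parser tokenizes the markup itself, whitespace inside tags is handled without any hand-written patterns. Links inside HTML comments are also not reported as <a> start tags.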
Thanks for all the help again!