> Hi,
>
> i) A program which counts words in HTML file but doesnt include HTML
> tags.
With http://www.ebi.ac.uk/~kirsch/monq-doc/monq/programs/Grep.html
you can do things like
java monq.programs.Grep '<[^>]+>' '' '[A-Za-z]+' '%0\n' <yourhtml.html
on the command line to fetch all words that do not belong to a
tag. The mechanism behind it is
http://www.ebi.ac.uk/~kirsch/monq-doc/monq/jfa/Nfa.html which you can
use programmatically.
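
For readers without monq installed, here is a rough sketch of the same idea using only java.util.regex from the JDK. It is not the monq Nfa API, and the class and method names (HtmlWordCount, countWords) are made up for illustration. It reuses the two patterns from the command line above, but because it works in two passes it replaces each tag with a space rather than with nothing, so that words on either side of a tag do not get glued together.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of monq: counts words outside of HTML tags.
class HtmlWordCount {
    // The same two patterns as in the Grep command line.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    static int countWords(String html) {
        // Pass 1: blank out every tag, leaving a space as separator.
        String text = TAG.matcher(html).replaceAll(" ");
        // Pass 2: count runs of ASCII letters in what is left.
        Matcher m = WORD.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("<p>Hello <b>big</b> world</p>"));
    }
}
```

Like the command line, this treats only runs of A-Z and a-z as words, so entities such as &amp; and the contents of script or comment blocks would be counted too. It is good enough for a quick count, not for a real HTML parser.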
> ii) A program which counts only Bolds and Italics in HTML file.
This would require looking for `<b>' and `<em>' tags and can easily be
added as pattern/action pairs to the Nfa doing the word counting.
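
As a stand-in for those pattern/action pairs, a minimal sketch with plain java.util.regex might look like the following. Again this is not monq code, and EmphasisCount, countBold and countEm are hypothetical names. It counts opening tags only and assumes a tag name is followed either by `>' or by whitespace and attributes, so that `<br>', `<body>' and `<embed>' are not mistaken for `<b>' or `<em>'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of monq: counts opening <b> and <em> tags.
class EmphasisCount {
    // "<b" followed by ">" or by whitespace plus attributes; ignores case.
    private static final Pattern BOLD =
        Pattern.compile("<b(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);
    private static final Pattern EM =
        Pattern.compile("<em(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    private static int count(Pattern p, String html) {
        Matcher m = p.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    static int countBold(String html) {
        return count(BOLD, html);
    }

    static int countEm(String html) {
        return count(EM, html);
    }

    public static void main(String[] args) {
        String sample = "<p><b>bold</b> and <em>emphasis</em><br></p>";
        System.out.println(countBold(sample) + " bold, " + countEm(sample) + " em");
    }
}
```

If `<i>' or `<strong>' should count as well, they can be added as further patterns of the same shape.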
I am off to the pub now, otherwise I would've written the class, max
20 lines:-) To download the software see signature.
Harald.

Signature
---------------------+---------------------------------------------
Harald Kirsch (@home)|
Java Text Crunching: http://www.ebi.ac.uk/Rebholz-srv/whatizit/software