Java Forum / General / October 2006

how do I check wellformedness of html files?

drgonzo120 - 16 Oct 2006 10:12 GMT
hello,

As my first assignment in my first job, I have to check the
well-formedness of about 1000 HTML files ...

I assume there must already be some Java classes/packages/libs on the
net that do this? I can't be the first one who has to do
this ...

So, does anybody know of any libraries available online that do this?

Thanks !
hiwa - 16 Oct 2006 10:17 GMT
> hello,
>
[quoted text clipped - 8 lines]
>
> Thanks !
http://validator.w3.org/
http://www.htmlhelp.com/tools/validator/
drgonzo120 - 16 Oct 2006 10:46 GMT
drgonzo120 wrote:

> hello,
>
[quoted text clipped - 8 lines]
>
> Thanks !

It will be a console program, so I need classes that accept an HTML
file and check it, I guess.
Oliver Wong - 16 Oct 2006 16:09 GMT
> drgonzo120 wrote:
>
[quoted text clipped - 13 lines]
> It will be a console program, so I need classes that accept an HTML
> file and check it, I guess.

   See hiwa's reply, and also consider JTidy.

   - Oliver
Martin Gregorie - 16 Oct 2006 20:13 GMT
>> drgonzo120 wrote:
>>
[quoted text clipped - 16 lines]
>
>     - Oliver

Take a look at the HTML Tidy project, http://tidy.sourceforge.net

The original HTML Tidy is a C command-line utility, but there are Java
and Perl versions (JTidy is one of them), all referenced from the
project. It's worth a visit: there are other useful things too, such as
HTML editors that integrate HTML Tidy.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

drgonzo120 - 17 Oct 2006 08:57 GMT
Hello, what I need to do is quite simple.

For example, this is a sample from the HTML files:

<table border=1 width="100%" >
<tr>
<td width=20%><noindex>Betreft :</noindex></td>
<td colspan=3>
<betreft><P><A NAME="b_betreft"></A>Kinderrechten: implementatie van
het VN-verdrag<BR>Jaarlijkse verslaggeving van de Vlaamse regering aan
het Vlaams Parlement en aan de kinderrechtencommissaris omtrent de
implementatie van het VN-verdrag van 20 november 1989 inzake de rechten
van het kind<BR>Tweede verslag d.d. 29 september 2000 <A
NAME="e_betreft"></A></betreft>
</td></tr>

For each HTML file I need to extract the contents of these special tags,
<betreft> and others, and create XML files out of them. Is it
possible to read an HTML file as an XML file and do some XPath stuff on
it?

Or should I just treat it as a plain text file and extract the tags?

"JTidy provides a DOM interface to the document that is being
processed, which effectively makes you able to use JTidy as a DOM
parser for real-world HTML."
But nowhere can I find a good reference for JTidy ...

I still don't know how I'm gonna do it, maybe write it all myself ....

greetings
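
For the "treat it as a plain text file" option, a minimal sketch using only java.util.regex might look like the following. The class and method names (TagTextExtractor, contentsOf) are made up for illustration. It assumes the custom tags such as <betreft> are never nested inside themselves, and it returns whatever sits between the opening and closing tag, inner markup included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagTextExtractor {
    /**
     * Returns the raw text between each <tagName ...> and </tagName> pair.
     * Hypothetical helper: it does not handle nested tags of the same name.
     */
    public static List<String> contentsOf(String text, String tagName) {
        String quoted = Pattern.quote(tagName);
        // Opening tag with optional attributes, lazy body, then closing tag.
        Pattern pattern = Pattern.compile(
                "<" + quoted + "\\b[^>]*>(.*?)</" + quoted + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "<td colspan=3>\n<betreft><P>Kinderrechten<BR>Tweede verslag</betreft>\n</td>";
        for (String body : contentsOf(sample, "betreft")) {
            System.out.println(body);
        }
    }
}
```

This sidesteps well-formedness entirely, so it works on tag soup, but it says nothing about whether the file is valid.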
Martin Gregorie - 17 Oct 2006 12:37 GMT
> Hello, what I need to do is quite simple.
>
[quoted text clipped - 25 lines]
>
> I still don't know how I'm gonna do it, maybe write it all myself ....

Have you looked at the HTML, HTMLEditorKit and HTMLDocument classes
(in javax.swing.text.html)?

HTMLEditorKit contains a parser that I used as the basis for a URL
checker. It extracts <A> tags from HTML pages, sets up a URL instance
from each href attribute and checks whether it is accessible. Access
failures are reported for manual examination and fixing.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
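
As a rough sketch of the approach described above (not Martin's actual code), the Swing HTML parser can be driven through ParserDelegator with a ParserCallback that collects href values from <A> tags. The class name LinkExtractor is made up for illustration, and the accessibility check on each URL is left out so that the example does not need network access.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {
    /** Collects the href value of every <A> start tag reported by the parser. */
    public static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    // Named anchors such as <A NAME="..."> have no href and are skipped.
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // The third argument tells the parser to ignore charset declarations.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String sample = "<html><body><a href=\"http://validator.w3.org/\">W3C</a>"
                + "<A NAME=\"b_betreft\"></A></body></html>";
        for (String link : extractLinks(new StringReader(sample))) {
            System.out.println(link);
        }
    }
}
```

This parser is tolerant of unclosed tags, so it is suited to extracting data from real-world HTML rather than to judging well-formedness.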

Oliver Wong - 17 Oct 2006 14:59 GMT
> Hello, what I need to do is quite simple.
>
[quoted text clipped - 16 lines]
> possible to read an HTML file as an XML file and do some XPath stuff on
> it?

   This is possible if and only if the HTML file actually is an XML file
(the HTML and XML file formats overlap, but are not identical). Otherwise,
you'll first need something like "XMLTidy" (a fictional product I just
made up) to fix the broken XML: things like making sure every open tag is
balanced by a closing tag, etc. In your example, for instance, the
<table>, <P> and <BR> tags are never closed.

   - Oliver
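
A minimal sketch of that "if and only if" test, using only the JDK's built-in XML parser: try to parse the text as XML, treat a SAXException as "not well-formed", and run XPath only when the parse succeeds. The names WellFormedCheck, parseOrNull and firstText are made up for illustration, and the sample strings are simplified versions of the posted markup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    /** Parses the text as XML and returns the DOM, or null if it is not well-formed. */
    public static Document parseOrNull(String text)
            throws ParserConfigurationException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            return builder.parse(new InputSource(new StringReader(text)));
        } catch (SAXException e) {
            return null; // unclosed tag, unquoted attribute value, and so on
        }
    }

    /** Returns the text content of the first element with the given name, or "" if none. */
    public static String firstText(Document doc, String tagName)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("//" + tagName, doc);
    }

    public static void main(String[] args) throws Exception {
        String broken = "<td width=20%><betreft><P>Kinderrechten<BR>verslag</betreft></td>";
        String fixed = "<td width=\"20%\"><betreft><p>Kinderrechten</p></betreft></td>";
        System.out.println("broken is well-formed: " + (parseOrNull(broken) != null));
        Document doc = parseOrNull(fixed);
        if (doc != null) {
            System.out.println(firstText(doc, "betreft"));
        }
    }
}
```

For a batch run over many files, the same check could be applied to each file in turn, with the failures sent through a tidying tool before the XPath step.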
Andy Dingley - 17 Oct 2006 11:11 GMT
> As my first assignment in my first job, I have to check the
> well-formedness of about 1000 HTML files ...

Why use Java?  The usual tool for this is HTML Tidy, which you can
drive perfectly adequately from the command line with a couple of lines
of shell script.
Sachin - 17 Oct 2006 11:19 GMT
Hi,

Have a look at the JavaCC help files and documentation.

This URL will help you:
https://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=110

Regards,
Sachin

