Java Forum / First Aid / March 2006

Best way to convert from HTML to plain text in Java?

google@lrlart.com - 19 Mar 2006 08:21 GMT
Hello,

I have a Java servlet that processes plain text. I'd like to point it at a
specific URL, pull over the web page, and convert it to plain text for
further processing.

I have written some code that simply strips tags from the HTML, but
this only does an OK job: it fails on poorly written HTML and on embedded
JavaScript, among other things. Are there any Java APIs that would perform
a better conversion? I've looked into JEditorPane and HTMLEditorKit,
but haven't had any luck getting them to perform the conversion.
Thanks for any help!

Mike
William Brogden - 19 Mar 2006 15:43 GMT

The JTidy library (http://jtidy.sourceforge.net/)
will clean up bad HTML and has pretty-printing functions.

Bill
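
Since HTMLEditorKit came up in the question, here is a minimal sketch of one way it might be used for this without a JEditorPane: the JDK's own parser (ParserDelegator) is driven with an HTMLEditorKit.ParserCallback that collects text. The class name HtmlToText, the convert method, and the whitespace handling are assumptions made up for illustration, not something from the thread. The Swing parser is lenient but old, so very broken pages may still need a cleanup pass (with JTidy, for example) first.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text of an HTML document,
// ignoring anything inside <script> and <style>.
final class HtmlToText extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();
    private int skipDepth = 0; // > 0 while inside <script> or <style>

    private static boolean isSkipped(HTML.Tag t) {
        return t == HTML.Tag.SCRIPT || t == HTML.Tag.STYLE;
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (isSkipped(t)) {
            skipDepth++;
        } else if (t.breaksFlow()) {
            text.append('\n'); // start block-level elements on a new line
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (isSkipped(t)) {
            if (skipDepth > 0) skipDepth--;
        } else if (t.breaksFlow()) {
            text.append('\n');
        }
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t.breaksFlow()) {
            text.append('\n'); // e.g. <br>
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (skipDepth == 0) {
            text.append(data);
        }
    }

    // Parses the given HTML string and returns the collected text.
    static String convert(String html) {
        HtmlToText callback = new HtmlToText();
        try {
            // true = ignore any charset declared in the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Collapse runs of spaces/tabs and tidy whitespace around line breaks.
        return callback.text.toString()
                .replaceAll("[ \\t]+", " ")
                .replaceAll("\\s*\\n\\s*", "\n")
                .trim();
    }
}
```

In a servlet, the page source could be read from the URL into a String (or the Reader passed to parse directly) and the result of HtmlToText.convert(...) handed on to the existing plain-text processing.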



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.