Java Forum / First Aid / April 2004

HTMLDocument

Michael Tovbin - 18 Apr 2004 15:45 GMT
Hi:

I have a little problem. I would like to iterate through some HTML
tags programmatically and want to be lazy about it so would like to
avoid text processing.

I have done a little digging and it looks like HTMLDocument and
HTMLDocument.Iterator will do the job.

Except, wherever I look, there is no indication how to load an actual
HTML document downloaded from the Web into an HTMLDocument instance. I
have found that it is somehow done through HTMLDocument.HTMLReader but
nowhere is it explained exactly how.

I would appreciate any help or code samples.

Thanks
MT
Andy Flowers - 18 Apr 2004 21:23 GMT
Take a look at HTMLEditorKit and its read method. Here's a very simple
example that just dumps the element tree of an HTML document.
You could also look at the HTMLDocument.getIterator(..) method for parsing
specific tags.

import javax.swing.text.html.*;
import java.io.*;
import java.net.*;
import javax.swing.text.*;
import java.util.*;

public class Sampler
{
    // Prints two spaces per nesting level.
    private static void indent( int depth )
    {
        for( int y = 0; y < depth; y++ )
        {
            System.out.print("  ");
        }
    }

    // Recursively prints an element's name, its attributes, and its children.
    public static void parseElement( Element elem, int offset )
    {
        indent( offset );
        System.out.println(elem.getName());

        AttributeSet as = elem.getAttributes();
        Enumeration<?> en = as.getAttributeNames();
        while( en.hasMoreElements() )
        {
            Object name = en.nextElement();
            indent( offset );
            System.out.println("<" + name + "=" + as.getAttribute(name) + ">");
        }

        for( int x = 0; x < elem.getElementCount(); x++ )
        {
            parseElement( elem.getElement(x), offset + 1 );
        }
    }

    public static void main(String[] args)
    {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = new HTMLDocument();
        // Without this, read() can throw ChangedCharacterSetException when the
        // page declares its own charset in a <meta> tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        try
        {
            URL pageontheweb = new URL("http://java.sun.com");
            // try-with-resources closes the stream even if parsing fails.
            try( InputStream is = new BufferedInputStream(pageontheweb.openStream()) )
            {
                kit.read(is, doc, 0);
            }

            Element[] elems = doc.getRootElements();
            for( int x = 0; x < elems.length; x++ )
            {
                System.out.println(elems[x].getClass().getName());
                parseElement( elems[x], 0 );
            }
        }
        // MalformedURLException is a subclass of IOException.
        catch( BadLocationException | IOException ex )
        {
            // Report the failure instead of silently swallowing it.
            ex.printStackTrace();
        }
    }
}
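
For the getIterator(..) route mentioned above, here is a rough sketch of how it might be used to list the links in a page. The class name LinkLister and the helper extractLinks are made up for this example, and it parses an in-memory string rather than a downloaded page so it can be run without a network connection. Note that getIterator only works for non-block tags such as A; for block tags it returns null.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class LinkLister
{
    // Hypothetical helper: returns the href of every <a> tag in the given HTML.
    public static List<String> extractLinks( Reader in )
        throws IOException, BadLocationException
    {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoids ChangedCharacterSetException on pages with a charset <meta> tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(in, doc, 0);

        List<String> links = new ArrayList<String>();
        // Walk every occurrence of the A tag in document order.
        for( HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
             it.isValid(); it.next() )
        {
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if( href != null )
            {
                links.add(href.toString());
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception
    {
        String html = "<html><body><a href=\"one.html\">One</a> "
                    + "<a href=\"two.html\">Two</a></body></html>";
        for( String link : extractLinks(new StringReader(html)) )
        {
            System.out.println(link);
        }
    }
}
```

To run it against a real page, you could pass an InputStreamReader wrapped around url.openStream() instead of the StringReader.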