
Java Forum / General / April 2006


Notes/Domino HTML parsing

VitaminB - 26 Apr 2006 10:54 GMT
Hello guys,

I want to parse an HTML frameset (in Java) that is generated by a Lotus
Domino server. When I use a plain-text HTML page generated by the Domino
server, my parser works fine. In contrast, when I try to parse the
frameset, I get the following exception:

##########
Exception:
##########

java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:295)
    at java.io.DataInputStream.readUTF(DataInputStream.java:661)
    at conparse.main(conparse.java:30)
    at conparseTest.testMain(conparseTest.java:17)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:85)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:58)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:60)
    at java.lang.reflect.Method.invoke(Method.java:391)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:478)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:344)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

###########
Code
###########

public class conparse{

    conparse(){

    }

    public void main(){

        try {

            URL urlobj = new URL("http://dev0004/Test/frame.htm");

            HttpURLConnection uc = null;
            uc = (HttpURLConnection)urlobj.openConnection();
            uc.setUseCaches(false);
            DataInputStream is = new DataInputStream(uc.getInputStream());
            String str =is.readUTF();
            System.out.println(str);

            HTMLEditorKit hKit = new HTMLEditorKit();
            HTMLDocument hDoc = new HTMLDocument();
            hKit.read(is, hDoc, 0);

            HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FONT);

            AttributeSet attSet = it.getAttributes();
            String s = (String)attSet.getAttribute(HTML.Attribute.COLOR);
            System.out.println(s);

            //System.out.println(attSet.getAttributeCount());

        }
       catch ( Exception e ) {
            e.printStackTrace();
        }

    }

}

################
Plain Text HTML:
################

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>

<script language="JavaScript" type="text/javascript">
<!--
document._domino_target = "_self";
function _doClick(v, o, t, h) {
 var form = document._ContentRetrival;
 if (form.onsubmit) {
    var retVal = form.onsubmit();
    if (typeof retVal == "boolean" && retVal == false)
      return false;
 }
 var target = document._domino_target;
 if (o.href != null) {
   if (o.target != null)
      target = o.target;
 } else {
   if (t != null)
     target = t;
 }
 form.target = target;
 form.__Click.value = v;
 if (h != null)
   form.action += h;
 form.submit();
 return false;
}
// -->
</script>
</head>
<body text="#000000" bgcolor="#FFFFFF">

<form method="post"
action="/Test/HET/PerformanceTestDB.nsf/ContentRetrival?OpenForm&amp;Seq=1"
name="_ContentRetrival">
<input type="hidden" name="__Click" value="0"><b>Test Page for Content
Retrival</b><br>
<br>
<br>
<font color="#FF0000">Hello, here is some text without a meaning. This
text should show, how a printed</font><br>
</form>
</body>
</html>

#######################################
Frameset HTML (received by Internet Explorer from the Domino Server)
#######################################
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
<html>
<head>

<script language="JavaScript" type="text/javascript">
<!--
self._domino_name = "_Main";
// -->
</script>
</head>

<frameset cols="45%,55%">

<frame
src="/Test/HET/PerformanceTestDB.nsf/ContentDeliveryMeasurement?OpenForm">

<frameset rows="1*,1*">

<frame src="/Test/HET/PerformanceTestDB.nsf/DocsInserted?OpenView">

<frame name="docPreviewFrame"
src="/Test/HET/PerformanceTestDB.nsf/select?OpenForm">
</frameset>
</frameset>
</html>

Regards,
Marcus
Venkatesh - 27 Apr 2006 04:32 GMT
I think readUTF() expects the length of the string at the beginning (the
first 2 bytes are taken as the length of the rest of the string), and
that might be causing the problem for you. The best way to read text is
to loop and read each line until end of file is reached.

You can look at the source of DataInputStream for more details.

-Venkatesh
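
The read-each-line loop suggested above could look roughly like the sketch below. It is only an illustration: the class and method names are made up for this example, the charset is an assumption (the real one should come from the server's response), and an in-memory stream stands in for `uc.getInputStream()` from the original post.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original poster's code.
final class LineReaderSketch {

    // Reads the whole stream as text, one line at a time, until end of stream.
    static String readAllLines(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) { // null signals end of stream
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes the server would send; ISO-8859-1 is an assumed charset.
        byte[] page = "<html>\n<frameset cols=\"45%,55%\">\n</frameset>\n</html>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.print(readAllLines(new ByteArrayInputStream(page), StandardCharsets.ISO_8859_1));
    }
}
```

In the original code, the same helper would be called with `uc.getInputStream()` in place of the `ByteArrayInputStream`.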
Chris Uppal - 27 Apr 2006 13:14 GMT
> DataInputStream is = new DataInputStream(uc.getInputStream());
> String str =is.readUTF();

Why are you using a DataInputStream?  And, more specifically, why are you
using readUTF()?  See the documentation for the method, and you'll see why it
is totally worthless for reading text downloaded from the Net.

   -- chris
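
To make the point concrete, here is a small sketch of what readUTF() does with HTML bytes. The class and method names are invented for this example, and a byte array stands in for the server response. readUTF() treats the first two bytes as an unsigned length; for text starting with `<!` that is 0x3C21, so it then tries to read 15393 more bytes and runs off the end of a short page, which matches the EOFException in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original poster's code.
final class ReadUtfSketch {

    // Returns the length that readUTF() would take from the first two bytes.
    static int claimedLength(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUnsignedShort();
        }
    }

    // Returns true if readUTF() reaches end of stream before the claimed length is read.
    static boolean failsWithEof(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Frameset//EN\">"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(claimedLength(html)); // '<' is 0x3C, '!' is 0x21: 0x3C21 = 15393
        System.out.println(failsWithEof(html));  // true: far fewer than 15393 bytes follow

        // readUTF() only works on data that writeUTF() produced.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF("hello");
        }
        System.out.println(failsWithEof(buffer.toByteArray())); // false
    }
}
```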
Roedy Green - 27 Apr 2006 19:47 GMT
On Thu, 27 Apr 2006 13:14:08 +0100, "Chris Uppal"
<chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly
quoted someone who said :

>Why are you using a DataInputStream?  And, more specifically, why are you
>using readUTF()?  See the documentation for the method, and you'll see why it
>is totally worthless for reading text downloaded from the Net.

See http://mindprod.com/applets/fileio.html
to generate code that will work.

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.


