
Java Forum / General / November 2006


XML problem

gk - 20 Nov 2006 16:12 GMT
XML
======

<?xml version="1.0" encoding='UTF-8'?>
<sample>

data before element
 <element attr1="value1" attr2="value2">
   <data1 a="v">some data &amp; one attribute</data1>
   <data2>CDATA follows <![CDATA[more data]]></data2>
 </element>
 data after element
 <?proc data for processing?>
</sample>

Output
========

> java FirstSample first.xml
startDocument
startElement:
characters: data before element
startElement:
 attribute: ="value1"
 attribute: ="value2"
startElement:
 attribute: ="v"
characters: some data
characters: &
characters: one attribute
endElement
startElement:
characters: CDATA follows
characters: more data
endElement
endElement
characters: data after element
processingInstruction: proc
 data: data for processing
endElement
endDocument

This is a SAX parser.

What I don't understand is why the characters in the output above are reported as

characters: some data
characters: &
characters: one attribute

Why not "characters: some data & one attribute"?

Why are there 3 lines for it?

How do I know at which character the text will be broken, producing
3 lines as above?
Thomas Fritsch - 20 Nov 2006 17:14 GMT
> XML
[...]
>     <data1 a="v">some data &amp; one attribute</data1>
[...]

> Output
[...]
> startElement:
>   attribute: ="v"
> characters: some data
> characters: &
> characters: one attribute
> endElement
[...]

> What I don't understand is why the characters in the output above are
>
[quoted text clipped - 5 lines]
>
> Why are there 3 lines for it?
Because it is easier for the parser.

> How do I know at which character the text will be broken, producing
> 3 lines as above?

You can't know. :-(
The parsers are free to do it as they like. That means a parser may do
it in the way that is easiest for *it*, not the way that is easiest for *you*.
The justification is in the API docs of ContentHandler.characters() at
<http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)>
<QUOTE>
SAX parsers may return all contiguous character data in a single chunk,
or they may split it into several chunks;
</QUOTE>
So you have to cope with possibly multiple chunks of characters (i.e.
reassemble them somehow).
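
One common way to do the reassembly is to append every chunk to a buffer and only use the text once an element boundary is reached. The following is a minimal sketch of that idea, not the code of the original FirstSample program: the class name TextCollector and its collect() helper are made up for illustration, and trimming the whitespace is an assumption that may not suit every document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (name not from the thread): buffers all character
// chunks and reports the text only when an element starts or ends.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush(); // text that preceded a child element is complete here
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called once or many times for one run of text.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush(); // text of the element that just closed is complete here
    }

    // Stores the buffered text (if it is not only whitespace) and resets the buffer.
    private void flush() {
        String text = buffer.toString().trim(); // assumption: surrounding whitespace is not wanted
        if (!text.isEmpty()) {
            texts.add(text);
        }
        buffer.setLength(0);
    }

    List<String> getTexts() {
        return texts;
    }

    // Illustrative helper: parses an XML string and returns the reassembled text runs.
    static List<String> collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.getTexts();
    }

    public static void main(String[] args) {
        String xml = "<data1 a=\"v\">some data &amp; one attribute</data1>";
        System.out.println(collect(xml)); // prints [some data & one attribute]
    }
}
```

With this approach it no longer matters whether the parser delivers the text in one call or in three, because the handler never looks at an individual chunk on its own.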

--

Thomas

gk - 21 Nov 2006 04:10 GMT
> > XML
> [...]
[quoted text clipped - 39 lines]
> --
> Thomas

Nicely put.

But the parser is running some algorithm, right? Or is it really breaking
the characters at random? The parser must be following some rules or
algorithm to do this task, mustn't it?

Maybe the parser has an algorithm like this:

If the parser finds "amp", then break the characters
If the parser finds ";", then break the characters

Something like that...

Or is it a whimsical parser?
Ian Wilson - 21 Nov 2006 11:27 GMT
>>> XML
>>
[quoted text clipped - 50 lines]
>
> or it is  a whimsical parser !

Code as if it is whimsical and capricious and all will be well.

