Java Forum / General / July 2007


Problem reading "&apos;" from XML using SAX Parser

madan - 30 Jul 2007 12:30 GMT
Hi All,

I have an XML document that contains the following element:

<DataText>This is simple &apos; Text</DataText>

I have included " & apos ; " in the element called DataText.

When parsing the element, I get only the text that appears before
& apos ;

When I do not include " & apos ; ", I get the full text of this
element.

I observed that in the method characters(char buf[], int offset, int
len), the len parameter covers only the text from the start position
to the position where " & apos ; " starts...

How can I get the whole text, including the " & apos ; "?

Thanks

Note: while posting this request, " & apos; " was being rendered as
" ' ", which is why I put spaces between the characters.
bugbear - 30 Jul 2007 16:55 GMT
> Hi All,
>
[quoted text clipped - 24 lines]
>
> Thanks

Are you getting multiple calls to your handler?
How many calls (leading question) do you expect
your handler to get?
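One way to answer that question is to log every characters() call. This is a minimal sketch assuming the JDK's built-in JAXP SAX parser; the class name ChunkLogger is hypothetical. How many chunks appear depends on the parser, so only their concatenation should be relied on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler: records each characters() call separately.
class ChunkLogger extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start .. start + length) belongs to this call.
        chunks.add(new String(ch, start, length));
    }

    // Parses xml and returns the text chunks in the order the parser delivered them.
    static List<String> chunksOf(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ChunkLogger handler = new ChunkLogger();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.chunks;
    }

    public static void main(String[] args) throws Exception {
        List<String> chunks = chunksOf("<DataText>This is simple &apos; Text</DataText>");
        System.out.println(chunks.size() + " call(s):");
        for (String c : chunks) {
            System.out.println("[" + c + "]");
        }
    }
}
```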

   BugBear
Roedy Green - 30 Jul 2007 17:28 GMT
>I have included " & apos ; " in the element called DataText.

Just like HTML, XML reserves various characters and provides long forms
called entities to use when they occur in the text as data: &amp;,
&lt;, &gt;, &apos; and &quot;. Unlike HTML, XML predefines only those
five entities. Character references take one of two forms: decimal
references, &#8478;, and hexadecimal references, &#x211e;. Named
character entities such as &eacute; don't work unless a DTD declares
them. You can use any Unicode characters you want that are not part of
the XML grammar, and UTF-8 deals with encoding them.
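The claims above can be checked with a short sketch, assuming the JDK's built-in DOM parser (the class EntityDemo is hypothetical): the five predefined entities and both forms of character reference decode to plain characters, while an undeclared named entity such as &eacute; makes the parser reject the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EntityDemo {
    // Parses xml and returns the decoded text content of its root element.
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement()
                      .getTextContent();
    }

    // True if the parser rejects xml, e.g. because of an undeclared named entity.
    static boolean rejected(String xml) throws Exception {
        try {
            rootText(xml);
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // The five predefined entities plus a decimal and a hex character reference.
        System.out.println(rootText("<t>&lt;a&gt; &amp; &apos;q&apos; &quot;x&quot; &#8478; &#x211e;</t>"));
        // &eacute; is not predefined and no DTD declares it here.
        System.out.println(rejected("<t>caf&eacute;</t>"));
    }
}
```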

If you meant the spaces in &_apos_;, it should be encoded &amp;_apos_;

If you did not mean the spaces, then it should be encoded: &amp;apos;
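The encoding rule can be sketched as a small escaping routine. Here escape is a hypothetical helper, not a library method; the point it illustrates is that & is itself escaped, so text that already looks like an entity comes out as &amp;apos;.

```java
// Hypothetical helper class for escaping text destined for XML content.
class XmlEscaper {
    // Replaces each reserved character with its predefined entity.
    // '&' is escaped like any other, so "&apos;" becomes "&amp;apos;".
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("This is simple ' Text")); // This is simple &apos; Text
        System.out.println(escape("&apos;"));                // &amp;apos;
    }
}
```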

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Lew - 30 Jul 2007 21:02 GMT
>> I have included " & apos ; " in the element called DataText.
>
[quoted text clipped - 10 lines]
>
> If you did not mean the spaces, then it should be encoded: &amp;apos;

The OP had stated:
> Note : while posting this request," & apos; " is being formatted to "
> '  " . thats the reason included space between them

Which leads me to wonder what they were using to enter their post.  It was
plain text, so there shouldn't have been an issue on the Usenet side.  Anyhow,
it's pretty clear the OP didn't intend for the spaces to be in the final
literal representation, so they were writing "&apos;".

To make sure I understand Roedy's answer: to encode the element, instead of
saying "&apos;" the OP should say "&amp;apos;", correct?


Lew

Roedy Green - 31 Jul 2007 00:08 GMT
>To make sure I understand Roedy's answer: to encode the element, instead of
>saying "&apos;" the OP should say "&amp;apos;", correct?
&apos; is the encoding for ' when you mean it as a literal character,
not as a string delimiter.
&amp;apos; is the encoding for &apos; when you mean it as a literal
string of characters, not as an encoding for '.
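The difference shows up when the text is parsed back. A minimal sketch, assuming the JDK's DOM parser (ApostropheDecoding is a hypothetical name):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ApostropheDecoding {
    // Returns the decoded text of the root element of xml.
    static String decode(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // &apos; stands for the single character '
        System.out.println(decode("<DataText>This is simple &apos; Text</DataText>"));     // This is simple ' Text
        // &amp;apos; stands for the six literal characters &apos;
        System.out.println(decode("<DataText>This is simple &amp;apos; Text</DataText>")); // This is simple &apos; Text
    }
}
```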


Lasse Reichstein Nielsen - 31 Jul 2007 02:16 GMT
> I have a XML which contains the following element
>
[quoted text clipped - 4 lines]
> When parsing the element, am getting only the text that appears before
> & apos ;

HOW are you parsing it?
If using a DOM parser, you will likely find that the resulting
tree is an element node named DataText with three children:
a text node, an entity reference node and another text node.
If you are expecting only one child and only looking at the element
node's firstChild, you will find only the text before the entity.

Other parsers might also split the text into separate chunks.
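Reading every text child instead of only firstChild avoids the problem regardless of how the parser builds the tree. A sketch assuming the JDK's DOM parser; collectText is a hypothetical helper that also descends into entity reference nodes in case a parser keeps them. The standard Element.getTextContent() method gives the same concatenated result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class DomChildText {
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Concatenates all Text children (CDATASection extends Text, so those count too).
    static String collectText(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Text) {
                sb.append(((Text) n).getData());
            } else if (n.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
                // If a parser keeps entity references as nodes, their text is in their children.
                sb.append(collectText(n));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot("<DataText>This is simple &apos; Text</DataText>");
        System.out.println("child nodes: " + root.getChildNodes().getLength());
        System.out.println("first child only: [" + root.getFirstChild().getNodeValue() + "]");
        System.out.println("all text children: [" + collectText(root) + "]");
    }
}
```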

/L

Lasse Reichstein Nielsen  -  lrn@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
 'Faith without judgement merely degrades the spirit divine.'

madan - 31 Jul 2007 06:06 GMT
Hi All,

As Lasse Reichstein said, the text of the element is delivered in three
pieces when '&apos;' is in the middle of it.

I am using a SAX parser with a handler that extends DefaultHandler.

The method characters(char buf[], int offset, int len) is being called
three times, as described above.

I have temporarily resolved this by appending the text to a StringBuffer
and converting that to a String when the element ends.
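That workaround is the usual SAX pattern: accumulate in characters() and read the result in endElement(). A minimal sketch assuming the JDK's SAX parser and a hypothetical DataTextHandler class. It uses StringBuilder, the unsynchronized counterpart of StringBuffer, since one parse calls the handler from a single thread. It clears the buffer at every startElement, so it assumes DataText has no child elements.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that captures the full text of the DataText element.
class DataTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String dataText;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting afresh for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // append only this chunk; more may follow
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The text is complete only once the element ends.
        if ("DataText".equals(qName)) {
            dataText = text.toString();
        }
        text.setLength(0);
    }

    static String parse(String xml) throws Exception {
        DataTextHandler handler = new DataTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.dataText;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<DataText>This is simple &apos; Text</DataText>"));
    }
}
```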

But is this the expected behavior of a SAX parser, that it may split
the text into separate chunks if there are entities like this in the
middle of the text?

Madan N
Lew - 31 Jul 2007 12:03 GMT
> Hi All,
>
[quoted text clipped - 12 lines]
> split the text into separate chunks if there are some entities like
> this in between the text ?

Did you read bugbear's response to your original question?

Have you read the docs on the SAX callback methods?

Yes, it is the expected behavior, in the sense that you do not know how many
times the callback will be invoked to deliver the text.  It might split at any
arbitrary location, not just at entities.
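Because the split points are unspecified, a handler should give the same result however the text is chunked. The sketch below (class names hypothetical) bypasses the parser and calls the handler methods directly, delivering the text in two pieces at every possible split point. A real parser may use more pieces, but the accumulate-then-read logic is unchanged.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class SplitAnywhereDemo {
    // Minimal accumulating handler: the text is only trusted at endElement.
    static class Collector extends DefaultHandler {
        private final StringBuilder buf = new StringBuilder();
        String last;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            buf.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            buf.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            last = buf.toString();
            buf.setLength(0);
        }
    }

    // Simulates a parser delivering one element's text in two chunks split at 'cut'.
    static String deliverSplit(String text, int cut) {
        Collector c = new Collector();
        char[] chars = text.toCharArray();
        c.startElement("", "", "DataText", new AttributesImpl());
        c.characters(chars, 0, cut);
        c.characters(chars, cut, chars.length - cut);
        c.endElement("", "", "DataText");
        return c.last;
    }

    public static void main(String[] args) {
        String text = "This is simple ' Text";
        for (int cut = 0; cut <= text.length(); cut++) {
            if (!text.equals(deliverSplit(text, cut))) {
                throw new AssertionError("lost text at split " + cut);
            }
        }
        System.out.println("same result for every split point");
    }
}
```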


Lew


