
Java Forum / General / February 2007


How to parse XML which contains & in the text?

sohan.soni@gmail.com - 14 Feb 2007 11:31 GMT
Hi,

XML file content is:

<?xml version="1.0"?>

<!DOCTYPE RECORD SYSTEM ".\RECORD.dtd">

<RECORD EXT_SOURCE="DEFAULT" TEMPLATE="GAS_POOL_POINTS">

  <TABLE_NAME>GAS_POOL_POINTS</TABLE_NAME>

  <COLUMN>

     <COLUMN_NAME>GP_POOL</COLUMN_NAME>

     <PRIMARY_KEY>Y</PRIMARY_KEY>

     <COLUMN_VALUE>Some&Value</COLUMN_VALUE>

  </COLUMN>

</RECORD>

When parsing this XML file (i.e. converting the XML document to a String)
with Java code, I get the following exception:

org.xml.sax.SAXParseException: Next character must be ";" terminating
reference to entity "Value".
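
For reference, a minimal sketch that reproduces this kind of failure. It assumes the standard JAXP DocumentBuilder (the actual parsing code is not shown in this post), and the class and method names are made up for illustration. The DOCTYPE line is left out so that no external DTD has to be loaded.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not the original poster's code.
class WellFormedCheck {

    // Returns true if the parser accepts the text, false if it raises a SAX error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // SAXParseException (a subclass of SAXException) lands here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Naked ampersand: the parser treats "&Value" as the start of an entity reference.
        System.out.println(isWellFormed("<COLUMN_VALUE>Some&Value</COLUMN_VALUE>"));
        // Escaped ampersand: accepted.
        System.out.println(isWellFormed("<COLUMN_VALUE>Some&amp;Value</COLUMN_VALUE>"));
    }
}
```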

I think some change is needed in the DTD so that a string in the XML
which contains & is treated as a literal, instead of the parser
expecting an entity.

Adding to this, XML content is not under our control.

Please reply if somebody knows about this.
Daniel Dyer - 14 Feb 2007 11:39 GMT
> When parsing this XML file (i.e. converting the XML document to a String)
> with Java code, I get the following exception:
>
> org.xml.sax.SAXParseException: Next character must be ";" terminating
> reference to entity "Value".

Section 2.4 of the XML 1.0 specification:

"The ampersand character (&) and the left angle bracket (<) MUST NOT  
appear in their literal form, except when used as markup delimiters, or  
within a comment, a processing instruction, or a CDATA section. If they  
are needed elsewhere, they MUST be escaped using either numeric character  
references or the strings "&amp;" and "&lt;" respectively. The right angle  
bracket (>) may be represented using the string "&gt;", and MUST, for  
compatibility, be escaped using either "&gt;" or a character reference  
when it appears in the string "]]>" in content, when that string is not  
marking the end of a CDATA section."

> I think some change is needed in the DTD so that a string in the XML
> which contains & is treated as a literal, instead of the parser
> expecting an entity.

You can't fix this in the DTD. The XML is not well-formed, and the parser
is correct to reject it.

> Adding to this, XML content is not under our control.

Unfortunately, the only rational fix *is* to change the XML.  Either use
&amp; or wrap the element data in a CDATA section.  If the XML is
controlled by a third party, it would be reasonable to request that they
change it, since it is not really XML at all if it is not well-formed.
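
A minimal sketch of what the two corrected forms look like to a parser, assuming the standard JAXP DOM API. The class and method names are invented for this example. Both variants should yield the same text once parsed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
class EscapedAmpersandDemo {

    // Parses the given XML and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Variant 1: the ampersand is written as the predefined entity &amp;
        String withEntity = "<COLUMN_VALUE>Some&amp;Value</COLUMN_VALUE>";
        // Variant 2: the element data is wrapped in a CDATA section
        String withCdata = "<COLUMN_VALUE><![CDATA[Some&Value]]></COLUMN_VALUE>";

        System.out.println(rootText(withEntity)); // Some&Value
        System.out.println(rootText(withCdata));  // Some&Value
    }
}
```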

Dan.

Daniel Dyer
http://www.uncommons.org

sohan.soni@gmail.com - 18 Feb 2007 09:26 GMT
> On Wed, 14 Feb 2007 11:31:18 -0000,sohan.s...@gmail.com  
>
[quoted text clipped - 35 lines]
> --
> Daniel Dyer http://www.uncommons.org

Thanks Daniel,
That info really helped.

Regards
Sohan
Alex Hunsley - 15 Feb 2007 23:45 GMT
> Hi,
>
[quoted text clipped - 31 lines]
>
> Adding to this, XML content is not under our control.

As the other reply said, it's not well-formed XML. It shouldn't contain a
'naked' ampersand like that.
Do you have any chance at all to speak to the producer of this XML? It's
very reasonable to ask them to fix it. If you can't ask them to fix it,
then how about:

1) put in a fix yourself - e.g. do a search and replace kludge on the
content before the XML parser gets it - so replace naked '&' with
'&amp;' (and any other nasty characters that crop up)
2) At least tell the party making the XML that it is broken - you may
help someone else down the line by doing this, if not yourself
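
A rough sketch of the search-and-replace kludge from option 1, assuming the whole document fits in a String. The class name, method names, and regular expression are my own illustration, not a library API. It escapes any '&' that is not already the start of something shaped like an entity or character reference.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical pre-processing helper, for illustration only.
class NakedAmpersandFixer {

    // An '&' NOT followed by "name;", "#digits;" or "#xhex;".
    private static final Pattern NAKED_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z_:][A-Za-z0-9_:.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every naked ampersand with &amp; and leaves existing references alone.
    static String escapeNakedAmpersands(String xml) {
        return NAKED_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Repairs the text, parses it, and returns the first COLUMN_VALUE element's text.
    static String firstColumnValue(String brokenXml) {
        try {
            String repaired = escapeNakedAmpersands(brokenXml);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(repaired)));
            return doc.getElementsByTagName("COLUMN_VALUE").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("still not parseable after repair", e);
        }
    }

    public static void main(String[] args) {
        String broken =
                "<RECORD><COLUMN><COLUMN_VALUE>Some&Value</COLUMN_VALUE></COLUMN></RECORD>";
        System.out.println(escapeNakedAmpersands(broken));
        System.out.println(firstColumnValue(broken));
    }
}
```

It is still a kludge: it also rewrites ampersands inside CDATA sections and comments, where a literal '&' is legal, and it leaves alone text such as "&Value;" that merely looks like a reference to an undeclared entity.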

lex
sohan.soni@gmail.com - 18 Feb 2007 09:26 GMT
> sohan.s...@gmail.com wrote:
> > Hi,
[quoted text clipped - 48 lines]

Thanks Lex,

Sohan

