>>>I am using SAX to parse an XML file.
>>>I want to get the "characters" of a specific tag (element)
[quoted text clipped - 20 lines]
>
> <ELEMENT>charaters: this is what I want!!</ELEMENT>
> Here's a simple approach which I've used*:
>
> In startElement(), store the localName (or qName). For example you could
> store it in an instance variable (i.e. a field) such as String
> currentElementName.
In startElement(), also initialize a StringBuffer to collect the
characters into.
> In characters() retrieve the stored localName (or qName). You then have
> both tagname ("ELEMENT") and content ("charaters: this is what I
> want!!") together in one place.
You don't get them all at once necessarily. Collect them into the
above-mentioned StringBuffer in the characters() method for use elsewhere.
> If necessary, you could nullify the stored localName (or qName) in
> endElement().
In endElement(), convert the StringBuffer to a String and at this point,
you do have both the tag and the entire character contents.
At this point I create whatever internal structure I'm building from the
tag and the extracted contents, usually by calling a separate builder
that was passed in via the handler's constructor. I then clear both out,
ready for the next element parsed.
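A minimal sketch of the steps above, for anyone who wants to see them together. The class name TextCollectingHandler, the collect() helper and the use of a BiConsumer as the "builder" are assumptions made for illustration, not anything from the posts. It also uses StringBuilder in place of StringBuffer, and it only reports elements that contain text and no child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollectingHandler extends DefaultHandler {
    // Hypothetical stand-in for the "separate builder" passed via the constructor.
    private final BiConsumer<String, String> builder;
    private String currentElementName;   // set in startElement()
    private StringBuilder buffer;        // collects every chunk from characters()

    TextCollectingHandler(BiConsumer<String, String> builder) {
        this.builder = builder;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Remember the tag and start a fresh buffer for its text.
        currentElementName = qName;
        buffer = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        if (buffer != null) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only here are the tag and its complete text available together.
        if (buffer != null && qName.equals(currentElementName)) {
            builder.accept(qName, buffer.toString());
        }
        // Clear both, ready for the next element.
        currentElementName = null;
        buffer = null;
    }

    // Convenience helper (an assumption): parse a string, return "tag=text" entries.
    static List<String> collect(String xml) {
        List<String> out = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new TextCollectingHandler((tag, text) -> out.add(tag + "=" + text)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return out;
    }
}
```

The sketch uses qName because a parser from a default SAXParserFactory is not namespace-aware, in which case localName may be empty.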
Ian Wilson - 16 Nov 2006 15:15 GMT
>> Here's a simple approach which I've used*:
>>
[quoted text clipped - 11 lines]
> You don't get them all at once necessarily. Collect them into the
> above-mentioned StringBuffer in the characters() method for use elsewhere.
Thanks for pointing that out!
On re-reading the Javadocs for DefaultHandler I now see that it refers
to "each chunk of character data", which is a clue I overlooked.
I'm not sure if my testing has been lucky or my XML is sufficiently
simple that the first "chunk" will always contain the whole character
data for that element.
Do you know of a simple XML example that illustrates characters()
providing several chunks? Or is it some relatively unpredictable
buffering-related phenomenon?
>> If necessary, you could nullify the stored localName (or qName) in
>> endElement().
>>
> In endElement(), convert the StringBuffer to a String and at this point,
> you do have both the tag and the entire character contents.
Noted :-)
Ian Wilson - 16 Nov 2006 15:30 GMT
>>> In characters() retrieve the stored localName (or qName). You then
>>> have both tagname ("ELEMENT") and content ("charaters: this is what I
[quoted text clipped - 7 lines]
> providing several chunks? Or is it some relatively unpredictable
> buffering related phenomenon?
It seems to happen if the character data contains newlines.
<inventory>
  <animal type="mammal">
    <name>Fred</name>
    <species>Hippo</species>
    <weight units="Kg">1552</weight>
  </animal>
  <animal type="reptile">
    <name>
      Gert
      AKA Gertrude
      the galloping reptile
    </name>
    <species>Croc</species>
  </animal>
</inventory>
I find characters() is called separately for "Gert", "AKA Gertrude" and
"the galloping reptile".
My XML data has no newlines within character data, so I didn't have a
problem. Nevertheless I have made the necessary changes just in case :-)
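For anyone wanting to check this against their own parser, here is a rough sketch that records each characters() call made inside a <name> element. The class name ChunkCounter and the chunksOf() helper are made up for illustration. How many chunks you see depends on the parser, since SAX allows contiguous character data to arrive in one call or in several, so no particular count is shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ChunkCounter extends DefaultHandler {
    // One entry per characters() call made while inside <name>.
    final List<String> chunks = new ArrayList<>();
    private boolean inName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        inName = qName.equals("name");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inName) {
            chunks.add(new String(ch, start, length));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        inName = false;
    }

    // Hypothetical helper: parse a string and return the chunks seen in <name>.
    static List<String> chunksOf(String xml) {
        ChunkCounter counter = new ChunkCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), counter);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return counter.chunks;
    }
}
```

Printing chunksOf(xml).size() for the reptile entry above shows how your parser splits the text. Whatever the count, joining the chunks in order gives back the full character data, which is why appending to a buffer is the safe approach.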