
Java Forum / General / November 2006


SAX Parser problem

Mize-ze - 13 Nov 2006 20:09 GMT
I am using SAX to parse an XML file.
I want to get the "characters" of a specific tag (element)

Right now I extend DefaultHandler and override the public void
characters(char[] ch, int start, int length) method, but this event is
raised whenever there is content in any tag.
How can I get the characters of one specific element using SAX?

I don't have access to the qName from this event.
Any ideas?

Thanks.
Arne Vajhøj - 14 Nov 2006 01:49 GMT
> I am using SAX to parse an XML file.
> I want to get the "characters" of a specific tag (element)
[quoted text clipped - 5 lines]
>
> I don't have access to the qName from this event.

Override:

   public void startElement(
      String namespaceURI,
      String localName,
      String rawName,
      Attributes atts)
      throws SAXException {
      // rawName is the qualified tag name of the element being opened
   }

Arne
Mize-ze - 16 Nov 2006 08:14 GMT
Arne Vajhøj wrote:
> > I am using SAX to parse an XML file.
> > I want to get the "characters" of a specific tag (element)
[quoted text clipped - 16 lines]
>
> Arne

But where will I have access to the "characters"? (not to the atts)

<ELEMENT>charaters: this is what I want!!</ELEMENT>

thanks
Ian Wilson - 16 Nov 2006 10:15 GMT
>>>I am using SAX to parse an XML file.
>>>I want to get the "characters" of a specific tag (element)
[quoted text clipped - 20 lines]
>
> <ELEMENT>charaters: this is what I want!!</ELEMENT>

Here's a simple approach which I've used*:

In startElement(), store the localName (or qName). For example you could
store it in an instance variable (i.e. a field) such as String
currentElementName.

In characters() retrieve the stored localName (or qName). You then have
both tagname ("ELEMENT") and content ("charaters: this is what I
want!!") together in one place.

If necessary, you could nullify the stored localName (or qName) in
endElement().

* Actually I store a structure that represents all the elements leading
to a particular leaf in the XML tree

e.g. for
                         currentElement
<foo>                    foo   
  <bar>                  foo.bar
    <baz>XXX</baz>       foo.bar.baz
  </bar>
</foo>
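
A minimal sketch of the path-tracking idea above, assuming a Deque is used as the stack of open elements. The class name PathTrackingHandler and the helper pathsOf are made up for illustration and are not taken from anyone's real code. It uses qName because a parser obtained from SAXParserFactory is not namespace-aware by default, in which case localName can be empty.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps a stack of the currently open elements and
// records the dotted path (e.g. "foo.bar.baz") each time an element opens.
class PathTrackingHandler extends DefaultHandler {
    private final Deque<String> openElements = new ArrayDeque<>();
    final List<String> seenPaths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        openElements.addLast(qName);
        seenPaths.add(currentPath());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.removeLast();
    }

    // Joins the open elements from outermost to innermost.
    String currentPath() {
        return String.join(".", openElements);
    }

    // Illustrative helper: parses an XML string and returns every path seen.
    static List<String> pathsOf(String xml) {
        PathTrackingHandler handler = new PathTrackingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.seenPaths;
    }
}
```

Inside characters(), calling currentPath() would then tell you exactly which leaf the text belongs to, which also distinguishes two elements that share a tag name but sit under different parents.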
Donald Roby - 16 Nov 2006 10:48 GMT
> Here's a simple approach which I've used*:
>
> In startElement(), store the localName (or qName). For example you could
> store it in an instance variable (i.e. a field) such as String
> currentElementName.

In startElement(), also initialize a StringBuffer to collect the
characters into.

> In characters() retrieve the stored localName (or qName). You then have
> both tagname ("ELEMENT") and content ("charaters: this is what I
> want!!") together in one place.

You don't get them all at once necessarily.  Collect them into the
above-mentioned StringBuffer in the characters() method for use elsewhere.

> If necessary, you could nullify the stored localName (or qName) in
> endElement().

In endElement(), convert the StringBuffer to a String and at this point,
you do have both the tag and the entire character contents.

At this point, I create whatever internal structure it is I'm building,
usually by a call to a separate builder that had been passed in via the
handler's constructor, using the tag and the extracted contents, and
then clear them out to be ready for the next one parsed.
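
The three steps above can be sketched roughly as follows. This is only an illustration under some assumptions: it uses StringBuilder (the unsynchronized equivalent of the StringBuffer mentioned above), it collects results into a plain list rather than calling a separate builder, and the names TextCollectingHandler and collect are invented for the example. It does not handle an element nested inside another element of the same name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers the complete text of every element whose
// qualified name matches wantedTag, however many chunks the parser delivers.
class TextCollectingHandler extends DefaultHandler {
    private final String wantedTag;
    private final List<String> results = new ArrayList<>();
    private StringBuilder buffer; // non-null only while inside wantedTag

    TextCollectingHandler(String wantedTag) {
        this.wantedTag = wantedTag;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if (wantedTag.equals(qName)) {
            buffer = new StringBuilder(); // start a fresh collection
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buffer != null) {
            buffer.append(ch, start, length); // may be called several times
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buffer != null && wantedTag.equals(qName)) {
            results.add(buffer.toString()); // tag and full text are both known here
            buffer = null;                  // ready for the next element
        }
    }

    // Illustrative helper: returns the text of each wantedTag element in xml.
    static List<String> collect(String xml, String wantedTag) {
        TextCollectingHandler handler = new TextCollectingHandler(wantedTag);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.results;
    }
}
```

Because the text is only used in endElement(), the result is the same whether the parser hands over the content in one chunk or in many.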
Ian Wilson - 16 Nov 2006 15:15 GMT
>> Here's a simple approach which I've used*:
>>
[quoted text clipped - 11 lines]
> You don't get them all at once necessarily.  Collect them into the
> above-mentioned StringBuffer in the characters() method for use elsewhere.

Thanks for pointing that out!

On re-reading the javadocs for DefaultHandler I now see that it refers
to "each chunk of character data", which is a clue I overlooked.

I'm not sure if my testing has been lucky or my XML is sufficiently
simple that the first "chunk" will always contain the whole character
data for that element.

Do you know of a simple XML example that illustrates characters()
providing several chunks? Or is it some relatively unpredictable
buffering related phenomenon?

>> If necessary, you could nullify the stored localName (or qName) in
>> endElement().
>>
> In endElement(), convert the StringBuffer to a String and at this point,
> you do have both the tag and the entire character contents.

Noted :-)
Ian Wilson - 16 Nov 2006 15:30 GMT
>>> In characters() retrieve the stored localName (or qName). You then
>>> have both tagname ("ELEMENT") and content ("charaters: this is what I
[quoted text clipped - 7 lines]
> providing several chunks? Or is it some relatively unpredictable
> buffering related phenomenon?

It seems to happen if the character data contains newlines.

<inventory>
  <animal type="mammal">
    <name>Fred</name>
    <species>Hippo</species>
    <weight units="Kg">1552</weight>
  </animal>
  <animal type="reptile">
    <name>
       Gert
       AKA Gertrude
       the galloping reptile
    </name>
    <species>Croc</species>
  </animal>
</inventory>

I find characters() is called separately for "Gert", "AKA Gertrude" and
"the galloping reptile".

My XML data has no newlines within character data, so I didn't have a
problem. Nevertheless I have made the necessary changes just in case :-)
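
For anyone who wants to see the chunking on their own setup, here is a small probe, offered as a sketch only. ChunkProbe and chunksOf are hypothetical names. How the text gets split is up to the parser: SAX makes no promise about chunk boundaries, so the number of chunks you see may differ between parsers and versions. An entity reference such as &amp; inside the text is another input that is often reported as separate chunks, but that is an observation about common parsers, not a guarantee.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical probe: stores every characters() call as a separate entry so
// the chunk boundaries chosen by the parser become visible.
class ChunkProbe extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        chunks.add(new String(ch, start, length));
    }

    // Illustrative helper: parses xml and returns the chunks in arrival order.
    static List<String> chunksOf(String xml) {
        ChunkProbe probe = new ChunkProbe();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), probe);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return probe.chunks;
    }
}
```

Printing ChunkProbe.chunksOf("<name>Tom &amp; Jerry</name>") shows how your parser splits that text. Whatever the split, joining the chunks back together always gives the full content, which is why accumulating them is the safe approach.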
Arne Vajhøj - 17 Nov 2006 01:08 GMT
>>> I am using SAX to parse an XML file.
>>> I want to get the "characters" of a specific tag (element)
[quoted text clipped - 17 lines]
>
> But where will I have access to the "characters"? (not to the atts)

You find the tag with startElement and the text inside with characters.

Arne
vahan - 17 Nov 2006 11:47 GMT
In the handler class:

        String localName = null;

        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes)
                throws SAXException {
            // Remember which element we are inside. Note: localName is only
            // filled in when the parser is namespace-aware; otherwise use qName.
            this.localName = localName;
        }

        public void endElement(String uri,
                               String localName,
                               String qName) throws SAXException {
            this.localName = null;
        }

        public void characters(char ch[], int start, int length)
                throws SAXException {
            if ("YourTagName".equalsIgnoreCase(localName)) {
                String desiredContext = new String(ch, start, length);
            }
        }

> >>> I am using SAX to parse an XML file.
> >>> I want to get the "characters" of a specific tag (element)
[quoted text clipped - 21 lines]
>
> Arne
Ian Wilson - 17 Nov 2006 14:45 GMT
<top-posted example code snipped>

You're making the same mistake I did, see earlier in thread.

For one element, characters() may be called several times, providing
the character data in several chunks per element.
Arne Vajhøj - 18 Nov 2006 02:11 GMT
> You're making the same mistake I did, see earlier in thread.
>
> For one element, characters() may be called several times, providing
> the character data in several chunks per element.

Having characters() accumulate into a StringBuffer, combined with some
logic in startElement and endElement, is rather standard.

Arne
Ian Wilson - 20 Nov 2006 10:36 GMT
>> You're making the same mistake I did, see earlier in thread.
>>
[quoted text clipped - 3 lines]
> Having characters() accumulate into a StringBuffer, combined with some
> logic in startElement and endElement, is rather standard.

I guess you mean standard as in "a customary programming idiom amongst
experienced users of SAX" rather than standard as in "explicitly written
down somewhere authoritative where people might be expected to easily
find it"?

I didn't find this idiom in the javadoc for DefaultHandler or in the
Java books I have (which admittedly only cover SAX briefly).

http://www.saxproject.org/quickstart.html doesn't describe this
programming idiom either.

When I Googled for "Java SAX example", the first three examples didn't
show this idiom, however the fourth did
(http://www.cafeconleche.org/slides/sd2002west/introxml/265.html)

This wasn't intended to be a whinge, I'm just pointing out that the
"standard" idiom may not be immediately obvious to people new to SAX.
Chris Uppal - 20 Nov 2006 12:47 GMT
> This wasn't intended to be a whinge, I'm just pointing out that the
> "standard" idiom may not be immediately obvious to people new to SAX.

This is an example of an API design which might almost have been designed to be
misunderstood.

Other examples are from java.io.InputStream (and friends) where the uselessness
of available() and the not-totally-obvious semantics of read() seem to evade a
good many programmers.

   -- chris
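
To illustrate the read() point with a rough sketch: read(byte[]) may return fewer bytes than the array can hold, much as characters() may deliver only part of an element's text, so the usual approach is to loop until it returns -1. The names ReadLoop, readAll and trickle are invented for this example, and trickle is just a test stream that deliberately hands out at most 3 bytes per call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadLoop {
    // Reads until end of stream, never assuming one read() fills the buffer.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // only the first n bytes are valid
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stream for demonstration: returns at most 3 bytes per read.
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }
}
```

The loop never consults available(), since that method only estimates what can be read without blocking and says nothing about how much data remains.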




