
Java Forum / General / May 2005

Detecting and using the encoding of an XML file

Nomak - 17 May 2005 13:51 GMT
Hello,

I'm reading XML files with Xerces (SAX2). The problem is that the strings appear to be read as single-byte (8-bit) characters rather than as UTF-8, even though UTF-8 is specified as the encoding of the XML file.

I googled a bit but didn't find the canonical way to read strings from XML in Java, so I'm asking.

Here is my base code:

    parserClassName = "org.apache.xerces.parsers.SAXParser";
...

    XMLReader reader = null;
    try {
        reader = XMLReaderFactory.createXMLReader(parserClassName);
    } catch (Exception ex) {
        ex.printStackTrace();
    }

    try {
        try {
            // Validation is optional; a failure to enable it is only logged
            reader.setFeature("http://xml.org/sax/features/validation", true);
        } catch (SAXException ex) {
            ex.printStackTrace();
        }

        reader.setContentHandler(myContentHandler);
        reader.setErrorHandler(myErrorHandler);
        InputSource inputSource = new InputSource(xmlURI);

        // getEncoding() returns null unless setEncoding() was called on the InputSource
        System.err.println("encoding = " + inputSource.getEncoding());
        System.err.println("public id = " + inputSource.getPublicId());
        System.err.println("system id = " + inputSource.getSystemId());

        reader.parse(inputSource);

        // String charsetName = reader...getCharset();
    } catch (IOException | SAXException ex) {
        // parse() can throw either exception; the original snippet left them unhandled
        ex.printStackTrace();
    }

What must I add, remove, or modify to get my strings decoded properly?

TIA
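
A possible approach, sketched under assumptions: `InputSource.getEncoding()` only returns what the application set with `setEncoding()`, so it prints `null` in the code above. When an `InputSource` wraps a byte stream or a system ID, the parser detects the encoding itself from the byte order mark and the XML declaration, and the text delivered to `characters()` is already decoded. Parsers that implement the SAX2 extension interface `Locator2` can also report the encoding they settled on. The names `EncodingProbe`, `Handler` and `parse` below are made up for illustration, and the sketch uses the JDK's bundled JAXP parser rather than a separately installed Xerces.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses XML bytes and records what the parser reports.
final class EncodingProbe {

    /** Collects character data and the encoding reported by the parser, if any. */
    static final class Handler extends DefaultHandler {
        private Locator locator;
        String encoding;                       // stays null if the parser has no Locator2
        final StringBuilder text = new StringBuilder();

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // By the first element the XML declaration has been read.
            if (encoding == null && locator instanceof Locator2) {
                encoding = ((Locator2) locator).getEncoding();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Hands raw bytes to the parser so that it performs the decoding itself. */
    static Handler parse(byte[] xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Handler handler = new Handler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new ByteArrayInputStream(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>caf\u00e9</a>";
        Handler h = parse(doc.getBytes(StandardCharsets.UTF_8));
        System.out.println("reported encoding = " + h.encoding);
        System.out.println("text length = " + h.text.length());
    }
}
```

If the parsed text has the right length but looks wrong when printed, the console or log encoding may be the culprit rather than the parser. That is a guess about the original problem, not something the post confirms.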
iksrazal@terra.com.br - 17 May 2005 17:13 GMT
Here's a utility class with some static methods I use for this:

package com.hostedtelecom.callcentreweb.util;

import java.io.*;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.dom.DOMSource;
import org.xml.sax.InputSource;

/**
* Utility class for basic XML tasks.
* (XMLHelperException is a custom exception class not shown in this post.)
*/
public class XMLHelper
{
 /** Convert W3C XML Document to String.
 @param   document
 @return  String
 @throws  XMLHelperException
 */
 public static final String getDocumentAsString(Document document)
     throws XMLHelperException
 {
   try
   {
     // Create source and result objects
     Source source = new DOMSource(document);
     StringWriter out = new StringWriter();
     Result result = new StreamResult(out);
     TransformerFactory tFactory = TransformerFactory.newInstance();
     Transformer transformer = tFactory.newTransformer();
     transformer.transform(source, result);
     return out.toString();
   }
   catch(Exception e)
   {
     throw new XMLHelperException("XML Document to String Error", e);
   }
 }

 /** Convert String to a W3C XML Document.
 @param   xmlString
 @return  Document
 @throws  XMLHelperException
 */
 public static final Document getDocument(String xmlString)
     throws XMLHelperException
 {
   try
   {
     String nstr = null;
     // an XML document cannot have whitespace before the XML declaration,
     // so strip anything that precedes the first '<'
     if (!xmlString.startsWith("<"))
     {
       nstr = removeInitialWS(xmlString);
     }
     else
     {
       nstr = xmlString;
     }

     DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
     factory.setNamespaceAware(true);
     factory.setIgnoringElementContentWhitespace(false);
     DocumentBuilder builder = factory.newDocumentBuilder();

     InputSource isXml = new InputSource (new StringReader(nstr));
     return builder.parse(isXml);
   }
   catch(Exception e)
   {
     throw new XMLHelperException(
         "String to XML Document Error for String:\n\n " + xmlString + " ", e);
   }
 }

 /**
   Remove anything (typically blank space) that precedes the first '<',
   so that the string starts at the XML declaration
 */
 public static final String removeInitialWS(String xmlString)
     throws XMLHelperException
 {
   try
   {
     int pos = xmlString.indexOf("<");
     if (-1 == pos)
     {
       throw new Exception("Invalid XML, char '<' not found");
     }

     return xmlString.substring(pos);
   }
   catch(Exception e)
   {
     throw new XMLHelperException(
         "String to XML Document Error for String: \n\n " + xmlString + " ", e);
   }
 }

 public static final String getNodeToString(Node node)
     throws XMLHelperException
 {
   try
   {
     TransformerFactory tFactory = TransformerFactory.newInstance();
     Transformer transformer = tFactory.newTransformer();
     transformer.setOutputProperty("omit-xml-declaration", "yes");

     StringWriter sw = new StringWriter();
     StreamResult result = new StreamResult(sw);
     DOMSource source = new DOMSource( node );
     transformer.transform( source, result );

     return sw.getBuffer().toString();
    }
    catch(Exception e)
    {
      throw new XMLHelperException("XML Document to String Err", e);
    }
 }
}

HTH,
iksrazal
http://www.braziloutsource.com
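
One caveat worth sketching: `getDocument` takes a `String`, and an `InputSource` built on a character stream makes the parser disregard any `encoding` given in the XML declaration, because the bytes have already been turned into characters. If those bytes were decoded with the wrong charset on the way to the `String`, the damage is done before parsing starts. The snippet below is a hypothetical illustration of that failure mode; the class name `DecodeDemo` and the method `decode` are not from the post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes decoded with the right and the wrong charset.
final class DecodeDemo {

    /** Decodes raw bytes with an explicitly chosen charset. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café" encoded as UTF-8: the accented letter takes two bytes.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        // Correct decoding yields 4 characters; treating each byte as one
        // 8-bit character yields 5, with two junk characters at the end.
        System.out.println(right.length() + " vs " + wrong.length());
    }
}
```

Passing the raw byte stream (or a system ID) to the parser instead of a pre-decoded `String` lets the parser apply the declared encoding itself.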
Nomak - 18 May 2005 14:15 GMT
> [...]

And how do you use it?

I can't believe nobody on the Xerces team has thought about this...
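
As for usage, here is a minimal sketch. With the class posted above, the calls would presumably be `XMLHelper.getDocument(xml)` to parse a string and `XMLHelper.getDocumentAsString(doc)` to turn the result back into text. Because `XMLHelperException` is not shown in the thread, the example below reimplements those two steps with plain JAXP under the hypothetical names `XmlRoundTrip.parse` and `XmlRoundTrip.serialize`, and it writes bytes with an explicitly requested output encoding, which the posted helper does not do.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the posted XMLHelper, using only standard JAXP.
final class XmlRoundTrip {

    /** Parses an already-decoded XML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes a DOM document to bytes in the requested encoding. */
    static byte[] serialize(Document document, String encoding) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<greeting>caf\u00e9</greeting>");
        String text = doc.getDocumentElement().getTextContent();
        byte[] bytes = serialize(doc, "UTF-8");
        System.out.println("characters in element: " + text.length());
        System.out.println("bytes written: " + bytes.length);
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Writing to an `OutputStream` rather than a `StringWriter` is a deliberate choice in this sketch: the requested encoding only affects the result when the transformer produces bytes.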

