Java XML Tutorial

How to read XML file in Java (SAX Parser)

This tutorial will show you how to use the Java built-in SAX parser to read and parse an XML file.

1. What is Simple API for XML (SAX)

1.1 The Simple API for XML (SAX) is an event-driven push API based on the observer pattern. It gives serial access to an XML file: the parser reads the file from start to end, calls one handler method when it encounters the start of an element, another when it finds text, and so on. Attributes are passed along with the start-of-element event.

SAX is fast and memory-efficient: unlike DOM, it does not build an internal representation (tree structure) of the XML data.

Note
The SAX parser is faster and uses less memory than the DOM parser. SAX is suitable for reading XML elements sequentially; DOM is suitable for XML manipulation such as creating, modifying, or deleting XML elements.

1.2 Some common SAX events:

  • startDocument() and endDocument() – Methods called at the start and end of an XML document.
  • startElement() and endElement() – Methods called at the start and end of an XML element.
  • characters() – Method called with the text content between the start and end of an XML element.

1.3 Below is a simple XML file.


    <name>mkyong</name>

The SAX parser reads the above XML file and calls the following events or methods sequentially:

  1. startDocument()
  2. startElement() – <name>
  3. characters() – mkyong
  4. endElement() – </name>
  5. endDocument()
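
To see this sequence for ourselves, here is a minimal sketch that parses the one-line XML from a string and records each callback in order. The class name SaxEventOrderDemo and the method recordEvents are made up for this illustration and are not part of the tutorial's source code. Text is buffered and reported once per element, because a parser may deliver it in more than one characters() call.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the tutorial's source code.
class SaxEventOrderDemo {

    // Parses the XML string and returns the SAX callbacks in the order they fired.
    static List<String> recordEvents(String xml) {
        List<String> events = new ArrayList<>();
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument()");
            }

            @Override
            public void endDocument() {
                events.add("endDocument()");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                text.setLength(0);
                events.add("startElement() <" + qName + ">");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // may be called several times for one text node, so only buffer here
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // report the buffered text as a single entry
                if (text.length() > 0) {
                    events.add("characters() " + text);
                    text.setLength(0);
                }
                events.add("endElement() </" + qName + ">");
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        recordEvents("<name>mkyong</name>").forEach(System.out::println);
    }
}
```

Running main prints the five events in the same order as the list above.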

2. Read or Parse an XML file (SAX)

This example shows you how to use the Java built-in SAX parser APIs to read or parse an XML file.

2.1 Below is an XML file.

src/main/resources/staff.xml

<?xml version="1.0" encoding="utf-8"?>
<Company>
    <staff id="1001">
        <name>mkyong</name>
        <role>support</role>
        <salary currency="USD">5000</salary>
        <!-- for special characters like < &, need CDATA -->
        <bio><![CDATA[HTML tag <code>testing</code>]]></bio>
    </staff>
    <staff id="1002">
        <name>yflow</name>
        <role>admin</role>
        <salary currency="EUR">8000</salary>
        <bio><![CDATA[a & b]]></bio>
    </staff>
</Company>

P.S. In the XML file, special characters like < or & must either be escaped as &lt; and &amp; or be wrapped in a CDATA section, as shown above.
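
Entity escapes such as &amp; and &lt; are an alternative to CDATA, and the handler receives the same text either way. The sketch below illustrates this with small XML strings; the class name EscapeVsCdataDemo and the method textOf are assumptions made up for this example.

```java
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;

// Hypothetical demo class, not part of the tutorial's source code.
class EscapeVsCdataDemo {

    // Returns all character data found in the XML string.
    static String textOf(String xml) {
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // escaped with entity references
        System.out.println(textOf("<bio>a &amp; b</bio>"));
        // wrapped in a CDATA section
        System.out.println(textOf("<bio><![CDATA[a & b]]></bio>"));
        // escaped markup
        System.out.println(textOf("<bio>HTML tag &lt;code&gt;testing&lt;/code&gt;</bio>"));
    }
}
```

CDATA is usually more readable when the text contains a lot of markup; entity escapes are handy for a single stray character.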

2.2 Create a class that extends org.xml.sax.helpers.DefaultHandler, and override the startElement, endElement and characters methods to print all the XML elements, attributes, and text. (XML comments are not reported to these callbacks.)

PrintAllHandlerSax.java

package com.mkyong.xml.sax.handler;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PrintAllHandlerSax extends DefaultHandler {

  private StringBuilder currentValue = new StringBuilder();

  @Override
  public void startDocument() {
      System.out.println("Start Document");
  }

  @Override
  public void endDocument() {
      System.out.println("End Document");
  }

  @Override
  public void startElement(
          String uri,
          String localName,
          String qName,
          Attributes attributes) {

      // reset the tag value
      currentValue.setLength(0);

      System.out.printf("Start Element : %s%n", qName);

      if (qName.equalsIgnoreCase("staff")) {
          // get tag's attribute by name
          String id = attributes.getValue("id");
          System.out.printf("Staff id : %s%n", id);
      }

      if (qName.equalsIgnoreCase("salary")) {
          // get tag's attribute by index, 0 = first attribute
          String currency = attributes.getValue(0);
          System.out.printf("Currency :%s%n", currency);
      }

  }

  @Override
  public void endElement(String uri,
                         String localName,
                         String qName) {

      System.out.printf("End Element : %s%n", qName);

      if (qName.equalsIgnoreCase("name")) {
          System.out.printf("Name : %s%n", currentValue.toString());
      }

      if (qName.equalsIgnoreCase("role")) {
          System.out.printf("Role : %s%n", currentValue.toString());
      }

      if (qName.equalsIgnoreCase("salary")) {
          System.out.printf("Salary : %s%n", currentValue.toString());
      }

      if (qName.equalsIgnoreCase("bio")) {
          System.out.printf("Bio : %s%n", currentValue.toString());
      }

  }

  // http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#characters%28char%5B%5D,%20int,%20int%29
  // SAX parsers may return all contiguous character data in a single chunk,
  // or they may split it into several chunks
  @Override
  public void characters(char[] ch, int start, int length) {

      // The characters() method can be called multiple times for a single text node.
      // Some of the text may be lost if each call assigns to a new string.

      // avoid doing this
      // value = new String(ch, start, length);

      // better append it, works for single or multiple calls
      currentValue.append(ch, start, length);

  }

}
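
To make the point about characters() concrete, here is a small sketch that counts the callbacks for one text node and compares "keep the last chunk" with "append every chunk". The class ChunkCountingHandler is made up for this illustration. How the text is split is up to the parser implementation, so the number of calls may differ between parsers; an entity reference such as &amp; is a typical place for a split.

```java
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;

// Hypothetical demo class, not part of the tutorial's source code.
class ChunkCountingHandler extends DefaultHandler {

    int calls;                                           // number of characters() callbacks
    String lastChunk = "";                               // what "value = new String(...)" would keep
    final StringBuilder appended = new StringBuilder();  // what append() keeps

    @Override
    public void characters(char[] ch, int start, int length) {
        calls++;
        lastChunk = new String(ch, start, length);
        appended.append(ch, start, length);
    }

    // Parses the XML string and returns the handler with its counters filled in.
    static ChunkCountingHandler parse(String xml) {
        ChunkCountingHandler handler = new ChunkCountingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        ChunkCountingHandler h = parse("<bio>a &amp; b</bio>");
        System.out.println("characters() calls : " + h.calls);
        System.out.println("last chunk only    : [" + h.lastChunk + "]");
        System.out.println("appended           : [" + h.appended + "]");
    }
}
```

Whenever the call count is greater than one, the "last chunk only" value holds just a fragment, while the appended value holds the complete text.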

2.3 Use SAXParser to parse the XML file.

ReadXmlSaxParser.java

package com.mkyong.xml.sax;

import com.mkyong.xml.sax.handler.PrintAllHandlerSax;
import org.xml.sax.SAXException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;

public class ReadXmlSaxParser {

    private static final String FILENAME = "src/main/resources/staff.xml";

    public static void main(String[] args) {

        SAXParserFactory factory = SAXParserFactory.newInstance();

        try {

            // XXE attack, see https://rules.sonarsource.com/java/RSPEC-2755
            SAXParser saxParser = factory.newSAXParser();

            PrintAllHandlerSax handler = new PrintAllHandlerSax();

            saxParser.parse(FILENAME, handler);

        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }

    }

}

Output

Terminal

Start Document
Start Element : Company
Start Element : staff
Staff id : 1001
Start Element : name
End Element : name
Name : mkyong
Start Element : role
End Element : role
Role : support
Start Element : salary
Currency :USD
End Element : salary
Salary : 5000
Start Element : bio
End Element : bio
Bio : HTML tag <code>testing</code>
End Element : staff
Start Element : staff
Staff id : 1002
Start Element : name
End Element : name
Name : yflow
Start Element : role
End Element : role
Role : admin
Start Element : salary
Currency :EUR
End Element : salary
Salary : 8000
Start Element : bio
End Element : bio
Bio : a & b
End Element : staff
End Element : Company
End Document

2.4 With its default settings, the SAX parser is vulnerable to XML external entity (XXE) attacks (CWE-611). To prevent them, disable the DOCTYPE declaration as shown below.


  SAXParserFactory factory = SAXParserFactory.newInstance();

  try {

      // https://rules.sonarsource.com/java/RSPEC-2755
      // prevent XXE, completely disable DOCTYPE declaration:
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

      SAXParser saxParser = factory.newSAXParser();

      PrintAllHandlerSax handler = new PrintAllHandlerSax();

      saxParser.parse(FILENAME, handler);

  } catch (ParserConfigurationException | SAXException | IOException e) {
      e.printStackTrace();
  }
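
As a quick check that the feature does what we expect, the sketch below feeds the parser a document with a DOCTYPE that declares an external entity and reports whether parsing was refused. The class name DisallowDoctypeDemo, the method isRejected, and the sample payload are made up for this illustration. It assumes the JDK's built-in parser, which recognizes this Apache feature URI; other parsers may not.

```java
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the tutorial's source code.
class DisallowDoctypeDemo {

    // Returns true if the hardened parser refuses to parse the given XML.
    static boolean isRejected(String xml) {
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // assumption: the underlying parser supports this Apache feature
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            parser = factory.newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("Feature not supported by this parser", e);
        }

        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXException e) {
            // the parser reports the DOCTYPE as a fatal error
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String withDoctype =
                "<!DOCTYPE Company [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
                + "<Company>&xxe;</Company>";

        System.out.println("DOCTYPE document rejected : " + isRejected(withDoctype));
        System.out.println("Plain document rejected   : " + isRejected("<Company/>"));
    }
}
```

The parser stops at the DOCTYPE declaration, so the external entity is never resolved.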

3. Convert an XML file to an object

This example parses an XML file and converts it into a List of objects. It works, but it is not recommended; try JAXB instead.

3.1 Review the same XML file.

src/main/resources/staff.xml

<?xml version="1.0" encoding="utf-8"?>
<Company>
    <staff id="1001">
        <name>mkyong</name>
        <role>support</role>
        <salary currency="USD">5000</salary>
        <!-- for special characters like < &, need CDATA -->
        <bio><![CDATA[HTML tag <code>testing</code>]]></bio>
    </staff>
    <staff id="1002">
        <name>yflow</name>
        <role>admin</role>
        <salary currency="EUR">8000</salary>
        <bio><![CDATA[a & b]]></bio>
    </staff>
</Company>

3.2 And we want to convert the above XML file into the following Staff object.

Staff.java

package com.mkyong.xml.sax.model;

import java.math.BigDecimal;

public class Staff {

  private Long id;
  private String name;
  private String role;
  private BigDecimal salary;
  private String Currency;
  private String bio;

  //... getters, setters...toString
}

3.3 The class below converts the XML into Staff objects.

MapStaffObjectHandlerSax.java

package com.mkyong.xml.sax.handler;

import com.mkyong.xml.sax.model.Staff;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class MapStaffObjectHandlerSax extends DefaultHandler {

    private StringBuilder currentValue = new StringBuilder();
    List<Staff> result;
    Staff currentStaff;

    public List<Staff> getResult() {
        return result;
    }

    @Override
    public void startDocument() {
        result = new ArrayList<>();
    }

    @Override
    public void startElement(
            String uri,
            String localName,
            String qName,
            Attributes attributes) {

        // reset the tag value
        currentValue.setLength(0);

        // start of loop
        if (qName.equalsIgnoreCase("staff")) {

            // new staff
            currentStaff = new Staff();

            // staff id
            String id = attributes.getValue("id");
            currentStaff.setId(Long.valueOf(id));
        }

        if (qName.equalsIgnoreCase("salary")) {
            // salary currency
            String currency = attributes.getValue("currency");
            currentStaff.setCurrency(currency);
        }

    }

    @Override
    public void endElement(String uri,
                           String localName,
                           String qName) {

        if (qName.equalsIgnoreCase("name")) {
            currentStaff.setName(currentValue.toString());
        }

        if (qName.equalsIgnoreCase("role")) {
            currentStaff.setRole(currentValue.toString());
        }

        if (qName.equalsIgnoreCase("salary")) {
            currentStaff.setSalary(new BigDecimal(currentValue.toString()));
        }

        if (qName.equalsIgnoreCase("bio")) {
            currentStaff.setBio(currentValue.toString());
        }

        // end of loop
        if (qName.equalsIgnoreCase("staff")) {
            result.add(currentStaff);
        }

    }

    @Override
    public void characters(char[] ch, int start, int length) {
        currentValue.append(ch, start, length);

    }

}

3.4 Run it.

ReadXmlSaxParser2.java

package com.mkyong.xml.sax;

import com.mkyong.xml.sax.handler.MapStaffObjectHandlerSax;
import com.mkyong.xml.sax.model.Staff;
import org.xml.sax.SAXException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public class ReadXmlSaxParser2 {

    public static void main(String[] args) {

        SAXParserFactory factory = SAXParserFactory.newInstance();

        try (InputStream is = getXMLFileAsStream()) {

            SAXParser saxParser = factory.newSAXParser();

            // parse XML and map to object; it works, but it is not recommended, try JAXB
            MapStaffObjectHandlerSax handler = new MapStaffObjectHandlerSax();

            saxParser.parse(is, handler);

            // print all
            List<Staff> result = handler.getResult();
            result.forEach(System.out::println);

        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }

    }

    // get XML file from resources folder.
    private static InputStream getXMLFileAsStream() {
        return ReadXmlSaxParser2.class.getClassLoader().getResourceAsStream("staff.xml");
    }

}

Output

Terminal

Staff{id=1001, name='mkyong', role='support', salary=5000, Currency='USD', bio='HTML tag <code>testing</code>'}
Staff{id=1002, name='yflow', role='admin', salary=8000, Currency='EUR', bio='a & b'}

4. SAX Error Handler

This example shows how to register a custom error handler for the SAX parser.

4.1 Create a class that implements the org.xml.sax.ErrorHandler interface. The code is mostly self-explanatory: it just wraps the original error message.

CustomErrorHandlerSax.java

package com.mkyong.xml.sax.handler;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

import java.io.PrintStream;

public class CustomErrorHandlerSax implements ErrorHandler {

    private PrintStream out;

    public CustomErrorHandlerSax(PrintStream out) {
        this.out = out;
    }

    private String getParseExceptionInfo(SAXParseException spe) {
        String systemId = spe.getSystemId();

        if (systemId == null) {
            systemId = "null";
        }

        String info = "URI=" + systemId + " Line="
                + spe.getLineNumber() + ": " + spe.getMessage();

        return info;
    }

    public void warning(SAXParseException spe) throws SAXException {
        out.println("Warning: " + getParseExceptionInfo(spe));
    }

    public void error(SAXParseException spe) throws SAXException {
        String message = "Error: " + getParseExceptionInfo(spe);
        throw new SAXException(message);
    }

    public void fatalError(SAXParseException spe) throws SAXException {
        String message = "Fatal Error: " + getParseExceptionInfo(spe);
        throw new SAXException(message);
    }

}

4.2 We use saxParser.getXMLReader() to get an org.xml.sax.XMLReader, which provides more options to configure the SAX parser.

ReadXmlSaxParser3.java

package com.mkyong.xml.sax;

import com.mkyong.xml.sax.handler.CustomErrorHandlerSax;
import com.mkyong.xml.sax.handler.MapStaffObjectHandlerSax;
import com.mkyong.xml.sax.model.Staff;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public class ReadXmlSaxParser3 {

  public static void main(String[] args) {

      SAXParserFactory factory = SAXParserFactory.newInstance();

      try (InputStream is = getXMLFileAsStream()) {

          SAXParser saxParser = factory.newSAXParser();

          // parse XML and map to object; it works, but it is not recommended, try JAXB
          MapStaffObjectHandlerSax handler = new MapStaffObjectHandlerSax();

          // try XMLReader
          //saxParser.parse(is, handler);

          // more options for configuration
          XMLReader xmlReader = saxParser.getXMLReader();

          // set our custom error handler
          xmlReader.setErrorHandler(new CustomErrorHandlerSax(System.err));

          xmlReader.setContentHandler(handler);

          InputSource source = new InputSource(is);

          xmlReader.parse(source);

          // print all
          List<Staff> result = handler.getResult();
          result.forEach(System.out::println);

      } catch (ParserConfigurationException | SAXException | IOException e) {
          e.printStackTrace();
      }

  }

  // get XML file from resources folder.
  private static InputStream getXMLFileAsStream() {
      return ReadXmlSaxParser3.class.getClassLoader().getResourceAsStream("staff.xml");
  }

}

4.3 Update staff.xml: remove the CDATA section in the bio element and put a bare & there instead. The SAX parser will then hit an error.

src/main/resources/staff.xml

<?xml version="1.0" encoding="utf-8"?>
<Company>
    <staff id="1001">
        <name>mkyong</name>
        <role>support</role>
        <salary currency="USD">5000</salary>
        <!-- for special characters like < &, need CDATA -->
        <bio>&</bio>
    </staff>
</Company>

4.4 Run it with the above custom error handler.


  xmlReader.setErrorHandler(new CustomErrorHandlerSax(System.err));

Output

Terminal

org.xml.sax.SAXException: Fatal Error: URI=null Line=8: The entity name must immediately follow the '&' in the entity reference.
at com.mkyong.xml.sax.handler.CustomErrorHandlerSax.fatalError(CustomErrorHandlerSax.java:41)
at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:181)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1471)
//...

4.5 Run it without a custom error handler.


  // xmlReader.setErrorHandler(new CustomErrorHandlerSax(System.err));

Output

Terminal

[Fatal Error] :8:15: The entity name must immediately follow the '&' in the entity reference.
org.xml.sax.SAXParseException; lineNumber: 8; columnNumber: 15; The entity name must immediately follow the '&' in the entity reference.
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1243)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
	at com.mkyong.xml.sax.ReadXmlSaxParser2.main(ReadXmlSaxParser2.java:44)  
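
The same behavior can be reproduced without touching staff.xml. The sketch below parses XML from a string with a small inline error handler and returns the message it produced. The class name ErrorHandlerDemo, the method parseAndReport, and the shortened message format are assumptions for this illustration, not the tutorial's CustomErrorHandlerSax.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;

// Hypothetical demo class, not part of the tutorial's source code.
class ErrorHandlerDemo {

    // Returns "OK" for well-formed XML, or the message thrown by the error handler.
    static String parseAndReport(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler());

            reader.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    System.err.println("Warning: line " + e.getLineNumber());
                }

                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw new SAXException("Error: line " + e.getLineNumber());
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw new SAXException("Fatal Error: line " + e.getLineNumber());
                }
            });

            reader.parse(new InputSource(new StringReader(xml)));
            return "OK";
        } catch (SAXException e) {
            // the exception thrown by our handler ends the parse
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAndReport("<bio>a &amp; b</bio>")); // well-formed
        System.out.println(parseAndReport("<bio>&</bio>"));         // bare ampersand
    }
}
```

A bare & is a well-formedness violation, so it reaches fatalError() rather than error().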

5. SAX and Unicode

For XML files containing Unicode characters, SAX by default follows the encoding declared in the XML file (UTF-8 by default) and parses the content correctly.

5.1 We can define the encoding at the top of the XML file, encoding="encoding-code"; for example, below is an XML file using the UTF-8 encoding.


<?xml version="1.0" encoding="utf-8"?>
<Company>
    <staff id="1001">
        <name>揚木金</name>
        <role>support</role>
        <salary currency="USD">5000</salary>
        <bio><![CDATA[HTML tag <code>testing</code>]]></bio>
    </staff>
    <staff id="1002">
        <name>yflow</name>
        <role>admin</role>
        <salary currency="EUR">8000</salary>
        <bio><![CDATA[a & b]]></bio>
    </staff>
</Company>

5.2 Alternatively, we can set the encoding explicitly on the InputSource.


  XMLReader xmlReader = saxParser.getXMLReader();
  xmlReader.setContentHandler(handler);

  InputSource source = new InputSource(is);

  // set encoding
  source.setEncoding(StandardCharsets.UTF_8.toString());

  //source.setEncoding(StandardCharsets.UTF_16.toString());

  xmlReader.parse(source);
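
Setting the encoding on the InputSource matters most when the bytes are not UTF-8 and the file carries no encoding declaration. The sketch below builds such a document in memory, encoded as ISO-8859-1, and tells the parser how to decode it. The class name InputSourceEncodingDemo, the method readText, and the sample text are made up for this illustration; the behavior shown assumes the JDK's built-in parser.

```java
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the tutorial's source code.
class InputSourceEncodingDemo {

    // Parses the raw bytes with the given encoding and returns the text content.
    static String readText(byte[] xmlBytes, String encoding) {
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };

        try {
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            // tell the parser how to decode the byte stream
            source.setEncoding(encoding);
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // no XML declaration, and the bytes are ISO-8859-1 rather than UTF-8
        byte[] latin1 = "<name>caf\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(readText(latin1, StandardCharsets.ISO_8859_1.name()));
    }
}
```

When the file declares its encoding correctly at the top, as in 5.1, this extra step is normally not needed.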

Note
More SAX parser examples – Oracle – Simple API for XML (SAX)

6. Download Source Code

$ git clone https://github.com/mkyong/core-java

$ cd java-xml

$ cd src/main/java/com/mkyong/xml/sax/

About Author

Founder of Mkyong.com, loves Java and open source stuff. Follow him on Twitter. If you like his tutorials, consider making a donation to these charities.

Comments

mahendra nadh
13 years ago

can u explain with an example what is the difference between SAX Parser and DOM Parser?

zenon
12 years ago
Reply to  mahendra nadh

DOM – reads all structure into memory, and data stays in memory, and next you can read your data from memory, and make a lot of operations such as search. (useful for small files)

SAX – nothing is stored in memory, so you can’t restore any data by later operations. file is parsed once, and you must catch data while it is parsed (useful for large files)

Karthi
14 years ago

while i try to compile the sax parser code i got the following error:

org.xml.sax.SAXParseException: Content is not allowed in prolog.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at com.mypack.ReadXMLFileSAX.main(ReadXMLFileSAX.java:99)

Andre Santos
9 years ago

yes!! Works perfectly

hendi santika
11 years ago

I’ve tried the code, but i have a problem here.
The result is :
End Element :firstname
End Element :lastname
End Element :nickname
End Element :salary
End Element :staff
End Element :firstname
End Element :lastname
End Element :nickname
End Element :salary
End Element :staff
End Element :company

Why does it happen ???

Thanks

Sim
10 years ago
Reply to  hendi santika

Solution (found at: http://stackoverflow.com/questions/6301678/java-sax-program-doesnt-go-to-startelement-method):

Check the import statement for the Attribute parameter, it should be:
import org.xml.sax.Attributes;

Sim
10 years ago
Reply to  hendi santika

Same problem here too 🙁

stefan2k
11 years ago

Thank you very much! Your tutorials have helped me several times.

seyma
13 years ago

I tried to read xml file ( approx. 600 MB ) on 3.2GB RAM computer and i got outofmemory exception with XOM, VTD-XML ex. Only this code makes it successfully. Thank you

PUCH
13 years ago

I think that this is the best tutorial about SAX and XML!

🙂