How to read an XML file in Java (StAX Parser)

This tutorial shows how to use the Streaming API for XML (StAX) parser to read or parse an XML document.


Note
The Streaming API for XML (StAX) is a built-in JDK XML library, available since Java 1.6.

All examples below are tested with Java 11.

1. What is StAX

StAX stands for Streaming API for XML, a pull API for working with XML documents.

There are two programming models for working with XML documents: streaming and the Document Object Model (DOM). For the DOM model, we can use a DOM parser; for the streaming model, we can use a SAX parser or a StAX parser.
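For contrast with the streaming parsers, here is a minimal sketch of the DOM model: the whole document is parsed into an in-memory Document tree first and only then queried. The class name DomModelDemo and the in-memory input string are illustrative assumptions, not part of the tutorial's code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DomModelDemo {

    // DOM model: parse() builds the complete tree in memory before we read anything.
    static int countElements(String xml, String tagName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // the tree can now be navigated freely, in any order
        NodeList nodes = document.getElementsByTagName(tagName);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Company><staff id=\"1001\"/><staff id=\"1002\"/></Company>";
        System.out.println(countElements(xml, "staff"));  // 2
    }
}
```

The trade-off is memory: a DOM tree holds the entire document, while SAX and StAX only see one piece of it at a time.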

1.1 Difference between SAX and StAX?

The Simple API for XML (SAX) is a push API; the SAX parser continuously sends (pushes) the XML data to the client. In SAX, the client has no control over when it receives the XML data.

For example, we register a custom DefaultHandler implementation to process the XML data sent by the SAX parser. Read the complete SAX example.


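  // SAX
  SAXParserFactory factory = SAXParserFactory.newInstance();
  SAXParser saxParser = factory.newSAXParser();

  // DefaultHandler implementation
  PrintAllHandlerSax handler = new PrintAllHandlerSax();

  // the parser pushes the XML data to the handler until the document ends
  saxParser.parse(FILENAME, handler);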
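A self-contained sketch of the push model follows. It parses an in-memory string instead of FILENAME, and ElementNameHandler is a hypothetical stand-in for PrintAllHandlerSax (whose code is not shown here); it only records element names as the parser pushes them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxPushDemo {

    // Hypothetical handler: the SAX parser calls (pushes) startElement for every tag.
    static class ElementNameHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            // the factory is not namespace-aware by default, so qName holds the tag name
            names.add(qName);
        }
    }

    static List<String> elementNames(String xml) throws Exception {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        ElementNameHandler handler = new ElementNameHandler();
        // parse() returns only after the parser has pushed the whole document
        saxParser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        // prints [Company, staff, name]
        System.out.println(elementNames("<Company><staff id=\"1001\"><name>mkyong</name></staff></Company>"));
    }
}
```

Note that the client code never asks for the next element; it only reacts inside the callback.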

The Streaming API for XML (StAX) is a pull API; the client calls methods on the StAX parser to get (pull) the XML data one event at a time. In StAX, the client is in control of when to get (pull) the XML data.


  // StAX Iterator API examples
  // next event
  XMLEvent event = xmlEventReader.nextEvent();

  // moves to next event
  event = xmlEventReader.nextEvent();

  // moves to next event
  event = xmlEventReader.nextEvent();
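Because the client pulls each event, it can also stop early. A minimal sketch, assuming an in-memory string as input; the class StaxPullDemo and method firstText are illustrative names. It returns the text of the first matching element and never reads the rest of the document.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxPullDemo {

    // Pulls events one by one and stops at the first <target> element.
    static String firstText(String xml, String target) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();      // the client decides when to pull
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(target)) {
                    XMLEvent text = reader.nextEvent();   // pull one more event: the text
                    return text.isCharacters() ? text.asCharacters().getData() : "";
                }
            }
            return null;                                   // no <target> element found
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Company><staff><name>mkyong</name></staff>"
                + "<staff><name>yflow</name></staff></Company>";
        System.out.println(firstText(xml, "name"));        // mkyong
    }
}
```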


2. StAX Cursor API and Iterator API

StAX contains two API sets: a cursor API and an iterator API.

2.1 StAX Cursor API

The StAX Cursor API contains two main interfaces, XMLStreamReader and XMLStreamWriter. XMLStreamReader.getEventType() and next() return an int, so we need to map the event type manually (the int constants are defined in XMLStreamConstants).


  // StAX Cursor API
  XMLStreamReader reader = xmlInputFactory.createXMLStreamReader(
      new FileInputStream(path.toFile()));

  // this is int! we need to map the eventType manually
  int eventType = reader.getEventType();

  while (reader.hasNext()) {

      eventType = reader.next();

      if (eventType == XMLEvent.START_ELEMENT) {
      }
      //...
  }
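A runnable version of the cursor loop, as a sketch: it collects the id attribute of every <staff> element. The class name CursorIdsDemo and the in-memory input are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorIdsDemo {

    // Cursor API: next() returns an int, and the reader itself holds the current event.
    static List<String> staffIds(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> ids = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int eventType = reader.next();
                // the int is compared against XMLStreamConstants by hand
                if (eventType == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("staff")) {
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Company><staff id=\"1001\"/><staff id=\"1002\"/></Company>";
        System.out.println(staffIds(xml));  // [1001, 1002]
    }
}
```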

2.2 StAX Iterator API

The StAX Iterator API contains two main interfaces, XMLEventReader and XMLEventWriter, and we work with XMLEvent objects.


  // StAX Iterator API
  XMLEventReader reader = xmlInputFactory.createXMLEventReader(
      new FileInputStream(path.toFile()));

  // event iterator
  while (reader.hasNext()) {

      XMLEvent event = reader.nextEvent();

      if (event.isStartElement()) {
      }
      //...
  }
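A runnable sketch of the iterator loop, collecting the id attribute of every <staff> element through StartElement and Attribute objects. The class name IteratorIdsDemo and the in-memory input are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class IteratorIdsDemo {

    // Iterator API: each call returns an XMLEvent object that can be inspected or stored.
    static List<String> staffIds(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> ids = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement element = event.asStartElement();
                    Attribute id = element.getAttributeByName(new QName("id"));
                    if (element.getName().getLocalPart().equals("staff") && id != null) {
                        ids.add(id.getValue());
                    }
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Company><staff id=\"1001\"/><staff id=\"1002\"/></Company>";
        System.out.println(staffIds(xml));  // [1001, 1002]
    }
}
```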

2.3 Which one? Cursor or Iterator API?

  • The Cursor API produces smaller, more efficient code and performs better than the Iterator API. It suits high-performance applications or mobile apps.
  • The Iterator API works with XMLEvent objects, which are more flexible, extensible, and easier to code with. It suits enterprise applications.
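One concrete example of the Iterator API's flexibility is XMLInputFactory.createFilteredReader, which wraps an XMLEventReader with an EventFilter. The sketch below, under the assumption that only start tags are of interest, keeps just the START_ELEMENT events; the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class FilteredEventsDemo {

    // Wraps an event reader so that only START_ELEMENT events reach the loop.
    static List<String> startElementNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader raw = factory.createXMLEventReader(new StringReader(xml));
        // EventFilter has a single method, accept(XMLEvent), so a method reference fits
        EventFilter onlyStartElements = XMLEvent::isStartElement;
        XMLEventReader filtered = factory.createFilteredReader(raw, onlyStartElements);
        List<String> names = new ArrayList<>();
        try {
            while (filtered.hasNext()) {
                names.add(filtered.nextEvent().asStartElement().getName().getLocalPart());
            }
        } finally {
            filtered.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // prints [Company, staff, name]
        System.out.println(startElementNames("<Company><staff><name>mkyong</name></staff></Company>"));
    }
}
```

The Cursor API has a counterpart, createFilteredReader(XMLStreamReader, StreamFilter), but the filter then inspects the reader's current state rather than a standalone event object.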


3. An XML file

Below is an XML document; later we use the StAX parser to read the XML data and print it out.

src/main/resources/staff.xml

<?xml version="1.0" encoding="utf-8"?>
<Company>
    <staff id="1001">
        <name>mkyong</name>
        <role>support</role>
        <salary currency="USD">5000</salary>
        <!-- for special characters like < &, need CDATA -->
        <bio><![CDATA[HTML tag <code>testing</code>]]></bio>
    </staff>
    <staff id="1002">
        <name>yflow</name>
        <role>admin</role>
        <salary currency="EUR">8000</salary>
        <bio><![CDATA[a & b]]></bio>
    </staff>
</Company>

4. StAX Cursor API to read an XML file

The example below uses the StAX Cursor API to read or parse the above XML file and get the XML elements, attributes, CDATA, etc.

ReadXmlStAXCursorParser.java

package com.mkyong.xml.stax;

import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadXmlStAXCursorParser {

    private static final String FILENAME = "src/main/resources/staff.xml";

    public static void main(String[] args) {

        try {

            printXmlByXmlCursorReader(Paths.get(FILENAME));

        } catch (FileNotFoundException | XMLStreamException e) {
            e.printStackTrace();
        }

    }

    private static void printXmlByXmlCursorReader(Path path)
            throws FileNotFoundException, XMLStreamException {

        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();

        // https://rules.sonarsource.com/java/RSPEC-2755
        // prevent xxe
        xmlInputFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        xmlInputFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");

        XMLStreamReader reader = xmlInputFactory.createXMLStreamReader(
                new FileInputStream(path.toFile()));

        int eventType = reader.getEventType();
        System.out.println(eventType);   // 7, START_DOCUMENT
        System.out.println(reader);      // xerces

        while (reader.hasNext()) {

            eventType = reader.next();

            if (eventType == XMLEvent.START_ELEMENT) {

                switch (reader.getName().getLocalPart()) {

                    case "staff":
                        String id = reader.getAttributeValue(null, "id");
                        System.out.printf("Staff id : %s%n", id);
                        break;

                    case "name":
                        eventType = reader.next();
                        if (eventType == XMLEvent.CHARACTERS) {
                            System.out.printf("Name : %s%n", reader.getText());
                        }
                        break;

                    case "role":
                        eventType = reader.next();
                        if (eventType == XMLEvent.CHARACTERS) {
                            System.out.printf("Role : %s%n", reader.getText());
                        }
                        break;

                    case "salary":
                        String currency = reader.getAttributeValue(null, "currency");
                        eventType = reader.next();
                        if (eventType == XMLEvent.CHARACTERS) {
                            String salary = reader.getText();
                            System.out.printf("Salary [Currency] : %,.2f [%s]%n",
                              Float.parseFloat(salary), currency);
                        }
                        break;

                    case "bio":
                        eventType = reader.next();
                        if (eventType == XMLEvent.CHARACTERS) {
                            System.out.printf("Bio : %s%n", reader.getText());
                        }
                        break;
                }

            }

            if (eventType == XMLEvent.END_ELEMENT) {
                // if </staff>
                if (reader.getName().getLocalPart().equals("staff")) {
                    System.out.printf("%n%s%n%n", "---");
                }
            }

        }

    }

}
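The example above reads exactly one event after each start tag. That works for this compact file, but in a pretty-printed file the first event inside <bio> can be the whitespace before the CDATA section, and unless coalescing is enabled a parser may split character data into several events. A more defensive sketch uses XMLStreamReader.getElementText(), which concatenates all text and CDATA up to the matching end tag. The class name, the in-memory input, and the selected fields are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorElementTextDemo {

    // Reads <name> and <bio> of the first <staff> with getElementText(),
    // which joins every text/CDATA event up to the matching end tag.
    static Map<String, String> readFirstStaff(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // same XXE hardening as the example above
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            while (reader.hasNext()) {
                int eventType = reader.next();
                if (eventType == XMLStreamConstants.START_ELEMENT) {
                    String tag = reader.getLocalName();
                    if (tag.equals("name") || tag.equals("bio")) {
                        // trim() drops the indentation of pretty-printed files
                        fields.put(tag, reader.getElementText().trim());
                    }
                } else if (eventType == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("staff")) {
                    break;  // stop after the first </staff>
                }
            }
        } finally {
            reader.close();
        }
        return fields;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Company>\n  <staff id=\"1001\">\n    <name>mkyong</name>\n"
                + "    <bio>\n      <![CDATA[HTML tag <code>testing</code>]]>\n    </bio>\n"
                + "  </staff>\n</Company>";
        // prints {name=mkyong, bio=HTML tag <code>testing</code>}
        System.out.println(readFirstStaff(xml));
    }
}
```

getElementText() throws an XMLStreamException if the element contains child elements, so it fits text-only elements like <name> or <bio>, not <staff>.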

Output

Terminal

7
com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl@18be83e4

Staff id : 1001
Name : mkyong
Role : support
Salary [Currency] : 5,000.00 [USD]
Bio : HTML tag <code>testing</code>

---

Staff id : 1002
Name : yflow
Role : admin
Salary [Currency] : 8,000.00 [EUR]
Bio : a & b

---

Figure: the Cursor API event types and their int values (IDE code assist).
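Since the Cursor API only hands back an int, a small lookup helper can make debug output readable. This is a sketch; EventTypeNames and nameOf are illustrative names, and the case labels are the constants defined in javax.xml.stream.XMLStreamConstants (a classic switch is used so it compiles on Java 11).

```java
import javax.xml.stream.XMLStreamConstants;

class EventTypeNames {

    // Maps the int from XMLStreamReader.next()/getEventType()
    // to the name of the matching XMLStreamConstants field.
    static String nameOf(int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_ELEMENT:          return "START_ELEMENT";
            case XMLStreamConstants.END_ELEMENT:            return "END_ELEMENT";
            case XMLStreamConstants.PROCESSING_INSTRUCTION: return "PROCESSING_INSTRUCTION";
            case XMLStreamConstants.CHARACTERS:             return "CHARACTERS";
            case XMLStreamConstants.COMMENT:                return "COMMENT";
            case XMLStreamConstants.SPACE:                  return "SPACE";
            case XMLStreamConstants.START_DOCUMENT:         return "START_DOCUMENT";
            case XMLStreamConstants.END_DOCUMENT:           return "END_DOCUMENT";
            case XMLStreamConstants.ENTITY_REFERENCE:       return "ENTITY_REFERENCE";
            case XMLStreamConstants.ATTRIBUTE:              return "ATTRIBUTE";
            case XMLStreamConstants.DTD:                    return "DTD";
            case XMLStreamConstants.CDATA:                  return "CDATA";
            case XMLStreamConstants.NAMESPACE:              return "NAMESPACE";
            case XMLStreamConstants.NOTATION_DECLARATION:   return "NOTATION_DECLARATION";
            case XMLStreamConstants.ENTITY_DECLARATION:     return "ENTITY_DECLARATION";
            default:                                        return "UNKNOWN(" + eventType + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOf(7));   // START_DOCUMENT
        System.out.println(nameOf(1));   // START_ELEMENT
    }
}
```

With this helper, the first println in ReadXmlStAXCursorParser could print nameOf(eventType) instead of the bare 7.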

5. StAX Iterator API to read an XML file

The example below uses the StAX Iterator API to read or parse the above XML file and get the XML elements, attributes, CDATA, etc.

ReadXmlStAXEventParser.java

package com.mkyong.xml.stax;

import javax.xml.XMLConstants;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadXmlStAXEventParser {

    private static final String FILENAME = "src/main/resources/staff.xml";

    public static void main(String[] args) {

        try {

            printXmlByXmlEventReader(Paths.get(FILENAME));

        } catch (FileNotFoundException | XMLStreamException e) {
            e.printStackTrace();
        }

    }

    private static void printXmlByXmlEventReader(Path path)
            throws FileNotFoundException, XMLStreamException {

        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();

        // https://rules.sonarsource.com/java/RSPEC-2755
        // prevent xxe
        xmlInputFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        xmlInputFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");

        XMLEventReader reader = xmlInputFactory.createXMLEventReader(
                new FileInputStream(path.toFile()));

        // event iterator
        while (reader.hasNext()) {

            XMLEvent event = reader.nextEvent();

            if (event.isStartElement()) {

                StartElement element = event.asStartElement();

                switch (element.getName().getLocalPart()) {
                    // if <staff>
                    case "staff":
                        // id='1001'
                        Attribute id = element.getAttributeByName(new QName("id"));
                        System.out.printf("Staff id : %s%n", id.getValue());
                        break;
                    case "name":
                        // element.asCharacters().getData() would throw a ClassCastException:
                        // StartElementEvent cannot be cast to javax.xml.stream.events.Characters

                        // this event is still the '<name>' tag; move to the next event to get the character data
                        event = reader.nextEvent();
                        if (event.isCharacters()) {
                            System.out.printf("Name : %s%n", event.asCharacters().getData());
                        }
                        break;
                    case "role":
                        event = reader.nextEvent();
                        if (event.isCharacters()) {
                            System.out.printf("Role : %s%n", event.asCharacters().getData());
                        }
                        break;
                    case "salary":
                        // currency='USD'
                        Attribute currency = element.getAttributeByName(new QName("currency"));
                        event = reader.nextEvent();
                        if (event.isCharacters()) {
                            String salary = event.asCharacters().getData();
                            System.out.printf("Salary [Currency] : %,.2f [%s]%n",
                              Float.parseFloat(salary), currency.getValue());
                        }
                        break;
                    case "bio":
                        event = reader.nextEvent();
                        if (event.isCharacters()) {
                            // CDATA, no problem.
                            System.out.printf("Bio : %s%n", event.asCharacters().getData());
                        }
                        break;
                }
            }

            if (event.isEndElement()) {
                EndElement endElement = event.asEndElement();
                // if </staff>
                if (endElement.getName().getLocalPart().equals("staff")) {
                    System.out.printf("%n%s%n%n", "---");
                }
            }

        }

    }

}

Output

Terminal

Staff id : 1001
Name : mkyong
Role : support
Salary [Currency] : 5,000.00 [USD]
Bio : HTML tag <code>testing</code>

---

Staff id : 1002
Name : yflow
Role : admin
Salary [Currency] : 8,000.00 [EUR]
Bio : a & b

---
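As with the Cursor API, reading a single Characters event after a start tag assumes the text arrives in one piece with no surrounding whitespace. XMLEventReader also offers getElementText(), callable right after nextEvent() returns a START_ELEMENT. The sketch below applies it to <salary> and reads the attribute with Attribute.getValue(); the class name and inputs are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class IteratorElementTextDemo {

    // Returns "<amount> <currency>" for every <salary> element.
    static List<String> salaries(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement element = event.asStartElement();
                if (element.getName().getLocalPart().equals("salary")) {
                    Attribute currency = element.getAttributeByName(new QName("currency"));
                    // getElementText() must directly follow the START_ELEMENT event;
                    // it joins all text/CDATA up to </salary>
                    String amount = reader.getElementText().trim();
                    // getValue() yields just "USD", not the whole attribute
                    result.add(amount + " " + (currency == null ? "?" : currency.getValue()));
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Company>\n  <staff>\n    <salary currency=\"USD\">\n      5000\n    </salary>\n  </staff>\n"
                + "  <staff>\n    <salary currency=\"EUR\">8000</salary>\n  </staff>\n</Company>";
        System.out.println(salaries(xml));  // [5000 USD, 8000 EUR]
    }
}
```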

6. Convert XML to Java objects?

Yes, we can use the StAX API to convert XML to Java objects. The above examples already extract the XML data; we can create a POJO like Staff.java and set its fields manually.
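A sketch of that manual mapping with the Cursor API follows. The Staff class, its fields, and StaffStaxMapper are hypothetical; the fields mirror the elements of staff.xml, and storing salary as a double is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical POJO matching one <staff> element of staff.xml.
class Staff {
    String id;
    String name;
    String role;
    String currency;
    double salary;
    String bio;

    @Override
    public String toString() {
        return "Staff{id=" + id + ", name=" + name + ", role=" + role
                + ", salary=" + salary + " " + currency + ", bio=" + bio + "}";
    }
}

class StaffStaxMapper {

    // Builds one Staff object per <staff> element.
    static List<Staff> read(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Staff> result = new ArrayList<>();
        Staff current = null;
        try {
            while (reader.hasNext()) {
                int eventType = reader.next();
                if (eventType == XMLStreamConstants.START_ELEMENT) {
                    String tag = reader.getLocalName();
                    if (tag.equals("staff")) {
                        current = new Staff();
                        current.id = reader.getAttributeValue(null, "id");
                    } else if (current != null) {
                        switch (tag) {
                            case "name":
                                current.name = reader.getElementText().trim();
                                break;
                            case "role":
                                current.role = reader.getElementText().trim();
                                break;
                            case "salary":
                                // read the attribute before getElementText() moves the cursor
                                current.currency = reader.getAttributeValue(null, "currency");
                                current.salary = Double.parseDouble(reader.getElementText().trim());
                                break;
                            case "bio":
                                current.bio = reader.getElementText().trim();
                                break;
                            default:
                                break;  // ignore other child elements
                        }
                    }
                } else if (eventType == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("staff")) {
                    result.add(current);  // </staff> completes the current object
                    current = null;
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Company><staff id=\"1001\"><name>mkyong</name><role>support</role>"
                + "<salary currency=\"USD\">5000</salary><bio><![CDATA[a & b]]></bio></staff></Company>";
        // prints Staff{id=1001, name=mkyong, role=support, salary=5000.0 USD, bio=a & b}
        read(xml).forEach(System.out::println);
    }
}
```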

Jakarta XML Binding (JAXB) is a recommended library for converting XML to and from Java objects.

7. Download Source Code

$ git clone https://github.com/mkyong/core-java

$ cd java-xml

$ cd src/main/java/com/mkyong/xml/stax/
