How to read XML file in Java (StAX Parser)
This tutorial shows how to use the Streaming API for XML (StAX) parser to read or parse an XML document.
Table of contents
- 1. What is StAX
- 2. StAX Cursor API and Iterator API
- 3. An XML file
- 4. StAX Cursor API to read an XML file
- 5. StAX Iterator API to read an XML file
- 6. Convert XML to Java objects?
- 7. Download Source Code
- 8. References
Note
P.S. The Streaming API for XML (StAX) is a built-in JDK XML library, available since Java 1.6.
P.S. All the examples below are tested with Java 11.
1. What is StAX
StAX stands for Streaming API for XML, a pull API for working with XML documents.
There are two programming models for working with XML documents: streaming and the document object model (DOM). For the DOM model, we can use a DOM parser; for the streaming model, we can use a SAX parser or a StAX parser.
1.1 Difference between SAX and StAX?
The Simple API for XML (SAX) is a push API; the SAX parser sends (pushes) the XML data to the client continuously. In SAX, the client has no control over when it receives the XML data.
For example, we register a custom DefaultHandler implementation to process the XML data sent by the SAX parser. Read the complete SAX example.
// SAX
SAXParser saxParser = factory.newSAXParser();
// DefaultHandler implementation
PrintAllHandlerSax handler = new PrintAllHandlerSax();
saxParser.parse(FILENAME, handler);
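To make the push model concrete, here is a minimal sketch of a handler. The class name SaxPushDemo, the method elementNames, and the inline XML string are assumptions for illustration; this is not the PrintAllHandlerSax class from the SAX example. The parser decides when startElement is called, and the handler only reacts.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SaxPushDemo {

    // Collects element names in the order the SAX parser pushes them to the handler.
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();

        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();

        // The parser calls this callback; the client cannot pause or skip ahead.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };

        saxParser.parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Company><staff id=\"1001\"><name>mkyong</name></staff></Company>";
        System.out.println(elementNames(xml));
    }
}
```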
The Streaming API for XML (StAX) is a pull API; the client calls methods on the StAX parser to get (pull) the XML data one event at a time. In StAX, the client is in control of when to get (pull) the XML data.
// StAX Iterator API examples
// next event
XMLEvent event = xmlEventReader.nextEvent();
// moves to next event
event = xmlEventReader.nextEvent();
// moves to next event
event = xmlEventReader.nextEvent();
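Because the client drives the loop, it can stop pulling as soon as it has what it needs. The sketch below returns the first name value and never reads the rest of the document. The class name PullDemo, the method firstName, and the inline XML string are assumptions for illustration, and the XXE hardening used in the full examples is left out to keep it short.

```java
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.StringReader;

class PullDemo {

    // Pulls events one by one and stops at the first <name> element.
    static String firstName(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && "name".equals(event.asStartElement().getName().getLocalPart())) {
                    // pull one more event to reach the character data
                    XMLEvent text = reader.nextEvent();
                    if (text.isCharacters()) {
                        return text.asCharacters().getData();
                    }
                }
            }
            return null; // no <name> element with text was found
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Company>"
                + "<staff id=\"1001\"><name>mkyong</name></staff>"
                + "<staff id=\"1002\"><name>yflow</name></staff>"
                + "</Company>";
        System.out.println(firstName(xml));
    }
}
```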
2. StAX Cursor API and Iterator API
StAX contains two API sets: a cursor API and an iterator API.
2.1 StAX Cursor API
The StAX Cursor API contains two main interfaces, XMLStreamReader and XMLStreamWriter. XMLStreamReader.getEventType() returns an int, and we need to map the event type manually.
// StAX Cursor API
XMLStreamReader reader = xmlInputFactory.createXMLStreamReader(
        new FileInputStream(path.toFile()));

// this is an int! we need to map the eventType manually
int eventType = reader.getEventType();

while (reader.hasNext()) {

    eventType = reader.next();

    if (eventType == XMLEvent.START_ELEMENT) {
    }
    //...
}
2.2 StAX Iterator API
The StAX Iterator API contains two main interfaces, XMLEventReader and XMLEventWriter, and we work with XMLEvent objects.
// StAX Iterator API
XMLEventReader reader = xmlInputFactory.createXMLEventReader(
        new FileInputStream(path.toFile()));

// event iterator
while (reader.hasNext()) {

    XMLEvent event = reader.nextEvent();

    if (event.isStartElement()) {
    }
    //...
}
2.3 Which one? Cursor or Iterator API?
- The Cursor API produces smaller, more efficient code and performs better than the Iterator API. It suits high-performance applications or mobile apps.
- The Iterator API provides XML events, which are more flexible, extensible, and easier to code with. It suits enterprise applications.
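As one illustration of the Iterator API's flexibility, events are plain objects, so an event reader can be wrapped with a filter that only lets selected events through. The sketch below uses the standard XMLInputFactory.createFilteredReader and EventFilter; the class name FilterDemo, the method startElementNames, and the inline XML string are assumptions for illustration.

```java
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class FilterDemo {

    // Returns the local names of all start elements, in document order.
    static List<String> startElementNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader raw = factory.createXMLEventReader(new StringReader(xml));

        // Only START_ELEMENT events pass the filter; everything else is skipped.
        EventFilter onlyStartElements = event -> event.isStartElement();
        XMLEventReader filtered = factory.createFilteredReader(raw, onlyStartElements);

        List<String> names = new ArrayList<>();
        try {
            while (filtered.hasNext()) {
                names.add(filtered.nextEvent().asStartElement().getName().getLocalPart());
            }
        } finally {
            filtered.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Company><staff id=\"1001\"><name>mkyong</name><role>support</role></staff></Company>";
        System.out.println(startElementNames(xml));
    }
}
```

The loop body no longer needs an isStartElement() check, because the filter has already removed every other event.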
3. An XML file
Below is an XML document; later, we use the StAX parser to read the XML data and print it out.
<?xml version="1.0" encoding="utf-8"?>
<Company>
    <staff id="1001">
        <name>mkyong</name>
        <role>support</role>
        <salary currency="USD">5000</salary>
        <!-- for special characters like < &, need CDATA -->
        <bio><![CDATA[HTML tag <code>testing</code>]]></bio>
    </staff>
    <staff id="1002">
        <name>yflow</name>
        <role>admin</role>
        <salary currency="EUR">8000</salary>
        <bio><![CDATA[a & b]]></bio>
    </staff>
</Company>
4. StAX Cursor API to read an XML file
The example below uses the StAX Cursor API to read or parse the above XML file to get the XML elements, attributes, CDATA, etc.
package com.mkyong.xml.stax;

import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadXmlStAXCursorParser {

    private static final String FILENAME = "src/main/resources/staff.xml";

    public static void main(String[] args) {

        try {
            printXmlByXmlCursorReader(Paths.get(FILENAME));
        } catch (FileNotFoundException | XMLStreamException e) {
            e.printStackTrace();
        }

    }

    private static void printXmlByXmlCursorReader(Path path)
            throws FileNotFoundException, XMLStreamException {

        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();

        // https://rules.sonarsource.com/java/RSPEC-2755
        // prevent xxe
        xmlInputFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        xmlInputFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");

        XMLStreamReader reader = xmlInputFactory.createXMLStreamReader(
                new FileInputStream(path.toFile()));

        int eventType = reader.getEventType();
        System.out.println(eventType);   // 7, START_DOCUMENT
        System.out.println(reader);      // xerces

        while (reader.hasNext()) {

            eventType = reader.next();

            if (eventType == XMLEvent.START_ELEMENT) {

                switch (reader.getName().getLocalPart()) {

                    case "staff":
                        String id = reader.getAttributeValue(null, "id");
                        System.out.printf("Staff id : %s%n", id);
                        break;

                    case "name":
                        // still on the <name> tag, move to the next event for the text
                        eventType = reader.next();
                        if (eventType == XMLEvent.CHARACTERS) {
                            System.out.printf("Name : %s%n", reader.getText());
                        }
                        break;

                    case "role":
                        eventType = reader.next();
                        if (eventType == XMLEvent.CHARACTERS) {
                            System.out.printf("Role : %s%n", reader.getText());
                        }
                        break;

                    case "salary":
                        String currency = reader.getAttributeValue(null, "currency");
                        eventType = reader.next();
                        if (eventType == XMLEvent.CHARACTERS) {
                            String salary = reader.getText();
                            System.out.printf("Salary [Currency] : %,.2f [%s]%n",
                                    Float.parseFloat(salary), currency);
                        }
                        break;

                    case "bio":
                        eventType = reader.next();
                        if (eventType == XMLEvent.CHARACTERS) {
                            System.out.printf("Bio : %s%n", reader.getText());
                        }
                        break;
                }

            }

            if (eventType == XMLEvent.END_ELEMENT) {
                // if </staff>
                if (reader.getName().getLocalPart().equals("staff")) {
                    System.out.printf("%n%s%n%n", "---");
                }
            }

        }

    }

}
Output
7
com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl@18be83e4
Staff id : 1001
Name : mkyong
Role : support
Salary [Currency] : 5,000.00 [USD]
Bio : HTML tag <code>testing</code>
---
Staff id : 1002
Name : yflow
Role : admin
Salary [Currency] : 8,000.00 [EUR]
Bio : a & b
---
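The Cursor API reports each event type as an int constant defined in javax.xml.stream.XMLStreamConstants (XMLEvent inherits these constants), for example START_ELEMENT = 1, END_ELEMENT = 2, CHARACTERS = 4, START_DOCUMENT = 7, END_DOCUMENT = 8, and CDATA = 12.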
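To avoid memorising the int values while debugging, a small helper can translate an event type into its constant name. The sketch below covers the constants declared in XMLStreamConstants; the class name StaxEventNames and the method nameOf are hypothetical and not part of the JDK.

```java
import javax.xml.stream.XMLStreamConstants;

class StaxEventNames {

    // Maps a Cursor API event type (int) to the name of its XMLStreamConstants field.
    static String nameOf(int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_ELEMENT:          return "START_ELEMENT";
            case XMLStreamConstants.END_ELEMENT:            return "END_ELEMENT";
            case XMLStreamConstants.PROCESSING_INSTRUCTION: return "PROCESSING_INSTRUCTION";
            case XMLStreamConstants.CHARACTERS:             return "CHARACTERS";
            case XMLStreamConstants.COMMENT:                return "COMMENT";
            case XMLStreamConstants.SPACE:                  return "SPACE";
            case XMLStreamConstants.START_DOCUMENT:         return "START_DOCUMENT";
            case XMLStreamConstants.END_DOCUMENT:           return "END_DOCUMENT";
            case XMLStreamConstants.ENTITY_REFERENCE:       return "ENTITY_REFERENCE";
            case XMLStreamConstants.ATTRIBUTE:              return "ATTRIBUTE";
            case XMLStreamConstants.DTD:                    return "DTD";
            case XMLStreamConstants.CDATA:                  return "CDATA";
            case XMLStreamConstants.NAMESPACE:              return "NAMESPACE";
            case XMLStreamConstants.NOTATION_DECLARATION:   return "NOTATION_DECLARATION";
            case XMLStreamConstants.ENTITY_DECLARATION:     return "ENTITY_DECLARATION";
            default:                                        return "UNKNOWN(" + eventType + ")";
        }
    }

    public static void main(String[] args) {
        // prints one line per event type, number first
        for (int i = 1; i <= 15; i++) {
            System.out.printf("%2d = %s%n", i, nameOf(i));
        }
    }
}
```

In the cursor loop, System.out.println(StaxEventNames.nameOf(reader.getEventType())) then prints a name instead of a bare number.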
5. StAX Iterator API to read an XML file
The example below uses the StAX Iterator API to read or parse the above XML file to get the XML elements, attributes, CDATA, etc.
package com.mkyong.xml.stax;

import javax.xml.XMLConstants;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadXmlStAXEventParser {

    private static final String FILENAME = "src/main/resources/staff.xml";

    public static void main(String[] args) {

        try {
            printXmlByXmlEventReader(Paths.get(FILENAME));
        } catch (FileNotFoundException | XMLStreamException e) {
            e.printStackTrace();
        }

    }

    private static void printXmlByXmlEventReader(Path path)
            throws FileNotFoundException, XMLStreamException {

        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();

        // https://rules.sonarsource.com/java/RSPEC-2755
        // prevent xxe
        xmlInputFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        xmlInputFactory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");

        XMLEventReader reader = xmlInputFactory.createXMLEventReader(
                new FileInputStream(path.toFile()));

        // event iterator
        while (reader.hasNext()) {

            XMLEvent event = reader.nextEvent();

            if (event.isStartElement()) {

                StartElement element = event.asStartElement();

                switch (element.getName().getLocalPart()) {

                    // if <staff>
                    case "staff":
                        // id='1001'
                        Attribute id = element.getAttributeByName(new QName("id"));
                        System.out.printf("Staff id : %s%n", id.getValue());
                        break;

                    case "name":
                        // element.asCharacters().getData() would throw:
                        // StartElementEvent cannot be cast to class javax.xml.stream.events.Characters
                        // this is still the <name> tag, move to the next event for the character data
                        event = reader.nextEvent();
                        if (event.isCharacters()) {
                            System.out.printf("Name : %s%n", event.asCharacters().getData());
                        }
                        break;

                    case "role":
                        event = reader.nextEvent();
                        if (event.isCharacters()) {
                            System.out.printf("Role : %s%n", event.asCharacters().getData());
                        }
                        break;

                    case "salary":
                        // currency='USD'
                        Attribute currency = element.getAttributeByName(new QName("currency"));
                        event = reader.nextEvent();
                        if (event.isCharacters()) {
                            String salary = event.asCharacters().getData();
                            // getValue() prints USD; the Attribute itself would print currency='USD'
                            System.out.printf("Salary [Currency] : %,.2f [%s]%n",
                                    Float.parseFloat(salary), currency.getValue());
                        }
                        break;

                    case "bio":
                        event = reader.nextEvent();
                        if (event.isCharacters()) {
                            // CDATA, no problem.
                            System.out.printf("Bio : %s%n", event.asCharacters().getData());
                        }
                        break;
                }

            }

            if (event.isEndElement()) {
                EndElement endElement = event.asEndElement();
                // if </staff>
                if (endElement.getName().getLocalPart().equals("staff")) {
                    System.out.printf("%n%s%n%n", "---");
                }
            }

        }

    }

}
Output
Staff id : 1001
Name : mkyong
Role : support
Salary [Currency] : 5,000.00 [USD]
Bio : HTML tag <code>testing</code>
---
Staff id : 1002
Name : yflow
Role : admin
Salary [Currency] : 8,000.00 [EUR]
Bio : a & b
---
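The examples above assume that the text is the very next event after the start tag. That may not hold if the XML is pretty-printed so that whitespace sits between the start tag and a CDATA section, or if the parser delivers a long text in several events. One hedged alternative is getElementText(), which XMLEventReader and XMLStreamReader both provide for text-only elements: it collects all character data up to the matching end tag. The sketch below shows the idea; the class name ElementTextDemo, the method bioOf, and the inline XML string are assumptions for illustration.

```java
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.StringReader;

class ElementTextDemo {

    // Returns the trimmed text of the first <bio> element, or null if there is none.
    static String bioOf(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && "bio".equals(event.asStartElement().getName().getLocalPart())) {
                    // reads whitespace, text and CDATA up to </bio> in one call;
                    // trim() drops the indentation added by pretty-printing
                    return reader.getElementText().trim();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String prettyXml = "<staff>\n"
                + "    <bio>\n"
                + "        <![CDATA[a & b]]>\n"
                + "    </bio>\n"
                + "</staff>";
        System.out.println(bioOf(prettyXml));
    }
}
```

Note that getElementText() must be called right after the start element event, and it throws an XMLStreamException if the element contains child elements.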
6. Convert XML to Java objects?
Yes, we can use the StAX API to convert XML to Java objects. The above examples already get the XML data, so we can create a POJO like Staff.java and set the values manually.
Jakarta XML Binding (JAXB) is a recommended library for converting XML to and from Java objects.
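As a rough sketch of the manual approach, the code below fills a list of Staff objects with the Cursor API. The Staff class, its field names, and the StaffMapper class are hypothetical and only mirror the sample XML; they are not taken from the downloadable source code. It parses an inline string and leaves out the XXE hardening to stay short.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

class StaffMapper {

    // Hypothetical POJO; the fields mirror the elements and attributes of <staff>.
    static class Staff {
        String id;
        String name;
        String role;
        String currency;
        BigDecimal salary;
        String bio;
    }

    static List<Staff> parse(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<Staff> result = new ArrayList<>();
        Staff current = null;
        try {
            while (reader.hasNext()) {
                int eventType = reader.next();
                if (eventType == XMLStreamConstants.START_ELEMENT) {
                    String tag = reader.getLocalName();
                    if ("staff".equals(tag)) {
                        // a new <staff> starts a new object
                        current = new Staff();
                        current.id = reader.getAttributeValue(null, "id");
                    } else if (current != null) {
                        fill(current, tag, reader);
                    }
                } else if (eventType == XMLStreamConstants.END_ELEMENT
                        && "staff".equals(reader.getLocalName())) {
                    // </staff> completes the object
                    result.add(current);
                    current = null;
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    // Sets one field from the child element the reader is currently positioned on.
    private static void fill(Staff staff, String tag, XMLStreamReader reader)
            throws XMLStreamException {
        switch (tag) {
            case "name":
                staff.name = reader.getElementText().trim();
                break;
            case "role":
                staff.role = reader.getElementText().trim();
                break;
            case "salary":
                // read the attribute first; getElementText() moves on to </salary>
                staff.currency = reader.getAttributeValue(null, "currency");
                staff.salary = new BigDecimal(reader.getElementText().trim());
                break;
            case "bio":
                staff.bio = reader.getElementText().trim();
                break;
            default:
                break; // ignore elements this sketch does not know
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Company><staff id=\"1001\"><name>mkyong</name><role>support</role>"
                + "<salary currency=\"USD\">5000</salary><bio><![CDATA[a & b]]></bio></staff></Company>";
        for (Staff s : parse(xml)) {
            System.out.printf("%s %s %s %s %s%n", s.id, s.name, s.role, s.salary, s.currency);
        }
    }
}
```

BigDecimal is used for the salary here as a design choice, since it avoids the rounding of float; the examples above use Float.parseFloat only for printing.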
7. Download Source Code
$ git clone https://github.com/mkyong/core-java
$ cd java-xml
$ cd src/main/java/com/mkyong/xml/stax/
8. References
- Wikipedia – Java API for XML Processing
- Oracle – Java API for XML Processing (JAXP)
- Oracle – StAX examples
- Streaming API for XML (StAX)
- An Introduction to StAX
- How to read XML file in Java – (DOM Parser)
- How to read XML file in Java – (SAX Parser)
- How to read XML file in Java – (JDOM Parser)
- JAXB hello world example
- How to prevent XML external entity attack (XXE attack)
Your solution in section 5 (StAX Iterator API to read a XML file) does not read the CDATA from a pretty-printed XML file (the XML was written with String prettyPrintXML = formatXML(xml);). Please test.