How to read a UTF-8 XML file in Java – (SAX Parser)
Written on December 17, 2009 at 6:04 am by mkyong
This follows up on my previous article about how to read an XML file in Java with the SAX parser. That example works fine for a plain-text (ANSI) XML file, but if the file contains special UTF-8 characters, parsing may fail with a “MalformedByteSequenceException”.
Tutorial …
1. Create an XML file
This is an XML file that contains the special UTF-8 character “§” (press Alt+789 to type it).
<?xml version="1.0"?>
<company>
    <staff>
        <firstname>yong</firstname>
        <lastname>mook kim</lastname>
        <nickname>§</nickname>
        <salary>100000</salary>
    </staff>
</company>
If you parse it the normal SAX way, you may hit the error “Invalid byte 1 of 1-byte UTF-8 sequence”.
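One plausible cause of this error, assuming the file was saved in a single-byte encoding such as ISO-8859-1 rather than real UTF-8: the XML declaration names no encoding, so the parser assumes UTF-8 and rejects the lone byte that stands for “§”. The sketch below reproduces that situation in memory; the class name EncodingMismatchDemo and the helper parses() are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMismatchDemo {

    // Hypothetical helper: true if the bytes parse cleanly, false if the parser rejects them.
    public static boolean parses(byte[] xmlBytes) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xmlBytes), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // No encoding in the declaration, so the parser falls back to UTF-8.
        String xml = "<?xml version=\"1.0\"?><nickname>\u00A7</nickname>";

        // In UTF-8 the section sign is two bytes (0xC2 0xA7).
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        // In ISO-8859-1 it is the single byte 0xA7, which is not valid UTF-8 on its own.
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes parse:      " + parses(utf8));
        System.out.println("ISO-8859-1 bytes parse: " + parses(latin1));
    }
}
```

If your own file behaves like the second case, it is worth checking which encoding your editor really used when saving it.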
2. Create a Java File
The normal SAX call, which may fail on the file above:
saxParser.parse("c:\\file.xml", handler);
Instead, open the file with a Reader that decodes it as UTF-8, and pass that Reader to the parser through a SAX InputSource.
File file = new File("c:\\file-utf.xml");
InputStream inputStream = new FileInputStream(file);
Reader reader = new InputStreamReader(inputStream, "UTF-8");

InputSource is = new InputSource(reader);
is.setEncoding("UTF-8");

saxParser.parse(is, handler);
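Here is a minimal, self-contained sketch of the same Reader-plus-InputSource idea that you can run without a file on disk. It assumes the XML is already available as UTF-8 bytes in memory; the class name Utf8ReaderSketch and the helper textOf() are hypothetical and only serve the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class Utf8ReaderSketch {

    // Hypothetical helper: decodes the bytes as UTF-8 and returns all character data found.
    public static String textOf(byte[] utf8Xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // The Reader does the byte-to-character decoding; the parser only sees characters.
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Xml), StandardCharsets.UTF_8)) {
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
            saxParser.parse(new InputSource(reader), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\"?><nickname>\u00A7</nickname>"
                .getBytes(StandardCharsets.UTF_8);
        // Compare against the expected character instead of printing it,
        // so the result does not depend on the console encoding.
        System.out.println("Nickname read correctly: " + "\u00A7".equals(textOf(xml)));
    }
}
```

Note that, according to the InputSource documentation, setEncoding() has no effect once a character stream (a Reader) is supplied, so the sketch leaves it out. The charset given to InputStreamReader is what decides how the bytes are decoded.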
Full example…
package com.mkyong.test;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ReadXMLUTF8FileSAX {

    public static void main(String[] args) {

        try {

            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();

            DefaultHandler handler = new DefaultHandler() {

                // Flags that remember which element was just opened
                boolean bfname = false;
                boolean blname = false;
                boolean bnname = false;
                boolean bsalary = false;

                public void startElement(String uri, String localName,
                        String qName, Attributes attributes) throws SAXException {

                    System.out.println("Start Element :" + qName);

                    if (qName.equalsIgnoreCase("FIRSTNAME")) {
                        bfname = true;
                    }
                    if (qName.equalsIgnoreCase("LASTNAME")) {
                        blname = true;
                    }
                    if (qName.equalsIgnoreCase("NICKNAME")) {
                        bnname = true;
                    }
                    if (qName.equalsIgnoreCase("SALARY")) {
                        bsalary = true;
                    }
                }

                public void endElement(String uri, String localName,
                        String qName) throws SAXException {
                    System.out.println("End Element :" + qName);
                }

                public void characters(char[] ch, int start, int length)
                        throws SAXException {

                    System.out.println(new String(ch, start, length));

                    if (bfname) {
                        System.out.println("First Name : " + new String(ch, start, length));
                        bfname = false;
                    }
                    if (blname) {
                        System.out.println("Last Name : " + new String(ch, start, length));
                        blname = false;
                    }
                    if (bnname) {
                        System.out.println("Nick Name : " + new String(ch, start, length));
                        bnname = false;
                    }
                    if (bsalary) {
                        System.out.println("Salary : " + new String(ch, start, length));
                        bsalary = false;
                    }
                }
            };

            // Decode the file as UTF-8 and pass the Reader to the parser
            File file = new File("c:\\file.xml");
            InputStream inputStream = new FileInputStream(file);
            Reader reader = new InputStreamReader(inputStream, "UTF-8");

            InputSource is = new InputSource(reader);
            is.setEncoding("UTF-8");

            saxParser.parse(is, handler);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
3. Done
Run your Java program and see the output


