Problem

When some special UTF-8 characters inside a XML file, and your SAX’s parser is not configure to parse the UTF-8 properly, the following exception will be thrown.

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: 
Invalid byte 1 of 1-byte UTF-8 sequence.
...

Solution

The solution is quite simple, get the content in UTF-8 format, and override the SAX input source.

File file = new File("c:\\file-utf.xml");
InputStream inputStream= new FileInputStream(file);
Reader reader = new InputStreamReader(inputStream,"UTF-8");
 
InputSource is = new InputSource(reader);
is.setEncoding("UTF-8");
 
saxParser.parse(is, handler);

You can read the full example here – how do read UTF-8 XML file with SAX parser

Oracle Magazine contains technology strategy articles, sample code, tips, Oracle and partner news, how to articles for developers and DBAs, and more. Oracle (NASDAQ: ORCL) is the world\'s largest enterprise software company.
Publisher : Oracle Corporation