SAX Error – MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
Written on
December 17, 2009 at 8:23 pm by
mkyong
When some special UTF-8 characters inside a XML file, and your SAX’s parser is not configure to parse the UTF-8 properly, the following exception mat raise.
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:674) at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:547) at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742) at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(XMLEntityScanner.java:487) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2679) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522) at javax.xml.parsers.SAXParser.parse(SAXParser.java:395) at javax.xml.parsers.SAXParser.parse(SAXParser.java:277) at com.steve.indexing.App.main(App.java:104)
Solution
The solution is quite simple, get the content in UTF-8 format, and overrride the SAX input source
File file = new File("c:\\file-utf.xml"); InputStream inputStream= new FileInputStream(file); Reader reader = new InputStreamReader(inputStream,"UTF-8"); InputSource is = new InputSource(reader); is.setEncoding("UTF-8"); saxParser.parse(is, handler);
Full examples can be find here – how do read UTF-8 XML file with SAX parser


[...] you used normal SAX’s way to parse it, you may encounter this “Invalid byte 1 of 1-byte UTF-8 sequence” [...]