Java XML Tutorial

SAX Error – Content is not allowed in prolog

We used the SAX parser to parse an XML file and hit the following error message:

Terminal

org.xml.sax.SAXParseException; systemId: ../src/main/resources/staff.xml; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.

In short, any of these will cause the SAX error "Content is not allowed in prolog": invalid text before the XML declaration, a byte order mark (BOM) before the XML declaration, or a mismatched encoding.

1. Invalid text before the XML declaration.

Any text before the XML declaration will cause the Content is not allowed in prolog error.

For example, the XML file below contains an extra dot (.) before the XML declaration.

staff.xml

.<?xml version="1.0" encoding="utf-8"?>
<company>
    <staff>
        <firstname>yong</firstname>
        <lastname>mook kim</lastname>
        <nickname>mkyong</nickname>
        <salary>100000</salary>
    </staff>
</company>

To fix it, delete any text before the XML declaration.

staff.xml

<?xml version="1.0" encoding="utf-8"?>
<company>
    <staff>
        <firstname>yong</firstname>
        <lastname>mook kim</lastname>
        <nickname>mkyong</nickname>
        <salary>100000</salary>
    </staff>
</company>
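To reproduce the error and the fix in isolation, here is a minimal sketch that parses an in-memory string with the JDK's built-in SAX parser. The class name LeadingTextCheck and the stripBeforeDeclaration helper are hypothetical, made up for illustration; the helper only drops everything before the first <?xml, so it assumes the document really has an XML declaration. Deleting the stray text in the file itself, as above, is still the cleaner fix.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class LeadingTextCheck {

    // Returns true if the built-in SAX parser accepts the document, false if it rejects it.
    static boolean parses(String xml) throws ParserConfigurationException, IOException {
        try {
            // DefaultHandler is also the error handler; its fatalError() rethrows the SAXParseException
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    // Hypothetical helper: drops any text before "<?xml"; returns the input unchanged if none is found.
    static String stripBeforeDeclaration(String xml) {
        int start = xml.indexOf("<?xml");
        return start > 0 ? xml.substring(start) : xml;
    }

    public static void main(String[] args) throws Exception {
        String bad = ".<?xml version=\"1.0\" encoding=\"utf-8\"?><company><staff/></company>";
        System.out.println(parses(bad));                         // false
        System.out.println(parses(stripBeforeDeclaration(bad))); // true
    }
}
```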

2. BOM at the beginning of the XML file.

Many text editors automatically add a BOM to UTF-8 files.

Tested with Java 11 and Java 8, the built-in SAX parser parses a UTF-8 file with a BOM correctly; however, some developers reported that the BOM caused this error during XML parsing.

To fix it, remove the BOM from the UTF-8 file.

  1. Remove the BOM via code
  2. In Notepad++, open the Encoding menu and select UTF-8 without BOM.
  3. In IntelliJ IDEA, right-click the file and select Remove BOM.

P.S. Many text and code editors can add or remove a byte order mark (BOM) for a file; look for the option in the editor's menu.
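For option 1, here is a minimal sketch of removing the BOM in code, assuming you hand the parser an InputStream. One likely reason some developers still see the error is that they decode the file through a Reader first: Java's UTF-8 decoder keeps the BOM as the character U+FEFF, so the parser sees a character before <?xml. The class and method names below are hypothetical; a library alternative is Apache Commons IO's BOMInputStream, which a commenter below mentions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class Utf8BomSkipper {

    // The UTF-8 byte order mark is the three bytes EF BB BF.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    // Wraps the stream and drops a leading UTF-8 BOM if present; any other bytes are pushed back untouched.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so keep reading until we have 3 or hit EOF
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length;
        for (int i = 0; hasBom && i < n; i++) {
            hasBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: restore the bytes we consumed
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "<?xml version=\"1.0\"?><company/>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);

        // Decoding the raw bytes as UTF-8 keeps the BOM as the character U+FEFF
        System.out.println(new String(withBom, StandardCharsets.UTF_8).charAt(0) == '\uFEFF'); // true

        InputStream clean = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) clean.read()); // <
    }
}
```

In practice you would wrap the file stream before building the InputSource, for example new InputSource(Utf8BomSkipper.skipUtf8Bom(new FileInputStream(path))), and let the parser detect the encoding from the remaining bytes.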

3. Different encoding format

A mismatched encoding also causes the common Content is not allowed in prolog error.

For example, take the following UTF-8 XML file.


<?xml version="1.0" encoding="utf-8"?>
<Company>
    <staff id="1001">
        <name>mkyong</name>
        <role>support</role>
        <salary currency="USD">5000</salary>
        <!-- for special characters like < &, need CDATA -->
        <bio><![CDATA[HTML tag <code>testing</code>]]></bio>
    </staff>
    <staff id="1002">
        <name>yflow</name>
        <role>admin</role>
        <salary currency="EUR">8000</salary>
        <bio><![CDATA[a & b]]></bio>
    </staff>
</Company>

Then we force UTF-16 encoding to parse the above UTF-8 XML file.


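  // excerpt: getXMLFileAsStream(), MapStaffObjectHandlerSax and Staff come from the full example (not shown)
  SAXParserFactory factory = SAXParserFactory.newInstance();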

  try (InputStream is = getXMLFileAsStream()) {

      SAXParser saxParser = factory.newSAXParser();

      // parse XML and map it to objects; this works, but JAXB is recommended for object mapping
      MapStaffObjectHandlerSax handler = new MapStaffObjectHandlerSax();

      // more options for configuration
      XMLReader xmlReader = saxParser.getXMLReader();
      xmlReader.setContentHandler(handler);

      InputSource source = new InputSource(is);

      // force UTF-16 to parse a UTF-8 XML file (this triggers the error)
      source.setEncoding(StandardCharsets.UTF_16.toString());
      xmlReader.parse(source);

      // print all
      List<Staff> result = handler.getResult();
      result.forEach(System.out::println);

  } catch (ParserConfigurationException | SAXException | IOException e) {
      e.printStackTrace();
  }

Output

Terminal

[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1243)
at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at com.mkyong.xml.sax.ReadXmlSaxParser2.main(ReadXmlSaxParser2.java:45)
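The fix is to stop overriding the encoding: leave setEncoding out so the parser detects the encoding from the bytes and the XML declaration, or set the encoding the file actually uses. The sketch below, with a hypothetical EncodingMismatch class and a shortened version of the XML above, parses the same UTF-8 bytes three ways to show the difference; it is an illustration, not the code from the full example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMismatch {

    // Parses the bytes; forcedEncoding == null lets the parser detect the encoding itself.
    static boolean parses(byte[] xml, String forcedEncoding) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(xml));
        if (forcedEncoding != null) {
            source.setEncoding(forcedEncoding); // tell the parser which encoding to use instead of detecting it
        }
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
            return true;
        } catch (SAXException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] utf8 = "<?xml version=\"1.0\" encoding=\"utf-8\"?><Company><staff id=\"1001\"/></Company>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parses(utf8, StandardCharsets.UTF_16.name())); // false
        System.out.println(parses(utf8, StandardCharsets.UTF_8.name()));  // true
        System.out.println(parses(utf8, null));                           // true
    }
}
```

Letting the parser auto-detect is usually the safer default, because it reads the BOM and the encoding attribute of the XML declaration itself.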

4. Download Source Code

$ git clone https://github.com/mkyong/core-java
$ cd java-xml
$ cd src/main/java/com/mkyong/xml/sax/


About Author

Founder of Mkyong.com, loves Java and open source stuff. Follow him on Twitter. If you like my tutorials, consider making a donation to these charities.

Comments

test
10 years ago

Hi All
Check Encoding “UTF-8 without BOM” in notepad++
if nothing is there b4

ded@qq. com
4 years ago
Reply to  test

This solution works. Thanks

Anas Shawesh
4 years ago
Reply to  test

perfect thanks

Venkat Muthu
5 years ago
Reply to  test

This worked for me. Maybe you should add this fix to the section above.

Maciej
10 years ago
Reply to  test

Thanks, worked for me! 🙂

edo
3 years ago

how solved ???

Rajesh Antappan
3 years ago
Reply to  edo

In my case there was no BOM character, so I tried adding setValidating(false) before creating the xmlDoc object, and it worked.
factory.setValidating(false);
xmlDoc = factory.newDocumentBuilder().parse(filePath);

Emad
2 years ago

BOM-encoded files crash with the same error, so wrapping the InputStream with Apache's BOMInputStream solved the issue.

Anand
4 years ago

Thanks for the post. was useful to me

Vishal
11 years ago
import java.io.StringReader;

import org.apache.xerces.parsers.DOMParser; // assumption: Apache Xerces is on the classpath
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FccGenericExample { // class name is illustrative

	public static void main(String[] args) throws Exception {
		// by default the parser loads the external DTD ./FCCGENERIC.DTD from the working directory
		String p_message = "<?xml version=\"1.0\"?>" +
				"<!DOCTYPE FCCGENERIC SYSTEM \"./FCCGENERIC.DTD\">" +
				"<FCCGENERIC>" +
					"<REPLY_ACK>" +
						"<REQ_TYPE>UPLOAD_PMNT</REQ_TYPE>" +
						"<XREF>406550133038787</XREF>" +
					"</REPLY_ACK>" +
				"</FCCGENERIC>";

		DOMParser domParser = new DOMParser();

		// try-with-resources closes the reader even if parsing fails
		try (StringReader l_reader = new StringReader(p_message)) {
			domParser.parse(new InputSource(l_reader));
			Document xmlDocument = domParser.getDocument();
			Element documentElement = xmlDocument.getDocumentElement();
			System.out.println(documentElement.getTagName());
		}
	}
}