SAX Error – Content is not allowed in prolog

By mkyong | Updated: April 14, 2021

We use the SAX parser to parse an XML file and hit the following error message:

Terminal


org.xml.sax.SAXParseException; systemId: ../src/main/resources/staff.xml;
  lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.

In short, invalid text or a BOM before the XML declaration, or a mismatched encoding, causes the SAX error "Content is not allowed in prolog".

1. Invalid text before the XML declaration.
2. BOM at the beginning of the XML file.
3. Different encoding format
4. Download Source Code
5. References

1. Invalid text before the XML declaration.

Any text before the XML declaration causes the "Content is not allowed in prolog" error.

For example, the XML file below contains an extra dot (.) before the XML declaration.

staff.xml


.<?xml version="1.0" encoding="utf-8"?>
<company>
    <staff>
        <firstname>yong</firstname>
        <lastname>mook kim</lastname>
        <nickname>mkyong</nickname>
        <salary>100000</salary>
    </staff>
</company>

To fix it
Delete any text before the XML declaration.

staff.xml


<?xml version="1.0" encoding="utf-8"?>
<company>
    <staff>
        <firstname>yong</firstname>
        <lastname>mook kim</lastname>
        <nickname>mkyong</nickname>
        <salary>100000</salary>
    </staff>
</company>
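
To see the difference between the two files without touching the disk, here is a minimal sketch that parses both variants from in-memory strings. The class name `PrologTextDemo` and the helper `parseError` are made up for this illustration and are not part of the article's source code; the XML is shortened to a single empty element.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class PrologTextDemo {

    // Hypothetical helper: returns null if the XML parses,
    // otherwise a short description of the parser error.
    static String parseError(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // include the position reported by the parser
            return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        String good = "<?xml version=\"1.0\" encoding=\"utf-8\"?><company/>";

        // extra dot before the XML declaration: the parser reports an error
        System.out.println("with dot    : " + parseError("." + good));

        // nothing before the XML declaration: no error, prints null
        System.out.println("without dot : " + parseError(good));
    }
}
```

The same applies to any other character before `<?xml`, including spaces and blank lines.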

2. BOM at the beginning of the XML file.

Many text editors automatically add a BOM (byte order mark) to UTF-8 files.

In tests with Java 11 and Java 8, the built-in SAX parser parsed a UTF-8 file with a BOM correctly; however, some developers report that the BOM causes an XML parsing error.
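
If we are not sure whether a file really starts with a BOM, we can look at its first bytes; a UTF-8 BOM is the three bytes EF BB BF. Below is a small sketch under that assumption. The class `FirstBytes` and the method `hexPrefix` are hypothetical names, and the sample bytes are built in memory instead of being read from a real file.

```java
import java.nio.charset.StandardCharsets;

class FirstBytes {

    // Formats up to the first n bytes as space-separated hex values.
    static String hexPrefix(byte[] data, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(n, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+FEFF encoded as UTF-8 is the BOM, followed by the start of an XML declaration
        byte[] data = "\uFEFF<?xml".getBytes(StandardCharsets.UTF_8);

        // prints "EF BB BF 3C 3F 78 6D 6C"
        System.out.println(hexPrefix(data, 8));
    }
}
```

For a real file, the byte array could come from `Files.readAllBytes`. If the output starts with `3C 3F` (the characters `<?`), there is no BOM and no stray text before the XML declaration.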

To fix it, remove the BOM from the UTF-8 file.

Remove the BOM via code.
In Notepad++, check Encoding UTF-8 without BOM.
In IntelliJ IDEA, right-click on the file and select Remove BOM.

P.S. Many text and code editors can add or remove the byte order mark (BOM) for a file; look for the feature in the menu.
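
As a rough idea of what removing the BOM via code can look like, the sketch below drops a leading UTF-8 BOM (bytes EF BB BF) from a byte array. `BomStripper` and `stripUtf8Bom` are illustrative names only, and this is one possible approach, not necessarily the one used in the linked article.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BomStripper {

    // Returns the data without a leading UTF-8 BOM; data without a BOM is returned as-is.
    static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?><company/>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] clean = stripUtf8Bom(withBom);

        // the BOM is exactly 3 bytes, so this prints 3
        System.out.println(withBom.length - clean.length);

        // the cleaned content now starts with the XML declaration
        System.out.println(new String(clean, StandardCharsets.UTF_8));
    }
}
```

The cleaned bytes can be written back to the file, or wrapped in a `ByteArrayInputStream` and passed to the parser directly.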

3. Different encoding format

A mismatched encoding also causes the common XML error "Content is not allowed in prolog".

For example, a UTF-8 XML file.


<?xml version="1.0" encoding="utf-8"?>
<Company>
    <staff id="1001">
        <name>mkyong</name>
        <role>support</role>
        <salary currency="USD">5000</salary>
        <!-- for special characters like < &, need CDATA -->
        <bio><![CDATA[HTML tag <code>testing</code>]]></bio>
    </staff>
    <staff id="1002">
        <name>yflow</name>
        <role>admin</role>
        <salary currency="EUR">8000</salary>
        <bio><![CDATA[a & b]]></bio>
    </staff>
</Company>

And we parse the above UTF-8 XML file using the UTF-16 encoding.


  SAXParserFactory factory = SAXParserFactory.newInstance();

  try (InputStream is = getXMLFileAsStream()) {

      SAXParser saxParser = factory.newSAXParser();

      // parse XML and map to objects; it works, but is not recommended, try JAXB
      MapStaffObjectHandlerSax handler = new MapStaffObjectHandlerSax();

      // more options for configuration
      XMLReader xmlReader = saxParser.getXMLReader();
      xmlReader.setContentHandler(handler);

      InputSource source = new InputSource(is);

      // wrong on purpose: force UTF-16 decoding on a UTF-8 XML file
      source.setEncoding(StandardCharsets.UTF_16.toString());
      xmlReader.parse(source);

      // print all
      List<Staff> result = handler.getResult();
      result.forEach(System.out::println);

  } catch (ParserConfigurationException | SAXException | IOException e) {
      e.printStackTrace();
  }

Output

Terminal


[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1243)
at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at com.mkyong.xml.sax.ReadXmlSaxParser2.main(ReadXmlSaxParser2.java:45)
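
One way to avoid the mismatch is to pass the file's real encoding to `InputSource.setEncoding`, or not to set an encoding at all so the parser picks it up from the XML declaration. Here is a minimal sketch of that idea with in-memory XML; `EncodingMismatchDemo` and `parseWithEncoding` are hypothetical names and not part of the article's source code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EncodingMismatchDemo {

    // Encodes the XML as UTF-8 bytes, then parses it with the given InputSource encoding.
    // A null encoding means "do not call setEncoding, let the parser detect it".
    // Returns null on success, otherwise a description of the error.
    static String parseWithEncoding(String xml, String encoding) {
        try {
            XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DefaultHandler handler = new DefaultHandler();
            xmlReader.setContentHandler(handler);
            xmlReader.setErrorHandler(handler); // fatal errors are rethrown as exceptions

            InputSource source = new InputSource(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            if (encoding != null) {
                source.setEncoding(encoding);
            }
            xmlReader.parse(source);
            return null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<Company><staff id=\"1001\"><name>mkyong</name></staff></Company>";

        // mismatch: UTF-8 bytes decoded as UTF-16, the parser reports an error
        System.out.println("UTF-16   : " + parseWithEncoding(xml, StandardCharsets.UTF_16.toString()));

        // matching encoding: prints null (no error)
        System.out.println("UTF-8    : " + parseWithEncoding(xml, StandardCharsets.UTF_8.toString()));

        // no explicit encoding: prints null (no error)
        System.out.println("detected : " + parseWithEncoding(xml, null));
    }
}
```

In most cases, leaving out `setEncoding` is the simpler choice, because the XML declaration already states the encoding.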

4. Download Source Code

$ git clone https://github.com/mkyong/core-java

$ cd java-xml

$ cd src/main/java/com/mkyong/xml/sax/

5. References

Comments

test

10 years ago

Hi all,
In Notepad++, check Encoding "UTF-8 without BOM"
if there is nothing before the XML declaration.

ded@qq. com

4 years ago

Reply to test

This solution works. Thanks

Anas Shawesh

4 years ago

Reply to test

perfect thanks

Venkat Muthu

5 years ago

Reply to test

This worked for me. Maybe you should add this fix to the article above.

Maciej

10 years ago

Reply to test

Thanks, worked for me! 🙂

edo

3 years ago

how solved ???

Rajesh Antappan

3 years ago

Reply to edo

In my case there was no BOM character, so I tried calling setValidating(false) before creating the xmlDoc object, and it worked.
factory.setValidating(false);
xmlDoc = factory.newDocumentBuilder().parse(filePath);

Emad

2 years ago

Files with a BOM crash with the same error, so wrapping the InputStream in Apache's BOMInputStream solved the issue.

Anand

5 years ago

Thanks for the post. was useful to me

Vishal

11 years ago

// Note: DOMParser is assumed to be org.apache.xerces.parsers.DOMParser (Apache Xerces)
public static void main(String[] args) throws Exception {
    String p_message = "<?xml version=\"1.0\"?>"+
            "<!DOCTYPE FCCGENERIC SYSTEM \"./FCCGENERIC.DTD\">"+
            "<FCCGENERIC>"+
                "<REPLY_ACK>"+
                    "<REQ_TYPE>UPLOAD_PMNT</REQ_TYPE>"+
                    "<XREF>406550133038787</XREF>"+
                "</REPLY_ACK>"+
            "</FCCGENERIC>";
    DOMParser domParser = new DOMParser();
    Document xmlDocument;
    Element documentElement;

    // try-with-resources closes the reader exactly once, even if parsing fails
    try (StringReader l_reader = new StringReader(p_message)) {
        InputSource l_in_source = new InputSource(l_reader);
        domParser.parse(l_in_source);
        xmlDocument = domParser.getDocument();
        documentElement = xmlDocument.getDocumentElement();
    }
}
