Java – How to add and remove BOM from UTF-8 file
This article shows you how to add, check, and remove the byte order mark (BOM) from a UTF-8 file. The UTF-8 representation of the BOM is the byte sequence 0xEF, 0xBB, 0xBF (hexadecimal) at the beginning of the file.
- 1. Add BOM to a UTF-8 file
- 2. Check if a file contains UTF-8 BOM
- 3. Remove BOM from a UTF-8 file
- 4. Copy a file and add BOM
- 5. Download Source Code
- 6. References
Further Reading
Read more about BOM and UTF-8
P.S. The BOM examples below only work for UTF-8 files.
1. Add BOM to a UTF-8 file
To add a BOM to a UTF-8 file, we can write the Unicode character \ufeff, or the three bytes 0xEF, 0xBB, 0xBF, at the beginning of the file.
Note
The Unicode character \ufeff (U+FEFF) is encoded in UTF-8 as the three bytes 0xEF, 0xBB, 0xBF.
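To see why writing the single character \ufeff produces three bytes on disk, the short sketch below encodes it as UTF-8 and prints the result in hex. The class name BomEncodingDemo and the helper toHex are illustrative names, not part of the original examples.

```java
import java.nio.charset.StandardCharsets;

public class BomEncodingDemo {

    // Format each byte as two lowercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+FEFF is one char in Java, but three bytes once encoded as UTF-8.
        byte[] encoded = "\ufeff".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(encoded)); // prints "ef bb bf"
    }
}
```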
1.1 The example below writes a BOM to the UTF-8 file /home/mkyong/file.txt.
package com.mkyong.io.howto;

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AddBomToUtf8File {

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/home/mkyong/file.txt");
        writeBomFile(path, "mkyong");
    }

    private static void writeBomFile(Path path, String content) {
        // Files.newBufferedWriter(path) encodes as UTF-8 by default (Java 8+)
        try (BufferedWriter bw = Files.newBufferedWriter(path)) {
            // the BOM must be the very first thing written
            bw.write("\ufeff");
            bw.write(content);
            bw.newLine();
            bw.write(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output
$ hexdump -C /home/mkyong/file.txt
00000000 ef bb bf 6d 6b 79 6f 6e 67 0a 6d 6b 79 6f 6e 67 |...mkyong.mkyong|
00000010
$ file /home/mkyong/file.txt
file.txt: UTF-8 Unicode (with BOM) text
$ cat /home/mkyong/file.txt
mkyong
mkyong
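The cat output above looks as if the BOM were gone, but it is still in the file; terminals typically render it as an invisible character. Java's UTF-8 decoder does not strip it either, so the first line read back starts with \ufeff. The following is a minimal sketch of that behavior, assuming a temporary file instead of /home/mkyong/file.txt; the class name ReadBackBomDemo and its helper method are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadBackBomDemo {

    // Write a BOM followed by the content, then read the first line back.
    static String writeAndReadFirstLine(Path path, String content) throws IOException {
        try (BufferedWriter bw = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            bw.write("\ufeff");
            bw.write(content);
        }
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        return lines.get(0);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bom-demo", ".txt");
        try {
            String first = writeAndReadFirstLine(tmp, "mkyong");
            // The decoded line keeps the BOM as its first char.
            System.out.println(first.length());              // 7, not 6
            System.out.println(first.charAt(0) == '\ufeff'); // true
            System.out.println("mkyong".equals(first));      // false
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

This is why string comparisons or parsers can fail on the first line of a file with a BOM even though the text looks correct on screen.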
1.2 Before Java 8, we can use BufferedWriter and OutputStreamWriter to write a BOM to a UTF-8 file.
private static void writeBomFile(Path path, String content) {
    try (BufferedWriter bw = new BufferedWriter(
            new OutputStreamWriter(
                    new FileOutputStream(path.toFile()),
                    StandardCharsets.UTF_8))) {
        bw.write("\ufeff");
        bw.write(content);
        bw.newLine();
        bw.write(content);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
1.3 A PrintWriter and OutputStreamWriter example that writes a BOM to a UTF-8 file. The value 0xfeff is the code point of the byte order mark (BOM).
private static void writeBomFile(Path path, String content) {
    try (PrintWriter pw = new PrintWriter(
            new OutputStreamWriter(
                    new FileOutputStream(path.toFile()), StandardCharsets.UTF_8))) {
        //pw.write("\ufeff");
        pw.write(0xfeff); // alternative, write the code point as a single char
        pw.write(content);
        pw.write(System.lineSeparator());
        pw.write(content);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
1.4 Alternatively, we can write the BOM byte sequence 0xEF, 0xBB, and 0xBF directly to a file.
private static void writeBomFile4(Path path, String content) {
    try (FileOutputStream fos = new FileOutputStream(path.toFile())) {
        byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        fos.write(BOM);
        fos.write(content.getBytes(StandardCharsets.UTF_8));
        fos.write(System.lineSeparator().getBytes(StandardCharsets.UTF_8));
        fos.write(content.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
2. Check if a file contains UTF-8 BOM
The example below reads the first 3 bytes of a file and checks whether they are the byte sequence 0xEF, 0xBB, 0xBF.
package com.mkyong.io.howto;

import org.apache.commons.codec.binary.Hex;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckBom {

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/home/mkyong/file.txt");
        if (isContainBOM(path)) {
            System.out.println("Found BOM!");
        } else {
            System.out.println("No BOM.");
        }
    }

    private static boolean isContainBOM(Path path) throws IOException {
        if (Files.notExists(path)) {
            throw new IllegalArgumentException("Path: " + path + " does not exist!");
        }
        boolean result = false;
        byte[] bom = new byte[3];
        try (InputStream is = new FileInputStream(path.toFile())) {
            // read the first 3 bytes of the file; if the file is shorter,
            // the unread slots stay 0 and the comparison below fails
            is.read(bom);
            // BOM encoded as ef bb bf
            String content = new String(Hex.encodeHex(bom));
            if ("efbbbf".equalsIgnoreCase(content)) {
                result = true;
            }
        }
        return result;
    }
}
Output
Found BOM!
The class org.apache.commons.codec.binary.Hex comes from the commons-codec library below. Alternatively, any other bytes-to-hex conversion works; see "Java – How to convert byte arrays to Hex" in the references.
<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.14</version>
</dependency>
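If pulling in commons-codec only for this check feels heavy, the same test can be done by comparing raw bytes. The following is a minimal sketch using only the JDK; the class name CheckBomNoDeps and the method startsWithUtf8Bom are hypothetical. Unlike the example above, it also tracks how many bytes were actually read, so files shorter than 3 bytes are handled explicitly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CheckBomNoDeps {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true only if the file starts with the three UTF-8 BOM bytes.
    static boolean startsWithUtf8Bom(Path path) throws IOException {
        byte[] head = new byte[UTF8_BOM.length];
        int total = 0;
        try (InputStream is = Files.newInputStream(path)) {
            // read() may return fewer bytes than requested, so loop until
            // the buffer is full or the stream ends.
            while (total < head.length) {
                int n = is.read(head, total, head.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        return total == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("check-bom", ".txt");
        try {
            Files.write(tmp, new byte[]{(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) 'h', (byte) 'i'});
            System.out.println(startsWithUtf8Bom(tmp)); // true
            Files.write(tmp, new byte[]{(byte) 'h', (byte) 'i'});
            System.out.println(startsWithUtf8Bom(tmp)); // false
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```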
3. Remove BOM from a UTF-8 file
The example below uses ByteBuffer to remove the BOM from a UTF-8 file.
P.S. Some XML, JSON, and CSV parsers may fail to parse a UTF-8 file that contains a BOM, so it is common to remove or skip the BOM before parsing the file.
package com.mkyong.io.howto;

import org.apache.commons.codec.binary.Hex;

import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RemoveBomFromUtf8File {

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/home/mkyong/file.txt");
        writeBomFile(path, "mkyong");
        removeBom(path);
    }

    private static void writeBomFile(Path path, String content) {
        // Files.newBufferedWriter(path) encodes as UTF-8 by default (Java 8+)
        try (BufferedWriter bw = Files.newBufferedWriter(path)) {
            bw.write("\ufeff");
            bw.write(content);
            bw.newLine();
            bw.write(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static boolean isContainBOM(Path path) throws IOException {
        if (Files.notExists(path)) {
            throw new IllegalArgumentException("Path: " + path + " does not exist!");
        }
        boolean result = false;
        byte[] bom = new byte[3];
        try (InputStream is = new FileInputStream(path.toFile())) {
            // read the first 3 bytes of the file.
            is.read(bom);
            // BOM encoded as ef bb bf
            String content = new String(Hex.encodeHex(bom));
            if ("efbbbf".equalsIgnoreCase(content)) {
                result = true;
            }
        }
        return result;
    }

    private static void removeBom(Path path) throws IOException {
        if (isContainBOM(path)) {
            // loads the whole file into memory
            byte[] bytes = Files.readAllBytes(path);
            ByteBuffer bb = ByteBuffer.wrap(bytes);
            System.out.println("Found BOM!");
            byte[] bom = new byte[3];
            // consume the first 3 bytes (the BOM)
            bb.get(bom, 0, bom.length);
            // copy the remaining bytes
            byte[] contentAfterFirst3Bytes = new byte[bytes.length - 3];
            bb.get(contentAfterFirst3Bytes, 0, contentAfterFirst3Bytes.length);
            System.out.println("Remove the first 3 bytes, and overwrite the file!");
            // overwrite the same path
            Files.write(path, contentAfterFirst3Bytes);
        } else {
            System.out.println("This file doesn't contain a UTF-8 BOM!");
        }
    }
}
Output
Found BOM!
Remove the first 3 bytes, and overwrite the file!
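When the file only needs to be parsed and not modified, skipping the BOM while reading is an alternative to rewriting the file. The following is a rough sketch of that idea using BufferedReader mark and reset; the class name SkipBomReader and the method newBomSkippingReader are hypothetical, and it assumes the file is UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SkipBomReader {

    // Opens a UTF-8 reader positioned after the BOM, if there is one.
    static BufferedReader newBomSkippingReader(Path path) throws IOException {
        BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        try {
            br.mark(1);
            // The three BOM bytes decode to the single char U+FEFF.
            if (br.read() != 0xFEFF) {
                br.reset(); // not a BOM, rewind so the char is not lost
            }
            return br;
        } catch (IOException e) {
            br.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("skip-bom", ".csv");
        try {
            Files.write(tmp, "\ufeffname,age".getBytes(StandardCharsets.UTF_8));
            try (BufferedReader br = newBomSkippingReader(tmp)) {
                System.out.println(br.readLine()); // prints "name,age" without the BOM
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The returned reader can then be handed to a CSV, JSON, or XML parser that accepts a java.io.Reader, and the file on disk stays untouched.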
4. Copy a file and add BOM
The example below copies a file and adds a BOM at the start of the target file.
package com.mkyong.xml.sax;

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CopyAndAddBomToXmlFile {

    public static void main(String[] args) {
        Path src = Paths.get("src/main/resources/staff.xml");
        Path dest = Paths.get("src/main/resources/staff-bom.xml");
        writeBomFile(src, dest);
    }

    private static void writeBomFile(Path src, Path dest) {
        try (FileOutputStream fos = new FileOutputStream(dest.toFile())) {
            byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
            // add BOM
            fos.write(BOM);
            // BOM + src to fos
            Files.copy(src, fos);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
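The copy above always prepends a BOM, so a source file that already starts with one ends up with two. The sketch below is one possible way to guard against that: it peeks at the first 3 bytes and only writes a BOM when it is missing. The class name CopyAddBomIfMissing and the method copyWithBom are hypothetical, and src and dest are assumed to be different files.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CopyAddBomIfMissing {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Copies src to dest, prepending a UTF-8 BOM only if src lacks one.
    static void copyWithBom(Path src, Path dest) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(src));
             OutputStream out = Files.newOutputStream(dest)) {
            byte[] head = new byte[UTF8_BOM.length];
            in.mark(head.length);
            int total = 0;
            while (total < head.length) {
                int read = in.read(head, total, head.length - total);
                if (read < 0) {
                    break;
                }
                total += read;
            }
            in.reset(); // rewind so the copy below starts at byte 0
            boolean hasBom = total == head.length && Arrays.equals(head, UTF8_BOM);
            if (!hasBom) {
                out.write(UTF8_BOM);
            }
            // plain buffer loop, so the sketch also runs on Java 8
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("staff", ".xml");
        Path dest = Files.createTempFile("staff-bom", ".xml");
        try {
            Files.write(src, "<staff/>".getBytes("UTF-8"));
            copyWithBom(src, dest);
            System.out.println(Files.size(dest)); // 11 = 3 BOM bytes + 8 content bytes
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(dest);
        }
    }
}
```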
5. Download Source Code
$ git clone https://github.com/mkyong/core-java
$ cd core-java/java-io
6. References
- Wikipedia – Byte order mark
- Stackoverflow – What’s the difference between UTF-8 and UTF-8 without BOM?
- Java – Create and write a file
- How to write to file in Java – BufferedWriter
- SAX Error – Content is not allowed in prolog
- Java – How to join and split byte arrays, bytes
- Java – How to convert byte arrays to Hex
- BOM : Java Glossary