
How to read a UTF-8 file in Java

In Java, InputStreamReader accepts a charset, which it uses to decode a byte stream into a character stream. Passing StandardCharsets.UTF_8 to the InputStreamReader constructor lets us read data from a UTF-8 file.


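import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

  //...
  // file is a java.io.File that points to the UTF-8 file
  try (FileInputStream fis = new FileInputStream(file);
       InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
       BufferedReader reader = new BufferedReader(isr)
  ) {

      String str;
      while ((str = reader.readLine()) != null) {
          System.out.println(str);
      }

  } catch (IOException e) {
      e.printStackTrace();
  }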
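To see why the charset argument matters, here is a small sketch that decodes the same UTF-8 bytes twice through an InputStreamReader: once with UTF-8 and once, deliberately wrong, with ISO-8859-1. It reads from an in-memory ByteArrayInputStream instead of a file so it runs anywhere; the class name DecodeDemo and the helper readAll are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Hypothetical helper: decodes the bytes by wrapping them in an InputStreamReader,
    // the same way the snippet above wraps a FileInputStream
    static String readAll(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two Chinese characters, written as escapes so the source file encoding does not matter
        byte[] utf8 = "\u4F60\u597D".getBytes(StandardCharsets.UTF_8);

        String right = readAll(utf8, StandardCharsets.UTF_8);      // matching charset
        String wrong = readAll(utf8, StandardCharsets.ISO_8859_1); // wrong charset

        System.out.println(utf8.length);    // 6, each character takes 3 bytes in UTF-8
        System.out.println(right.length()); // 2, the two original characters
        System.out.println(wrong.length()); // 6, one garbage character per byte
    }
}
```

With the wrong charset nothing fails loudly; you simply get the wrong characters, which is why it pays to always pass the charset explicitly.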

Since Java 7, many file-reading APIs accept a charset as an argument, which makes reading a UTF-8 file very easy.


  // Java 7
  BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);

  // Java 7
  List<String> list = Files.readAllLines(path, StandardCharsets.UTF_8);

  // Java 8
  Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8);

  // Java 11
  String s = Files.readString(path, StandardCharsets.UTF_8);
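Below is a self-contained sketch that exercises each of these calls against the same file. It assumes a throwaway file in the system temp directory is acceptable, and the class name FilesReadDemo and its helper methods are hypothetical. It needs Java 11 or newer because of Files.readString and List.of.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public class FilesReadDemo {

    // Java 7: BufferedReader with an explicit charset, returns the first line
    static String firstLine(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    // Java 7: all lines loaded into memory at once
    static List<String> allLines(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    // Java 8: lazy Stream of lines; try-with-resources closes the underlying file
    static long countLines(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }

    // Java 11: the whole file as a single String, line separators included
    static String wholeFile(Path path) throws IOException {
        return Files.readString(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file in the temp directory, deleted at the end
        Path path = Files.createTempFile("utf8-demo", ".txt");
        try {
            // The second line is Chinese text written as Unicode escape sequences
            Files.write(path, List.of("line 1", "\u4F60\u597D,\u4E16\u754C"), StandardCharsets.UTF_8);

            System.out.println(firstLine(path));
            System.out.println(allLines(path));
            System.out.println(countLines(path));
            System.out.print(wholeFile(path));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Files.lines is the one to watch: unlike readAllLines and readString, it keeps the file open until the Stream is closed, so it belongs in a try-with-resources block.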

1. UTF-8 File

A UTF-8 encoded file, c:\temp\test.txt, containing Chinese characters.

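If you want to reproduce this file yourself, a minimal sketch like the following writes it as UTF-8 with Files.write. The four lines are taken from the Output section below; the class and method names are hypothetical, and the default path mirrors the one used in this tutorial (pass a different path as the first argument on non-Windows systems).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class CreateUtf8File {

    // Same content as the Output section; the last line is Chinese text
    // written as Unicode escape sequences so this source file stays pure ASCII
    static final List<String> SAMPLE =
            List.of("line 1", "line 2", "line 3", "\u4F60\u597D,\u4E16\u754C");

    // Writes the sample lines encoded as UTF-8, each followed by the platform line separator
    static void writeSample(Path path) throws IOException {
        Files.write(path, SAMPLE, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "c:\\temp\\test.txt");
        Path parent = path.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // create the parent directory if it is missing
        }
        writeSample(path);
        System.out.println("Wrote " + Files.size(path) + " bytes to " + path);
    }
}
```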

2. Read UTF-8 file

This example shows a few ways to read a UTF-8 file.


package com.mkyong.io.howto;

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;

public class UnicodeRead {

    public static void main(String[] args) {

        String fileName = "c:\\temp\\test.txt";

        //readUnicodeJava11(fileName);
        readUnicodeBufferedReader(fileName);
        //readUnicodeFiles(fileName);
        //readUnicodeClassic(fileName);

    }

    // Java 7 - Files.newBufferedReader(path, StandardCharsets.UTF_8)
    // Java 8 - Files.newBufferedReader(path) // default UTF-8
    public static void readUnicodeBufferedReader(String fileName) {

        Path path = Paths.get(fileName);

        // Java 8, default UTF-8
        try (BufferedReader reader = Files.newBufferedReader(path)) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public static void readUnicodeFiles(String fileName) {

        Path path = Paths.get(fileName);
        try {

            // Java 11
            String s = Files.readString(path, StandardCharsets.UTF_8);
            System.out.println(s);

            // Java 7
            List<String> list = Files.readAllLines(path, StandardCharsets.UTF_8);
            list.forEach(System.out::println);

            // Java 8, close the Stream to release the file handle
            try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
                lines.forEach(System.out::println);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    // Java 11, adds charset to FileReader
    public static void readUnicodeJava11(String fileName) {

        try (FileReader fr = new FileReader(fileName, StandardCharsets.UTF_8);
             BufferedReader reader = new BufferedReader(fr)) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    public static void readUnicodeClassic(String fileName) {

        File file = new File(fileName);

        try (FileInputStream fis = new FileInputStream(file);
             InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
             BufferedReader reader = new BufferedReader(isr)
        ) {

            String str;
            while ((str = reader.readLine()) != null) {
                System.out.println(str);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }
}
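All four methods assume the file really is UTF-8, and they behave differently when it is not. readUnicodeClassic and readUnicodeJava11 go through InputStreamReader (FileReader extends it), which silently replaces invalid bytes with U+FFFD, often displayed as a question mark in a black diamond. Files.newBufferedReader and Files.readAllLines instead throw a MalformedInputException. The sketch below shows both behaviors on the byte 0xA9, the copyright sign in ISO-8859-1 but not valid UTF-8 on its own; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MalformedDemo {

    // "(", then 0xA9 (copyright sign in ISO-8859-1, an invalid lone byte in UTF-8), then ")"
    static final byte[] NOT_UTF8 = {'(', (byte) 0xA9, ')'};

    // Lenient: InputStreamReader substitutes U+FFFD for malformed input
    static String readLenient(Path path) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    // Strict: Files.readAllLines reports malformed input instead of replacing it
    static boolean isValidUtf8(Path path) throws IOException {
        try {
            Files.readAllLines(path, StandardCharsets.UTF_8);
            return true;
        } catch (MalformedInputException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("not-utf8", ".txt"); // hypothetical test file
        try {
            Files.write(path, NOT_UTF8);
            String line = readLenient(path);
            System.out.println(line.charAt(1) == '\uFFFD'); // true
            System.out.println(isValidUtf8(path));          // false
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

If you see replacement characters in your output, the file was most likely saved in another encoding; the strict APIs make that mistake visible instead of hiding it.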

Output


line 1
line 2
line 3
你好,世界

Download Source Code

$ git clone https://github.com/mkyong/core-java

$ cd java-io


About Author

Founder of Mkyong.com, loves Java and open source stuff. Follow him on Twitter. If you like my tutorials, consider making a donation to these charities.

Comments

Dipak
7 years ago

Thanks Man.. this fixed my issue. I am grateful.

zik
10 years ago

According to http://docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html

maybe you should specify the encoding name to be “UTF-8” rather than “UTF8”?

Fabricio
5 years ago

thank you

Julio
6 years ago

thanks!!, this tutorial helped with my problem.

Jagat
8 years ago

Hey,
I am getting only symbols from the extracted gzip file.
Can anyone help me?

Anand Kadam
8 years ago
Reply to  Jagat

go to the project properties and set “Text encoding file” as UTF8.

sameer j
8 years ago

“copyright symbol” gets converted to “question mark inside black diamond”

Omer
9 years ago

You are amazing thank you mr mkyong

Raghavendra
10 years ago

Hi Mkyong, How to get the encoding characterset of a file in java? Please provide the source code for this. And is UTF-8-> ANSI ?

varun bhatia
11 years ago

Does not work for me! I am getting a ? prefixed to the first line.

Launcher Go
12 years ago

I just want to ask why you put a lot of catch statements wherein you already put a generalized catch statement at the bottom?

sorry, I am still a newbie~

Anthony
8 years ago
Reply to  Launcher Go

to catch any other exception that could be thrown without being caught by the other catch statements

sridhar
13 years ago

this is a real stupidity that i can’t post UTF-8 string to explain the problem in code

reddy
6 years ago

I didn't get the UTF abbreviation.