 

How can I determine which encoding the file uses before I read the file?

I'm facing a problem.

A file can be written in one of several encodings, such as UTF-8, UTF-16, or UTF-32.

When I read a UTF-16 file, I use the code below:

 BufferedReader in = new BufferedReader(
                           new InputStreamReader(
                           new FileInputStream(file), StandardCharsets.UTF_16));

How can I determine which encoding a file uses before I read it?

When I read a UTF-8 encoded file as UTF-16, the characters come out wrong.
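To make the symptom concrete, here is a small sketch (the class and method names are hypothetical) that encodes text as UTF-8 and then decodes those bytes as UTF-16, the same mismatch described above. Without a byte order mark, Java's UTF-16 decoder treats each pair of bytes as one big-endian code unit, so the five ASCII bytes of "hello" become two unrelated characters plus a replacement character for the leftover byte.

```java
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // Encodes text as UTF-8, then (wrongly) decodes those bytes as UTF-16.
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        // Each pair of UTF-8 bytes is read as one UTF-16 code unit;
        // a leftover odd byte is replaced with U+FFFD.
        return new String(utf8Bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String garbled = misread("hello");
        System.out.println(garbled.length()); // 3, not 5
    }
}
```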

asked Dec 06 '25 10:12 by Animesh Kumar Paul


1 Answer

There is no reliable way to do that. Asking which encoding a file uses is like trying to determine the radix of a number just by looking at it. For example, what is the radix of 101? It could be 5 in binary, 101 in decimal, or 257 in hexadecimal.
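The analogy can be checked directly. In this sketch (the class and method names are illustrative, not from the answer), the digits 101 parse successfully in three bases, and the two bytes 0x68 0x69 decode without error both as UTF-8 ("hi") and as UTF-16BE (the single character U+6869). Nothing in the bytes themselves says which reading was intended.

```java
import java.nio.charset.StandardCharsets;

public class AmbiguousBytes {
    // Decodes the same bytes under two charsets; both readings are "valid".
    static String[] readings(byte[] bytes) {
        return new String[] {
            new String(bytes, StandardCharsets.UTF_8),
            new String(bytes, StandardCharsets.UTF_16BE)
        };
    }

    public static void main(String[] args) {
        // The radix analogy: the digits "101" are valid in many bases.
        System.out.println(Integer.parseInt("101", 2));  // 5
        System.out.println(Integer.parseInt("101", 10)); // 101
        System.out.println(Integer.parseInt("101", 16)); // 257

        // The byte analogue: two bytes, two equally well-formed readings.
        String[] r = readings(new byte[] {0x68, 0x69});
        System.out.println(r[0]);                                // hi
        System.out.println(Integer.toHexString(r[1].charAt(0))); // 6869
    }
}
```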

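The best approach is to read the data into a byte array. Then you can use the String(byte[] bytes, Charset charset) constructor to try decoding it with several encodings, from most likely to least likely. Keep in mind that this constructor never fails: it silently replaces malformed input with U+FFFD, so you have to inspect the result (or use a CharsetDecoder configured to report errors) to tell a bad guess from a plausible one.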
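A minimal sketch of the trial-decoding idea, assuming Java 9+ (for List.of) and a hypothetical candidate order of UTF-8, UTF-16, UTF-32; the class and method names are also illustrative. Instead of the String constructor, it uses a CharsetDecoder set to CodingErrorAction.REPORT, so a wrong guess raises a CharacterCodingException instead of quietly producing U+FFFD characters. Charset.forName("UTF-32") assumes the JDK ships that charset, as OpenJDK does.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

public class CharsetGuesser {
    // Hypothetical "most likely to least likely" order; adjust for your data.
    static final List<Charset> CANDIDATES = List.of(
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16,
            Charset.forName("UTF-32"));

    // Returns the first charset that decodes the bytes without malformed or
    // unmappable input. A match means "valid in this charset", not "correct".
    static Optional<Charset> firstStrictMatch(byte[] bytes, List<Charset> candidates) {
        for (Charset cs : candidates) {
            try {
                cs.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(bytes));
                return Optional.of(cs);
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        Optional<Charset> guess = firstStrictMatch(data, CANDIDATES);
        System.out.println(guess.map(Charset::name).orElse("no candidate matched"));
    }
}
```

Order matters here: pure ASCII is valid UTF-8, and most even-length byte sequences are valid UTF-16, so putting UTF-16 first would "detect" UTF-16 for many UTF-8 files. Java's "UTF-16" decoder honors a leading byte order mark and otherwise assumes big-endian. In the end a successful decode only rules encodings out; it cannot prove which one was intended, which is the radix problem again.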

answered Dec 08 '25 23:12 by user845279