Java character conversion to UTF-8

Tags:

java

I am using:

InputStreamReader isr = new InputStreamReader(fis, "UTF8");

to read characters from a text file and convert them to UTF-8 characters.

My question is: what happens if one of the characters being read cannot be converted to UTF-8? Will there be an exception, or will the character get dropped?

Rafael asked Jun 27 '26 07:06


1 Answer

You are not converting from one charset to another. You are just declaring that the file is UTF-8 encoded so that its bytes can be decoded into characters correctly.
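On the question's actual concern (bytes in the file that are not valid UTF-8): as far as I know, an InputStreamReader built with a charset name, as in the question's code, replaces malformed input with the replacement character U+FFFD instead of throwing, so the data is silently altered rather than dropped. The sketch below assumes you would rather fail loudly: it passes a CharsetDecoder configured with CodingErrorAction.REPORT, so read() throws a MalformedInputException. The class and method names (Utf8DecodeDemo, decodeLenient, decodeStrict) are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8DecodeDemo {
    // Drains a Reader into a String.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Lenient: the (InputStream, String) constructor substitutes U+FFFD for bad bytes.
    static String decodeLenient(byte[] bytes) throws IOException {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), "UTF-8")) {
            return readAll(r);
        }
    }

    // Strict: a decoder set to REPORT makes read() throw MalformedInputException,
    // a subclass of CharacterCodingException (which is itself an IOException).
    static String decodeStrict(byte[] bytes) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), decoder)) {
            return readAll(r);
        }
    }

    public static void main(String[] args) throws IOException {
        // 'A', then 0xFF (never valid in UTF-8), then 'B'
        byte[] bytes = {0x41, (byte) 0xFF, 0x42};
        System.out.println(decodeLenient(bytes).equals("A\uFFFDB")); // true
        try {
            decodeStrict(bytes);
            System.out.println("no exception");
        } catch (CharacterCodingException e) {
            System.out.println("strict decoder threw " + e.getClass().getSimpleName());
        }
    }
}
```

With the lenient reader, the invalid byte comes back as U+FFFD; with the strict decoder, the same input produces an exception you can catch and handle.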

If you want to convert from one encoding to another, you can do something like the following:

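// Requires: import java.io.*;
File infile = new File("x-utf8.txt");
File outfile = new File("x-utf16.txt");

String fromEncoding = "UTF-8";
String toEncoding = "UTF-16";

// Decode with fromEncoding, then re-encode every char with toEncoding
try (Reader in = new InputStreamReader(new FileInputStream(infile), fromEncoding);
     Writer out = new OutputStreamWriter(new FileOutputStream(outfile), toEncoding)) {
    char[] buf = new char[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
    }
}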
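The same conversion can also be wrapped in a reusable method using java.nio.file. This is a sketch, not the only way to do it; the Transcode class and transcode method are hypothetical names. One difference worth knowing: Files.newBufferedReader uses a decoder that reports errors, so malformed input throws a MalformedInputException instead of being replaced with U+FFFD.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Transcode {
    // Copies the text in source (decoded with `from`) to target (encoded with `to`).
    // Files.newBufferedReader reports malformed input as an IOException
    // (MalformedInputException) rather than substituting U+FFFD.
    static void transcode(Path source, Charset from, Path target, Charset to) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, from);
             BufferedWriter out = Files.newBufferedWriter(target, to)) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("x-utf8", ".txt");
        Path dst = Files.createTempFile("x-utf16", ".txt");
        Files.write(src, "h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        transcode(src, StandardCharsets.UTF_8, dst, StandardCharsets.UTF_16);
        String back = new String(Files.readAllBytes(dst), StandardCharsets.UTF_16);
        System.out.println(back.equals("h\u00e9llo")); // true
    }
}
```

Java's "UTF-16" encoder writes a big-endian byte-order mark at the start of the output; use StandardCharsets.UTF_16BE or UTF_16LE if you don't want one.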

After reading David Gelhar's response, I think this code can be improved a bit. If you don't know the encoding of "infile", use the GuessEncoding library to detect it, then construct the reader with the detected encoding.
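If pulling in a library is not an option, a much cruder check is possible with the JDK alone: try a strict UTF-8 decode and fall back to ISO-8859-1 if it fails. This is not what GuessEncoding does; it is a swapped-in heuristic that can only tell "valid UTF-8" from "something else". The ISO-8859-1 fallback is an assumption (it accepts any byte sequence, so it never fails, but it may be the wrong guess), and guessUtf8OrLatin1 is a hypothetical helper name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingFallback {
    // Hypothetical helper: returns UTF-8 if the bytes decode cleanly as UTF-8,
    // otherwise falls back to ISO-8859-1 (which can decode any byte sequence).
    static Charset guessUtf8OrLatin1(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guessUtf8OrLatin1(utf8));   // UTF-8
        System.out.println(guessUtf8OrLatin1(latin1)); // ISO-8859-1
    }
}
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII is a subset of UTF-8.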

Aravind Yarram answered Jun 28 '26 20:06


