I am trying to read a UTF-8 string from my MySQL database, which I create using:
CREATE DATABASE april
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
I make the table of interest using:
DROP TABLE IF EXISTS `article`;
CREATE TABLE `article` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`text` longtext NOT NULL,
`date_created` timestamp DEFAULT NOW(),
PRIMARY KEY (`id`)
) CHARACTER SET utf8;
If I select * from article in the MySQL command-line client, I get:
OIL sands output at Nexen’s Long Lake project dropped in February.
However, when I do
ResultSet rs = st.executeQuery(QUERY);
long id = -1;
String text = null;
Timestamp date = null;
while (rs.next()) {
text = rs.getString("text");
LOGGER.debug("text=" + text);
}
the output I get is:
text=OIL sands output at Nexen’s Long Lake project dropped in February.
I get my Connection via:
DriverManager.getConnection("jdbc:" + this.dbms + "://" + this.serverHost + ":" + this.serverPort + "/" + this.dbName + "?useUnicode&user=" + this.username + "&password=" + this.password);
I've also tried, instead of the useUnicode parameter:
characterEncoding=UTF-8
and
characterEncoding=utf8
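For reference, here is a minimal sketch of how the two parameters can be combined in one URL, with each given an explicit value. It assumes MySQL Connector/J, whose connection properties include useUnicode and characterEncoding. The class name, method names, and the host, port, and database values are hypothetical and only for illustration. Whether this changes the result depends on how the data was stored in the first place.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class Utf8Connection {
    // Builds a Connector/J style URL with both Unicode parameters set explicitly.
    static String buildUrl(String host, int port, String dbName) {
        return "jdbc:mysql://" + host + ":" + port + "/" + dbName
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Passes the credentials as separate arguments instead of URL parameters.
    static Connection open(String host, int port, String dbName,
                           String user, String password) throws SQLException {
        return DriverManager.getConnection(buildUrl(host, port, dbName), user, password);
    }

    public static void main(String[] args) {
        // Hypothetical host and port, shown only to print the resulting URL.
        System.out.println(buildUrl("localhost", 3306, "april"));
    }
}
```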
I also tried, instead of the line text = rs.getString("text"):
byte[] temp = rs.getBytes("text");
String[] encodings = new String[]{"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16", "Latin1"};
for (String encoding : encodings) {
text = new String(temp, encoding);
LOGGER.debug(encoding + ": " + text);
}
// Which outputted:
US-ASCII: OIL sands output at Nexen��������s Long Lake project dropped in February.
ISO-8859-1: OIL sands output at Nexenââ¬â¢s Long Lake project dropped in February.
UTF-8: OIL sands output at Nexen’s Long Lake project dropped in February.
UTF-16BE: 佉䰠獡湤猠潵瑰畴琠乥硥滃ꋢ芬ꉳ⁌潮朠䱡步⁰牯橥捴牯灰敤渠䙥扲畡特�
UTF-16LE: 䥏⁌慳摮畯灴瑵愠⁴敎數썮겂蓢玢䰠湯慌敫瀠潲敪瑣搠潲灰摥椠敆牢慵祲�
UTF-16: 佉䰠獡湤猠潵瑰畴琠乥硥滃ꋢ芬ꉳ⁌潮朠䱡步⁰牯橥捴牯灰敤渠䙥扲畡特�
Latin1: OIL sands output at Nexenââ¬â¢s Long Lake project dropped in February.
I load the strings into the DB using some predefined SQL in a file. This file is UTF-8 encoded.
mysql -u april -p -D april < insert_articles.sql
This file includes the line:
INSERT INTO article (text) value ("OIL sands output at Nexen’s Long Lake project dropped in February.");
When I print out that file within my application using:
BufferedReader reader = new BufferedReader(new FileReader(new File("/home/path/to/file/sql_article_inserts.sql")));
String str;
while((str = reader.readLine()) != null) {
LOGGER.debug("LINE: " + str);
}
I get the correct, expected output:
LINE: INSERT INTO article (text) value ("OIL sands output at Nexen’s Long Lake project dropped in February.");
Any help would be much appreciated.
Some system details: I am running on Linux (Ubuntu).
Edits:
* Edited to specify OS
* Edited to detail output of reading sql input file.
* Edited to specify more about how the data is inserted into the DB.
* Edited to fix typo in code, and clarify example.
Is it possible you're reading the log file using the incorrect encoding? windows-1252, I am guessing.
UTF-8: OIL sands output at Nexen’s Long Lake project dropped in February.
If this is appearing in the log, do a hex dump of the log file. If the data is UTF-8, you would expect the sequence Nexen’s to be encoded as the bytes 4E 65 78 65 6E E2 80 99 73. If some other application reads those bytes using a native ANSI encoding such as windows-1252, it will decode them as Nexen’s.
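The byte sequence and the misreading described above can be reproduced in a few lines. This is only an illustrative sketch: the class and method names are made up, and it assumes the windows-1252 charset is available in the running JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Formats bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text as UTF-8, then decodes those bytes with the wrong charset.
    static String misread(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String word = "Nexen\u2019s"; // U+2019 is the right single quotation mark
        System.out.println(toHex(word.getBytes(StandardCharsets.UTF_8)));
        System.out.println(misread(word));
    }
}
```

The three bytes E2 80 99 map to the three windows-1252 characters U+00E2, U+20AC and U+2122, which is where the extra characters after Nexen come from.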
To confirm, you can also dump the individual characters of the return value to see if they are correct in UTF-16:
//untested
for(char ch : text.toCharArray()) {
System.out.printf("%04x%n", (int) ch);
}
I'm assuming all data is in the BMP, so you can just look up the results in the Unicode charts.
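If the dump shows 00e2 20ac 2122 where a single 2019 was expected, the Java string itself already contains the wrong characters, and the log viewer is not the cause. One way to test that hypothesis is to reverse a single windows-1252 misreading and see whether the expected text comes back. The sketch below does that. It is a diagnostic aid under that assumption, not a fix, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DoubleEncodingCheck {
    // Returns the UTF-16 code units of a string as space-separated hex values.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) ch));
        }
        return sb.toString();
    }

    // Reverses one round of "UTF-8 bytes decoded as windows-1252".
    static String undoOnce(String s) {
        byte[] raw = s.getBytes(Charset.forName("windows-1252"));
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The suspect string, written with escapes so the source file encoding does not matter.
        String suspect = "Nexen\u00e2\u20ac\u2122s";
        System.out.println(codeUnits(suspect));
        System.out.println(codeUnits(undoOnce(suspect)));
    }
}
```

If undoOnce yields the expected text, the data was most likely encoded twice somewhere between the SQL file and the table, for example by a client that sent UTF-8 bytes over a connection declared with a different character set.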