I have run into an extremely frustrating feature: for some reason, a SELECT query with LIKE %D% will not match Đ.
All other characters like this do match, however: %n% matches ñ, %o% matches ồ, and %Thua Thien-Hue% matches Thừa Thiên-Huế. But if I search for %Dong Nai%, I do not get Đồng Nai.
Is this a MySQL feature or something hard coded into Unicode, or is there a way around this? It makes people who are using my website unable to find events about certain Vietnamese provinces, unless they have access to the Đ key, which virtually nobody in America does.
EDIT:
The fact that a, e, i, o, or u matches all Vietnamese vowels is very unexpected behavior to a Vietnamese speaker.
For reference, here are all the vowels in Vietnamese:
à, á, ã, ả, ạ, a, ằ, ắ, ẵ, ẳ, ặ, ă, ầ, ấ, ẫ, ẩ, ậ, â, è, é, ẽ, ẻ, ẹ, e, ề, ế, ễ, ể, ệ, ê, ì, í, ĩ, ỉ, ị, i, ò, ó, õ, ỏ, ọ, o, ồ, ố, ỗ, ổ, ộ, ô, ờ, ớ, ỡ, ở, ợ, ơ, ù, ú, ũ, ủ, ụ, u, ừ, ứ, ữ, ử, ự, ư
My question, then, is: what constitutes a "different enough" letter?
It appears other Vietnamese speakers have reported this as a bug to MySQL:
http://bugs.mysql.com/bug.php?id=61258
This behavior appears not to be present in 5.6+. I will let you know if an update of MySQL helps.
It has to do with the collation. Check out http://www.collation-charts.org/mysql60/ and you will see that D and Đ are not treated as the same character for comparison. As nico suggested in the comments, the easiest (although not the fastest) way around this is to replace Đ with D when doing the comparison. That may not be practical, depending on your performance requirements. In that case, you may want to keep a separate column or table holding a copy of the content in which those characters were replaced when the data was inserted into the database.
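As a rough sketch of the "separate column" idea, assuming the application is written in Python and computes the folded text itself before inserting it: the function name search_key and the extra column are hypothetical, not something from the question. Đ and đ are mapped by hand, and the remaining diacritics are stripped through Unicode decomposition.

```python
import unicodedata

# Đ/đ have no Unicode decomposition, so they are mapped explicitly.
_D_STROKE = str.maketrans({"Đ": "D", "đ": "d"})


def search_key(text: str) -> str:
    """Return a copy of text with Vietnamese diacritics folded away.

    Hypothetical helper: the result would be stored in an extra
    search column next to the original text.
    """
    # Split each accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text.translate(_D_STROKE))
    # Drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

The same function would be applied to the user's search term, so that a LIKE against the extra column compares folded text with folded text.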
Those Vietnamese vowels and their diacritical variants are equal at the primary level (they have the same base character) but differ at the secondary (diacritic) level. Using an appropriate collation can make them compare as equal.
However, this is different for 'D' and 'Đ': they are not related characters and are not equal under any collation rules. As such, you have to compare against both letters.
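One way to see the distinction is to look at Unicode decomposition, sketched here in Python. This is only an illustration: decomposition data is related to, but not the same thing as, the weights a MySQL collation actually uses, and base_letter is a name made up for this example.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Return the first code point of the NFD form of ch.

    For an accented vowel this is the plain base letter; for a
    character with no decomposition it is the character itself.
    """
    return unicodedata.normalize("NFD", ch)[0]


for ch in "ñồừĐ":
    print(ch, "->", base_letter(ch))
```

The accented vowels reduce to plain n, o and u, while Đ comes back unchanged because Unicode defines no decomposition for it.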
Implement Vietnamese Collation in MySQL