
How to get MySQL to match `%D%` => `Đ`

I have run into an extremely frustrating feature: for some reason, a query using LIKE %D% will not match Đ.

All other characters like this do match, however: %n% matches ñ and %o% matches accented forms of o. But if I search for %Dong Nai% I will not get Đồng Nai, even though %Thua Thien-Hue% does match Thừa Thiên-Huế.

Is this a MySQL feature or something hard coded into Unicode, or is there a way around this? It makes people who are using my website unable to find events about certain Vietnamese provinces, unless they have access to the Đ key, which virtually nobody in America does.

EDIT:

The fact that a, e, i, o, or u matches all Vietnamese vowels is very unexpected behavior to a Vietnamese speaker.

For reference, here are all the vowels in Vietnamese:

à, á, ã, ả, ạ, a, ằ, ắ, ẵ, ẳ, ặ, ă, ầ, ấ, ẫ, ẩ, ậ, â, è, é, ẽ, ẻ, ẹ, e, ề, ế, ễ, ể, ệ, ê, ì, í, ĩ, ỉ, ị, i, ò, ó, õ, ỏ, ọ, o, ồ, ố, ỗ, ổ, ộ, ô, ờ, ớ, ỡ, ở, ợ, ơ, ù, ú, ũ, ủ, ụ, u, ừ, ứ, ữ, ử, ự, ư


My question is then, 'What constitutes a different enough letter?'.


It appears other Vietnamese speakers have reported this as a bug to MySQL:

http://bugs.mysql.com/bug.php?id=61258

This behavior does not appear to be present in 5.6+. I will let you know if updating MySQL helps.

Ryan Ward asked Jan 17 '26 00:01


2 Answers

It has to do with the collation. Check out http://www.collation-charts.org/mysql60/ and you will see that D and Đ do not compare as equal. As nico suggested in the comments, the easiest (although not the fastest) way around this is to replace Đ with D when doing the comparison. That may not be practical depending on your performance requirements, in which case you may want to keep a separate column or table holding a copy of the content with those characters already replaced at insert time.
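As a rough illustration of the "separate column" idea, here is a minimal Python sketch of folding text on the application side before inserting it. The function name `fold_for_search` and the idea of storing its result in an extra search column are assumptions for the example, not something from the question. It strips combining marks after NFD normalization and maps Đ/đ by hand, since those two letters have no Unicode decomposition.

```python
import unicodedata

# Đ and đ do not decompose into "D" plus a combining mark,
# so they are mapped explicitly (assumption: D/d is the desired fold).
STROKE_MAP = str.maketrans({"Đ": "D", "đ": "d"})


def fold_for_search(text: str) -> str:
    """Return an ASCII-friendly copy of text for a search column."""
    # Split accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (tone marks, circumflex, breve, horn).
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Handle the letters that NFD leaves untouched.
    return without_marks.translate(STROKE_MAP)
```

You would then store `fold_for_search(name)` next to the original value and run the LIKE query against the folded column, folding the user's search term the same way.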

dunos answered Jan 19 '26 14:01


Those Vietnamese vowels and their diacritical variants are equal at the primary level (they share the same base character) and differ only at the secondary (diacritic) level. Using an appropriate collation makes them compare as equal.

'D' and 'Đ' are a different case: they are not related characters and are not equal under any collation rules. As such, you need to compare against both letters.
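One way to see the "same base character" distinction outside MySQL is to look at Unicode canonical decomposition. This is only an analogy (decomposition data is not the same thing as MySQL's collation weights), and the helper name `base_letter` is made up for the example:

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Return the first code point of the NFD form of a single character."""
    # Accented vowels decompose to base letter + combining marks;
    # a letter without a decomposition comes back unchanged.
    return unicodedata.normalize("NFD", ch)[0]
```

With this, `base_letter("ồ")` gives `"o"` and `base_letter("ñ")` gives `"n"`, while `base_letter("Đ")` stays `"Đ"`, because Unicode treats Đ as a letter in its own right rather than D plus a mark.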

Implement Vietnamese Collation in MySQL

nguyenq answered Jan 19 '26 14:01


