I have a table like this
CREATE TABLE `mb1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `u_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And I insert tow rows
insert into mb1(name) values('K'),('K');
Note, the second K is unicode character
+------+-----------+
| name | hex(name) |
+------+-----------+
| K | 4B |
| K | EFBCAB |
+------+-----------+
Why do they cause unique key collision? Aren't they different character in utf8mb4?
After remove COLLATE utf8mb4_unicode_ci, the problem disappeared.
Why do they cause unique key collision? Aren't they different character in utf8mb4?
You are missing the point about CHARACTER SET and COLLATION. A CHARACTER SET is a collection of different characters. A COLLATION says whether to treat the characters as equal -- think A and a -- different characters, but treated for ORDER BY and WHERE =, etc as being the same.
mysql> SELECT 'K'='K' COLLATE utf8_unicode_ci;
+-----------------------------------+
| 'K'='K' COLLATE utf8_unicode_ci |
+-----------------------------------+
| 1 |
+-----------------------------------+
So in utf8_unicode_ci (or utf8mb4_unicode_ci), those two characters are considered to be "equal".
"Equal" is the test for UNIQUE keys.
Set the COLLATION for the column to whatever makes sense for you.
utf8mb4_latvian_ci is a little different: K=k but not equal to Ķ=ķ . There are other specialized collations for other languages (mostly Western European).
Your K is called "FULLWIDTH LATIN CAPITAL LETTER K", so it is quite reasonable that it compare equal to Latin K.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With