Sort respecting diacritics (PostgreSQL)

Can I get PostgreSQL to sort rows by a string column respecting the accents?

I found out that it's possible to define a custom collation having "ks" (colStrength) set to "level2", which would mean that it's accent-sensitive.

However, when I try to actually sort using that collation, the order seems to be accent-insensitive.

There is an extensive blog post about this by a PostgreSQL developer. Let's use the same ICU locale as that post, like so:

CREATE TABLE test (string text);
INSERT INTO test VALUES ('bar'), ('bat'), ('bär');
CREATE COLLATION "und1" (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
CREATE COLLATION "und2" (provider = icu, deterministic = false, locale = 'und-u-ks-level2');
CREATE COLLATION "und3" (provider = icu, deterministic = false, locale = 'und-u-ks-level3');
SELECT * FROM test ORDER BY string collate "und1";
SELECT * FROM test ORDER BY string collate "und2";
SELECT * FROM test ORDER BY string collate "und3";

All three collations give me the same order: bar < bär < bat, although I would expect an accent-sensitive order to be bar < bat < bär.

Do I misunderstand the collation capabilities? Is there a way to get an accent-sensitive order?

Also, is there a way to see which options the default built-in collations use? For example, I don't see the "ks" level anywhere in the pg_collation table data.

asked Oct 15 '25 03:10 by Jānis Elmeris


1 Answer

Yes, PostgreSQL can sort strings accent-sensitively using ICU collations, but there are a few important nuances to get it working correctly.

The issue

You're correctly using ICU collations with ks=level2, which makes comparisons accent-sensitive. However, the und locale (undetermined language) may not provide the sort order you're expecting: it selects ICU's language-neutral root rules, and language-specific ordering is applied only when the locale names a language.

✅ Solution

Instead of using und, try using a real language locale, such as en-u-ks-level2 for English or fr-u-ks-level2 for French, depending on the language context of your data.

CREATE COLLATION "en_level2" (provider = icu, deterministic = false, locale = 'en-u-ks-level2');

SELECT * FROM test ORDER BY string COLLATE "en_level2";

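This should produce bar < bat < bär only if the chosen language's rules sort ä after t. In ICU, an accent is by default a secondary difference that is compared only after the base letters tie, so check the actual output; if it is still bar < bär < bat, the language tag made no difference for these strings.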
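If the goal is specifically bar < bat < bär, meaning accented letters sort after all unaccented ones, one option that does not depend on ICU strength levels is plain byte-order comparison with the built-in "C" collation. This is a minimal sketch, assuming the database encoding is UTF8 and reusing the test table from the question:

```sql
-- Assumption: UTF8 database encoding and the "test" table from the question.
-- "C" compares strings byte by byte. In UTF-8 every non-ASCII letter starts
-- with a byte above 0x7F, so 'ä' sorts after every ASCII letter, including 't'.
SELECT * FROM test ORDER BY string COLLATE "C";
```

With that encoding the rows come back as bar, bat, bär. The trade-off is that byte order is not linguistic: for example, all uppercase ASCII letters sort before all lowercase ones.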

🔍 Why und doesn’t work

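The und locale selects ICU's root collation rules. In those rules ä differs from a only at the secondary level, so the accent is considered only after all base letters compare equal. A specific language tag adds that language's tailorings on top of the root rules, and some of them change where accented letters sort; Swedish, for example, treats ä as a separate letter after z.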
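To check whether a collation distinguishes accents at all, independently of row order, you can compare the strings directly. This is a minimal sketch that assumes the und1 and und2 collations from the question exist; the column aliases are made up for illustration:

```sql
-- Equality shows whether the accent is significant at a given strength.
SELECT 'bar' = 'bär' COLLATE "und1" AS equal_at_level1,
       'bar' = 'bär' COLLATE "und2" AS equal_at_level2;

-- Ordering shows which difference decides the comparison:
-- the accent (a vs. ä) or the following base letter (r vs. t).
SELECT 'bär' < 'bat' COLLATE "und2" AS accented_sorts_before_t;
```

If the first query reports the strings as equal at level1 but not at level2, the strength setting is taking effect. If the second query returns true, the accent is acting only as a tie-breaker after the base letters.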

ℹ️ Check available collations

You can list all available ICU collations with:

SELECT * FROM pg_collation WHERE collprovider = 'i';

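Unfortunately, the pg_collation catalog has no separate column for ICU options like ks. For collations created with such options, you can read them from the stored locale string (the colliculocale column in PostgreSQL 15 and 16, renamed to colllocale in 17).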
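As an illustration of reading the options back from the catalog, the following sketch looks up the three collations from the question. It assumes PostgreSQL 15 or 16; the column holding the locale string has a different name in other versions, as noted in the comments:

```sql
-- Assumption: PostgreSQL 15 or 16, where the ICU locale string is stored in
-- colliculocale (renamed to colllocale in PostgreSQL 17; before 15 it was
-- kept in collcollate).
SELECT collname, collisdeterministic, colliculocale
FROM pg_collation
WHERE collprovider = 'i'   -- 'i' marks ICU collations
  AND collname IN ('und1', 'und2', 'und3');
```

Any -u-ks-... part shows up only for collations that were created with it. The predefined ICU collations carry a plain language tag, which means they use ICU's default settings.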

answered Oct 18 '25 01:10 by Mohammed Abd Alrazaq


