Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is this crazy German character combination to represent an umlaut?

I was just parsing the following website.

There one finds the text

und wären damit auch

At first, the "ä" looks perfectly fine, but once I inspect it, it turns out that this is not the regular "ä" (represented as ascw 228) but this:

ascw: 97, char: a
ascw: 776, char: ¨

I have never before seen an "ä" represented like this.

How can it happen that a website uses this weird character combination and what might be the benefit from it?

like image 872
tmighty Avatar asked Oct 28 '25 04:10

tmighty


1 Answers

What you don't mention in your questions is the used encoding. Quite obviously it is a Unicode based encoding.

In Unicode, code point U+0308 (776 in decimal) is the combining diaeresis. Out of the letter a and the diaeresis, the German character ä is created.

There are indeed two ways to represent German characters with umlauts (ä in this case). Either as a single code point:

U+00E4 latin small letter A with diaeresis

Or as a sequence of two code points:

U+0061 latin small letter A
U+0308 combining diaeresis

Similarly you would combine two code points for an upper case 'Ä':

U+0041 latin capital letter A
U+0308 combining diaeresis

In most cases, Unicode works with two codes points as it requires fewer code points to enable a wide range characters with diacritics. However for historical reasons a special code point exist for letters with German umlauts and French accents.

The Unicode libraries is most programming languages provide functions to normalize a string, i.e. to either convert all sequences into a single code point if possible or extend all single code points into the two code point sequence. Also see Unicode Normalization Forms.

like image 182
Codo Avatar answered Oct 30 '25 16:10

Codo