
Given a language, how to get its alphabet letters

Is there a programmatic way (or an open-source repository) that, given a language (say, as a two-letter ISO code), returns the letters of that language's alphabet?

For example:

console.log(getAlphabet('en'));

outputs:

a b c d ... 

and

console.log(getAlphabet('he'));

outputs:

א ב ג ד ... 
SaguiItay asked Oct 15 '25 04:10


1 Answer

I don't think that a language always has a well-defined alphabet associated with it. But in the Unicode CLDR, the //ldml/characters/exemplarCharacters element seems to contain a "representative section" of the letters typically used in a given language. The data comes in an open-source repository (unicode-org/cldr on GitHub); the Hebrew data, for example, is in common/main/he.xml.

Using an XML parser library, you can write a function that loads the file based on the language code (in the example above, https://raw.githubusercontent.com/unicode-org/cldr/HEAD/common/main/he.xml for language code he) and locates the //ldml/characters/exemplarCharacters element in it.

Below is an example in client-side JavaScript. It uses a regular expression with the Unicode (u) flag to split the exemplarCharacters value into individual letters, even if a letter is represented by more than one JavaScript character (that is, by a surrogate pair of UTF-16 code units).

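// Fetch the CLDR locale data for Hebrew (language code "he").
fetch("https://raw.githubusercontent.com/unicode-org/cldr/HEAD/common/main/he.xml")
  .then(r => r.text())
  .then(function(xml) {
    var dom = new DOMParser().parseFromString(xml, "text/xml");
    // Read the text of the first exemplarCharacters element (a bracketed list of characters).
    var exemplars = dom.evaluate("/ldml/characters/exemplarCharacters[1]", dom, null, XPathResult.STRING_TYPE, null).stringValue;
    // Keep every code point that is neither a space nor a square bracket.
    console.log(exemplars.match(/[^ \[\]]/gu));
  });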
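
The regular expression above treats every non-space, non-bracket character as a letter. As a rough sketch of a more tolerant splitter, the hypothetical helper below (expandExemplars is not part of CLDR or of any library) assumes that the value uses UnicodeSet-style notation, in which a-d denotes a range and {ch} a multi-character sequence. Backslash escapes are deliberately not handled.

```javascript
// Hypothetical helper: expand an exemplarCharacters string into a list of letters.
// Assumes UnicodeSet-style notation such as "[a b c]", ranges such as "a-d",
// and multi-character sequences such as "{ch}". Backslash escapes are not handled.
function expandExemplars(set) {
  const letters = [];
  // Array.from splits by code point, so surrogate pairs stay intact.
  const chars = Array.from(set);
  let i = 0;
  while (i < chars.length) {
    const c = chars[i];
    if (c === " " || c === "[" || c === "]") {
      // Skip separators and the enclosing brackets.
      i++;
    } else if (c === "{") {
      // Treat everything up to the closing brace as one sequence.
      let end = chars.indexOf("}", i);
      if (end === -1) {
        end = chars.length; // unterminated brace: take the rest of the string
      }
      letters.push(chars.slice(i + 1, end).join(""));
      i = end + 1;
    } else if (
      chars[i + 1] === "-" &&
      i + 2 < chars.length &&
      chars[i + 2] !== " " &&
      chars[i + 2] !== "]"
    ) {
      // Expand a range such as "a-d" into every code point it covers.
      const from = c.codePointAt(0);
      const to = chars[i + 2].codePointAt(0);
      for (let cp = from; cp <= to; cp++) {
        letters.push(String.fromCodePoint(cp));
      }
      i += 3;
    } else {
      // A plain single letter.
      letters.push(c);
      i++;
    }
  }
  return letters;
}

// expandExemplars("[a-d {ch} x]") returns ["a", "b", "c", "d", "ch", "x"]
```

Whether this extra handling is needed depends on how a given locale file writes its exemplar set; for a plain space-separated list it gives the same result as the regular expression.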

Alternatively, you could evaluate the XPath /ldml/characters/exemplarCharacters[@type='index'] instead, which selects the exemplar set that CLDR provides for index headings.
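
To get closer to the getAlphabet(lang) call from the question, the approach can be wrapped in a function that derives the file URL from the language code. This is only a sketch under some assumptions: cldrUrl and getAlphabet are hypothetical names, the language-code check is a simplification, and the code needs a browser because it relies on DOMParser. Unlike the synchronous call in the question, it returns a promise.

```javascript
// Hypothetical helper: build the raw GitHub URL of the CLDR file for a language code.
function cldrUrl(lang) {
  // Assumption: only plain two- or three-letter lowercase codes are accepted here.
  if (!/^[a-z]{2,3}$/.test(lang)) {
    throw new Error("Unsupported language code: " + lang);
  }
  return "https://raw.githubusercontent.com/unicode-org/cldr/HEAD/common/main/" + lang + ".xml";
}

// Hypothetical getAlphabet: resolves to an array of letters (browser only).
async function getAlphabet(lang) {
  const response = await fetch(cldrUrl(lang));
  if (!response.ok) {
    throw new Error("No CLDR data found for " + lang);
  }
  const dom = new DOMParser().parseFromString(await response.text(), "text/xml");
  // Same XPath and splitting regex as in the example above.
  const exemplars = dom.evaluate(
    "/ldml/characters/exemplarCharacters[1]",
    dom,
    null,
    XPathResult.STRING_TYPE,
    null
  ).stringValue;
  return exemplars.match(/[^ \[\]]/gu) || [];
}

// Usage (in a browser):
// getAlphabet("he").then(letters => console.log(letters.join(" ")));
```

Validating the code before building the URL keeps arbitrary input out of the request path; a stricter check against a list of known locales would be a reasonable alternative.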

Heiko Theißen answered Oct 17 '25 17:10


