I am working on an application that requires sorting Japanese text.
Sorting Japanese requires converting Katakana and Kanji to Hiragana and then sorting by UTF-8 code.
Hiragana, Katakana, and Kanji strings should be combined and sorted by their Hiragana "spelling", that is, in the order of the Hiragana "alphabet": a, i, u, e, o, ka, ki, ku, ke, ko, etc.
To do this, I need to:
1. Classify Japanese characters as Kanji, Katakana, or Hiragana.
2. Convert Katakana and Kanji to Hiragana.
3. Apply an algorithm that sorts based on the phonetic sound (Hiragana).
The application's database is in UTF-8.
For the first step, classifying Japanese characters as Kanji, Katakana, or Hiragana,
I want to know whether there are any APIs for C or C++ in SQLite3, Qt, ICU, or any other package that can give the Unicode code point of a character.
Based on the Unicode code point, we can easily classify Japanese characters.
Please correct me if I am wrong.
As you say, Japanese characters can easily be sorted into groups using their Unicode code points. This is trivial.
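To illustrate the classification step, here is a minimal C++ sketch that decodes UTF-8 into code points and checks each one against the Unicode block ranges. The names `Script`, `decode_utf8`, and `classify` are my own, not from any library, and the decoder assumes well-formed input. Only the main Hiragana, Katakana, and CJK Unified Ideographs blocks (plus Extension A) are covered; half-width Katakana and the rarer ideograph extensions are left out. If ICU or Qt is already a dependency, `uscript_getScript` (ICU) and `QChar::script()` (Qt) report the script of a code point and may be a better fit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names for illustration; not part of any library.
enum class Script { Hiragana, Katakana, Kanji, Other };

// Decode a UTF-8 byte string into Unicode code points.
// Assumption: the input is well-formed UTF-8 (no validation is done).
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp;
        int extra;  // number of continuation bytes that follow
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        for (int k = 0; k < extra && i < s.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Classify one code point by the Unicode block it falls in.
Script classify(char32_t cp) {
    if (cp >= 0x3040 && cp <= 0x309F) return Script::Hiragana;  // Hiragana block
    if (cp >= 0x30A0 && cp <= 0x30FF) return Script::Katakana;  // Katakana block
    if ((cp >= 0x4E00 && cp <= 0x9FFF) ||                       // CJK Unified Ideographs
        (cp >= 0x3400 && cp <= 0x4DBF)) {                       // CJK Extension A
        return Script::Kanji;
    }
    return Script::Other;
}
```

Since the database is UTF-8, each string would be passed through `decode_utf8` first and then classified one code point at a time.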
Conversion of katakana to hiragana is also trivial, as there is a one-to-one mapping. You can convert kanji to hiragana with Kakasi.
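The katakana-to-hiragana mapping can be sketched as a fixed offset: in Unicode, katakana U+30A1 through U+30F6 sit exactly 0x60 above their hiragana counterparts U+3041 through U+3096. The function names below are my own. The sketch leaves everything outside that range untouched, including the prolonged sound mark and the few katakana (U+30F7 to U+30FA) that have no single hiragana code point. Kanji are not handled here; they need a dictionary-based tool such as Kakasi.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map one katakana code point to hiragana.
// Katakana U+30A1..U+30F6 are offset by 0x60 from hiragana U+3041..U+3096.
char32_t katakana_to_hiragana(char32_t cp) {
    if (cp >= 0x30A1 && cp <= 0x30F6) {
        return static_cast<char32_t>(cp - 0x60);
    }
    return cp;  // anything else is returned unchanged
}

// Convert every katakana character in a string of code points.
std::u32string to_hiragana(std::u32string text) {
    for (char32_t& cp : text) {
        cp = katakana_to_hiragana(cp);
    }
    return text;
}
```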
Sorting can be done by converting to hiragana first. However, this is a poor man's sort, as many kanji are homophones (same sound, different kanji). So you should sort by the kanji before converting and sorting by hiragana.
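One way to read this advice is to sort on the hiragana reading and use the original text as a tie-breaker, so that homophones always come out in the same order. The sketch below assumes each entry already carries its reading (for example, produced by Kakasi beforehand); the `Entry` struct and `sort_by_reading` are hypothetical names of mine. It compares by raw code point, which only approximates dictionary order, since voiced and small kana are interleaved with the plain ones.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record: the text as stored plus its hiragana reading.
struct Entry {
    std::u32string surface;  // original text (may contain kanji)
    std::u32string reading;  // hiragana reading, assumed precomputed
};

// Sort by reading first; entries with the same reading (homophones)
// are ordered by the code points of their original text.
void sort_by_reading(std::vector<Entry>& entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) {
                  return std::tie(a.reading, a.surface) <
                         std::tie(b.reading, b.surface);
              });
}
```

Because UTF-8 byte order matches code point order, the same comparison could be done directly on the UTF-8 strings from the database.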
You don't say why you need to sort this way. Maybe we can suggest a better sort if you tell us more about your application.