How to refer to a Chinese character in C code

I have a C program that currently reads in Chinese text and stores it as wchar_t. I want to look for a specific character in the text, but I am not sure how to refer to that character in the code.

I essentially want to say:

wchar_t character;

if (character == 个) {
    return 1;
}

else return 0;

Some logic has been omitted, obviously. How would I go about performing such logic on Chinese in C?

Edit: Got it to work. This code compiles with -std=c99 and prints the character "个".

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
        wchar_t test[] = L"\u4E2A";   /* U+4E2A is the character 个 */
        setlocale(LC_ALL, "");        /* use the environment's locale for wide-character output */
        printf("%ls", test);
        return 0;
}
Alex Hansen asked Oct 28 '25 09:10



1 Answer

If your compiler accepts source files in a Unicode encoding it supports, you can compare directly against the character itself. Otherwise, you can use a wide character constant written with a \uXXXX escape:

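#include <stdio.h>
#include <wchar.h>

int main(void)
{
    int i;
    wchar_t chinese[] = L"我不是中国人。";
    /* Walk the string until the terminating null wide character. */
    for(i = 0; chinese[i]; ++i)
    {
        /* Literal character: requires a source encoding the compiler understands. */
        if(chinese[i] == L'不')
            printf("found\n");
        /* Same character written as an escape: independent of the source encoding. */
        if(chinese[i] == L'\u4E0D')
            printf("also found\n");
    }
    return 0;
}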

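Note that a wide character string is written L"xxx", while a single wide character is written L'x'. A Unicode code point in the BMP can be specified with \uXXXX; code points outside the BMP use the eight-digit form \UXXXXXXXX.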
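If all you need is a yes/no answer, the loop above can be replaced with the standard wcschr function from <wchar.h>. The following is a minimal sketch, not a definitive implementation; the helper names contains_wchar and count_wchar are hypothetical and do not come from the question or from any library.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if target occurs in the null-terminated
   wide string text, 0 otherwise. Note that wcschr also matches the
   terminator, so a target of L'\0' always yields 1. */
int contains_wchar(const wchar_t *text, wchar_t target)
{
    return wcschr(text, target) != NULL;
}

/* Hypothetical helper: counts how many times target occurs in text,
   using the same element-by-element comparison as the loop above. */
size_t count_wchar(const wchar_t *text, wchar_t target)
{
    size_t n = 0;
    for (; *text != L'\0'; ++text)
    {
        if (*text == target)
            ++n;
    }
    return n;
}
```

For example, contains_wchar(chinese, L'\u4E0D') could stand in for the comparison inside the loop when the position of the match does not matter.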

FYI, I compiled with Visual Studio 2012 using source encodings of UTF-8 with BOM, UTF-16 (little endian), and UTF-16 (big endian). UTF-8 without BOM did not work.
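Because the literal form depends on how the source file is saved, one way to sidestep the encoding question is to name the character once with a \u escape and compare against that name. This is only a sketch of the check described in the question; the macro CH_GE and the function is_ge are hypothetical names chosen for illustration.

```c
#include <wchar.h>

/* Hypothetical name for U+4E2A (个). The escape form does not depend on
   the encoding the source file is saved in. */
#define CH_GE L'\u4E2A'

/* Hypothetical helper: returns 1 if c is the character 个, 0 otherwise. */
int is_ge(wchar_t c)
{
    return c == CH_GE;
}
```

The body of is_ge mirrors the if/else in the question, with the bare character replaced by a wide character constant.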

Mark Tolonen answered Oct 31 '25 01:10
