
Does casting from char to int always give positive values in C?

I am writing production-ready C in which I need to find the frequency of characters in a char array pretty fast. I was trying to remove an assert call that checks for positive values after a cast. Is my assert redundant, or is it necessary?

    char input[] = "Hello World";
    int inputLength = sizeof(input)/ sizeof(char);
    int *frequencies = calloc(256, sizeof(int));
    for(int i = 0; i < inputLength-1; i++)
    {
        int value = (int) input[i];
        assert(value > -1);//Is this line redundant?
        frequencies[value] += 1;
    }
    printf("(%d)", inputLength);
    PrintFrequencies(frequencies);
    free(frequencies);

murage kibicho asked Dec 13 '25 11:12


2 Answers

Does casting from char to int always give positive values in C?

Generally speaking, no. char may be either a signed or an unsigned type, at the C implementation's discretion, but pretty frequently it is a signed type.

All char values representing members of the basic execution character set are guaranteed to be non-negative. This includes the upper- and lowercase Latin letters, the decimal digits, a variety of punctuation, the space character and a few control characters. The char values representing other characters may be negative, however. Also, the multiple char values constituting the representation of a multi-byte character can include some that, considered as individual chars, are negative.
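
The distinction above can be checked on a given implementation. The following is a minimal sketch, assuming a hosted C99 (or later) compiler; the helper names `char_is_signed` and `char_as_index` are hypothetical and exist only for illustration. `CHAR_MIN` from `<limits.h>` is 0 where plain char is unsigned and negative where it is signed.

```c
#include <limits.h>

/* Hypothetical helper: nonzero if plain char is a signed type here. */
static int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: maps any char to a value in 0..UCHAR_MAX,
 * whether or not plain char is signed. */
static int char_as_index(char c)
{
    return (unsigned char)c;
}
```

On an implementation where `char_is_signed()` is nonzero, a byte such as 0xE9 stored in a char reads back as a negative number, while `char_as_index` still reports it as 0xE9.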

I am writing production-ready C in which I need to find the frequency of characters in a char array pretty fast. I was trying to remove an assert call that checks for positive values after a cast. Is my assert redundant, or is it necessary?

Your assert() is semantically wrong. If you're reading arbitrary text and you want your program to be robust, then you do need to be prepared for chars with negative values. But:

  1. Assertions are the wrong tool for this job. Assertions are for checking that the invariants your program assumes in fact hold. You might use an assertion if you (thought you) had a guarantee that char values were always non-negative, for example. If an assertion ever fails, it means your code is wrong.

    You must never use an assertion to validate input data or perform any other test that your program relies upon being performed, because depending on how you compile the program, the asserted expression might not be evaluated at all.

  2. It would be better for your program to handle negative char values if they are encountered than to fail. In this regard, do note that there's no particular use in converting your char explicitly to int. You can use a char directly anywhere you want an integer. On the other hand, it might make sense to cast to unsigned char, as that will be cheap -- possibly free, even if char is signed -- and it will take care of your signedness problem, because the result is always in the range 0 to UCHAR_MAX.
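
The unsigned char suggestion can be sketched as follows. This is one possible shape, not the asker's actual code: the function name `count_frequencies` and the macro `BYTE_VALUES` are hypothetical, and the caller is assumed to supply a table with one counter per possible unsigned char value.

```c
#include <limits.h>
#include <stddef.h>

/* Number of distinct unsigned char values (256 where CHAR_BIT is 8). */
#define BYTE_VALUES ((size_t)UCHAR_MAX + 1)

/*
 * Hypothetical helper: counts how often each byte value occurs in the
 * first `length` chars of `text`. `frequencies` must have room for
 * BYTE_VALUES counters; the function zeroes them first.
 */
static void count_frequencies(const char *text, size_t length, int *frequencies)
{
    for (size_t i = 0; i < BYTE_VALUES; i++) {
        frequencies[i] = 0;
    }
    for (size_t i = 0; i < length; i++) {
        /* The cast keeps the index non-negative even where char is signed. */
        frequencies[(unsigned char)text[i]] += 1;
    }
}
```

With the question's data this could be called as `count_frequencies(input, sizeof input - 1, frequencies)`. No assertion is needed, because no char value can produce an out-of-range index.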

John Bollinger answered Dec 15 '25 01:12


Formally, the C standard does not require characters outside the basic execution character set to have any particular value, positive or negative.

The char type can be either signed or unsigned (see "Is char signed or unsigned by default?"). In situations where it is signed and some "extended character set" is implemented (beyond classic "7 bit ASCII" for example), then strings could in theory hold negative values.

So depending on how portable you need the code to be, there might be a place for the assert. However, as mentioned in comments, casting to an unsigned type instead removes the problem. Consider using this instead:

    uint8_t value = input[i];

Now value is guaranteed to be in the range 0 to 255, and the code is portable to any implementation that provides uint8_t (declared in <stdint.h>).
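
One caveat worth sketching: `uint8_t` is an optional type in the C standard, and `<stdint.h>` defines `UINT8_MAX` only when the type exists. The fragment below assumes C99 or later and uses a hypothetical helper name, `byte_index`. It refuses to compile where `uint8_t` is missing rather than misbehaving at run time.

```c
#include <stdint.h>

/* UINT8_MAX is defined only if the implementation provides uint8_t. */
#ifndef UINT8_MAX
#error "uint8_t is not available on this implementation"
#endif

/* Hypothetical helper: converts a char to a table index in 0..255. */
static unsigned byte_index(char c)
{
    uint8_t value = (uint8_t)c; /* conversion wraps modulo 256 */
    return value;
}
```

Where `uint8_t` is unavailable, casting to unsigned char and sizing the table as `UCHAR_MAX + 1` is the fallback that works on every conforming implementation.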

Lundin answered Dec 15 '25 01:12