
What is the correct output of sizeof("string")?

On a microcontroller, in order to avoid loading settings from a previous firmware build, I also store the compilation time, which is checked at loading.

The microcontroller project is built with 'mikroC PRO for ARM' from MikroElektronika.

Because it is easier to debug there, I wrote the code with MinGW on my PC and, after checking it left and right, put it into mikroC.

The code using that check failed to work properly. After an evening of frustrating debugging, I found that sizeof("...") yields different values on the two platforms, which caused a buffer overflow.

But now I don't know whose fault it is.

To re-create the problem, use the following code:

#include <stdio.h>

#define SAVEFILECHECK_COMPILE_DATE __DATE__ " " __TIME__

char strA[sizeof(SAVEFILECHECK_COMPILE_DATE)];  /* sized from the literal */
char strB[] = SAVEFILECHECK_COMPILE_DATE;       /* sized from the initializer */

int main(void)
{
    printf("sizeof(#def): %d\n", (int)sizeof(SAVEFILECHECK_COMPILE_DATE));
    printf("sizeof(strA): %d\n", (int)sizeof(strA));
    printf("sizeof(strB): %d\n", (int)sizeof(strB));
    return 0;
}

On MinGW it returns (as expected):

sizeof(#def): 21
sizeof(strA): 21
sizeof(strB): 21

However, on 'mikroC PRO for ARM' it returns:

sizeof(#def): 20
sizeof(strA): 20
sizeof(strB): 21

This difference caused a buffer overflow down the line (overwriting byte zero of a pointer – ouch).

21 is the answer I expect: 20 chars and the '\0' terminator.

Is this one of the 'it depends' things in C or is there a violation of the sizeof operator behavior?

cFsichb asked Aug 30 '25 14:08


2 Answers

This is all 100% standardized. C17 6.10.8.1:

__DATE__ The date of translation of the preprocessing translation unit: a character string literal of the form "Mmm dd yyyy" ... and the first character of dd is a space character if the value is less than 10.
...
__TIME__ The time of translation of the preprocessing translation unit: a character string literal of the form "hh:mm:ss"

  • "Mmm dd yyyy" = 11
  • "hh:mm:ss" = 8
  • " " (the space you used for string literal concatenation) = 1
  • Null termination = 1

11 + 8 + 1 + 1 = 21
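The arithmetic above can be checked at compile time on a conforming compiler. This is a minimal sketch, assuming a C11 toolchain (for static_assert); the name BUILD_STAMP and the two helper functions are made up for illustration and are not from the question:

```c
#include <assert.h>   /* static_assert macro (C11) */
#include <stddef.h>
#include <string.h>

/* Hypothetical name; built the same way as the macro in the question. */
#define BUILD_STAMP __DATE__ " " __TIME__

/* "Mmm dd yyyy" = 11, "hh:mm:ss" = 8, separator = 1, terminator = 1. */
static_assert(sizeof(__DATE__) == 11 + 1, "__DATE__ is 11 chars + terminator");
static_assert(sizeof(__TIME__) == 8 + 1, "__TIME__ is 8 chars + terminator");
static_assert(sizeof(BUILD_STAMP) == 11 + 1 + 8 + 1, "stamp is 21 bytes");

/* Number of characters in the stamp, excluding the terminator. */
size_t build_stamp_length(void)
{
    return strlen(BUILD_STAMP);
}

/* Number of bytes needed to store the stamp, including the terminator. */
size_t build_stamp_size(void)
{
    return sizeof(BUILD_STAMP);
}
```

If any of the three assertions fails to compile, the compiler disagrees with the sizes the standard requires.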

As for sizeof, a string literal is an array. Whenever you pass a declared array to sizeof, the array does not "decay" into a pointer to the first element, so sizeof reports the size of the array in bytes. In the case of string literals, this includes the null termination, C17 6.4.5:

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.

(Translation phase 6 is also mentioned, which is the string literal concatenation phase. I.e., string literal concatenation is guaranteed to happen before null termination is added.)

So it would appear that mikroC PRO is non-conforming/bugged. There are lots of questionable embedded systems compilers out there for sure.
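To illustrate the two points above (no decay under sizeof, and concatenation before termination), here is a small sketch, again assuming C11 for static_assert. The identifiers greeting, array_size and pointer_size are invented for the example:

```c
#include <assert.h>   /* static_assert macro (C11) */
#include <stddef.h>

/* A string literal is an array: sizeof counts every char plus the terminator. */
static_assert(sizeof("abc") == 4, "3 chars + terminator");

/* Adjacent literals are joined in phase 6; one terminator is added in phase 7. */
static_assert(sizeof("ab" "cd") == 5, "4 chars + one terminator, not two");

/* An array initialized from a literal takes its size from the literal. */
static const char greeting[] = "ab" "cd";

/* Applied to the array itself, sizeof yields the array size in bytes. */
size_t array_size(void)
{
    return sizeof(greeting);
}

/* Once the array has decayed to a pointer, sizeof yields the pointer size. */
size_t pointer_size(void)
{
    const char *p = greeting;
    return sizeof(p);
}
```

array_size() returns 5 on a conforming compiler, while pointer_size() depends on the target's pointer width.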

Lundin answered Sep 02 '25 13:09


Is this one of the 'it depends' things in C or is there a violation of the sizeof operator behavior?

The behavior is fully defined in the C Standard. Below are the relevant quotes from the published C99 standard. Apart from the section numbers, they are identical in the C90 (ANSI C) version and have not changed in essence in more recent versions, up to and including the upcoming C23 version:

The __DATE__ and __TIME__ macros are specified by

6.10.8 Predefined macro names

__DATE__ The date of translation of the preprocessing translation unit: a character string literal of the form "Mmm dd yyyy", where the names of the months are the same as those generated by the asctime function, and the first character of dd is a space character if the value is less than 10. If the date of translation is not available, an implementation-defined valid date shall be supplied.
__TIME__ The time of translation of the preprocessing translation unit: a character string literal of the form "hh:mm:ss" as in the time generated by the asctime function. If the time of translation is not available, an implementation-defined valid time shall be supplied.

From the above, if the time of translation is available, the macro SAVEFILECHECK_COMPILE_DATE expands to 3 string literals for a total of 11+1+8 = 20 characters, hence 21 bytes including the null terminator. If the time of translation is not available, implementation-defined valid dates and times must be supplied in the same format, hence the sizes must be the same.

5.1.1.2 Translation phases

  6. Adjacent string literal tokens are concatenated.
  7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

Hence it is irrelevant that the macro expands to 3 adjacent string literals: they are concatenated in phase 6, so by phase 7 each use of the macro in your examples is a single string literal. Then:

6.5.3.4 The sizeof operator

4  When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. When applied to an operand that has array type, the result is the total number of bytes in the array.

Therefore all 3 outputs in your example must show 21 bytes. You have found a bug in the mikroC compiler: you should report it and find a workaround for your current projects.
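One possible workaround, going only by the output in the question (where sizeof(strB) was 21 on both compilers), is to never apply sizeof to the literal itself and to derive every size from an array object initialized with it. This is an untested sketch, not verified on mikroC; compile_stamp, stamp_store and stamp_matches are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SAVEFILECHECK_COMPILE_DATE __DATE__ " " __TIME__

/* The array takes its size from the initializer (21 on both compilers). */
static const char compile_stamp[] = SAVEFILECHECK_COMPILE_DATE;

/* Pre-C11 compile-time check: a negative array size is a compile error. */
typedef char stamp_size_check[(sizeof(compile_stamp) == 21) ? 1 : -1];

/* Copies the stamp into dst; returns 0 on success, -1 if dst is too small. */
int stamp_store(char *dst, size_t dst_size)
{
    assert(dst != NULL);
    if (dst_size < sizeof(compile_stamp)) {
        return -1;                       /* refuse to overflow the buffer */
    }
    memcpy(dst, compile_stamp, sizeof(compile_stamp));
    return 0;
}

/* Returns 1 if the stored stamp belongs to this build, 0 otherwise. */
int stamp_matches(const char *stored, size_t stored_size)
{
    assert(stored != NULL);
    return stored_size == sizeof(compile_stamp)
        && memcmp(stored, compile_stamp, sizeof(compile_stamp)) == 0;
}
```

Because every size comes from sizeof(compile_stamp), the buffer and the copy cannot disagree even if the compiler miscounts the literal.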

chqrlie answered Sep 02 '25 12:09