Why don't C compilers have an option (and I do mean an option, since there will be cases where you don't want this) to transform code like this:
#include <stdlib.h>
#include <string.h>

char a1[8];
int main( int argc, char *argv[] )
{
    char a2[16];
    char *p = (char *)malloc( 24 );
    int argv1_len = strlen( argv[1] );
    memcpy( a1, argv[1], argv1_len );
    memcpy( a2, argv[1], argv1_len );
    memcpy( p, argv[1], argv1_len );
    memcpy( p+5, argv[1], argv1_len );
    return 0;
}
into this:
char a1[8];
addAddr( a1, sizeof( a1 ) ); // build database of addresses and their lengths
int main( int argc, char *argv[] )
{
    char a2[16];
    addAddr( a2, sizeof( a2 ) );
    char *p = (char *)malloc( 24 );
    int argv1_len = strlen( argv[1] );
    addAddr( p, 24 );
    ptrCheck( a1, argv1_len ); // exit if argv1_len > size of a1
    memcpy( a1, argv[1], argv1_len );
    ptrCheck( a2, argv1_len );
    memcpy( a2, argv[1], argv1_len );
    ptrCheck( p, argv1_len );
    memcpy( p, argv[1], argv1_len );
    ptrCheck( p+5, argv1_len );
    memcpy( p+5, argv[1], argv1_len );
    return 0;
}
Doesn't the C compiler know enough about the memory layout of locals and globals to build a database of memory locations and their lengths, either at compile time or through code it inserts to run at run time? Then, on any call to strcpy, memcpy, memset, etc., or even on an assignment like *ch1 = *ch2;, it could check that the access is in bounds. I assume there are cases this won't catch, and that there will be a performance penalty; that could be handled by turning the feature on or off entirely, or perhaps per line or section of code, and recompiling. This would be something like Valgrind, but better: it would have the compiler's help instead of relying only on the binary and only checking the heap.
Or even make the check available to the developer as an API (say, checkPtr) so I could write my own strcpy:
char *mystrcpy( char *dst, const char *src )
{
    /* strcpy also writes the terminating '\0', hence the + 1 */
    if ( checkPtr( dst, strlen( src ) + 1 ) )
    { /* do something custom */ }
    return strcpy( dst, src );
}
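To make the idea concrete, here is a rough sketch of what such a run-time database could look like if you wrote it by hand. It is only an illustration: the fixed-size table, the `ptrInBounds` name (standing in for the ptrCheck/checkPtr calls above) and its return convention are all assumptions of mine, not something an existing compiler provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REGIONS 64  /* hypothetical limit, chosen only for the sketch */

struct region { uintptr_t start; size_t len; };

static struct region regions[MAX_REGIONS];
static size_t region_count = 0;

/* Record that [addr, addr + len) is a valid object.
   Returns 0 on success, -1 if the table is full. */
int addAddr(const void *addr, size_t len)
{
    if (region_count == MAX_REGIONS)
        return -1;
    regions[region_count].start = (uintptr_t)addr;
    regions[region_count].len = len;
    region_count++;
    return 0;
}

/* Hypothetical check: returns 1 if [p, p + len) lies entirely inside
   one registered region, 0 otherwise. */
int ptrInBounds(const void *p, size_t len)
{
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < region_count; i++) {
        uintptr_t start = regions[i].start;
        size_t size = regions[i].len;
        /* offset of p inside the region, then the room left after it */
        if (a >= start && a - start <= size && len <= size - (a - start))
            return 1;
    }
    return 0;
}

/* A strcpy that refuses to copy when the string plus its terminator
   does not fit. Returns dst on success, NULL otherwise. */
char *mystrcpy(char *dst, const char *src)
{
    size_t need = strlen(src) + 1;
    if (!ptrInBounds(dst, need))
        return NULL;
    return strcpy(dst, src);
}
```

With an 8-byte buffer registered through `addAddr`, `mystrcpy` accepts a 7-character string and rejects an 8-character one. Note that the linear scan and the fixed table are exactly the kind of cost the question expects to pay; a real tool would need something much faster, plus a way to unregister locals when they go out of scope.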
Relatively recent compiler versions have options that enable such checking to some extent.
For example, here is the documentation for Clang's AddressSanitizer.
You can enable it by compiling with -fsanitize=address (gcc and clang).
Clang (and, I believe, newer versions of gcc too) also includes sanitizers for undefined behavior (-fsanitize=undefined) and data races (-fsanitize=thread). Clang additionally has one for uninitialized reads (-fsanitize=memory), which, as far as I know, gcc does not provide.
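As a small illustration, here is a function modelled on the a2 case from the question. The function name and the checksum it returns are made up for the example; the return value is only there so the copy isn't optimized away.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into a 16-byte stack buffer without checking its length,
   as in the question, then returns the sum of the copied bytes. */
unsigned sum_copied(const char *src)
{
    char a2[16];
    size_t len = strlen(src);
    unsigned sum = 0;

    memcpy(a2, src, len);   /* out of bounds whenever len > 16 */

    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)a2[i];
    return sum;
}
```

Called with a short string this behaves normally. Called with a string longer than 16 bytes it is undefined behavior in an ordinary build; if you compile with something like `cc -g -fsanitize=address file.c`, the run should instead stop at the memcpy with a stack-buffer-overflow report.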
In the embedded world, compilers and toolchains commonly come with options to perform various non-standard checks: NULL pointer dereferences, buffer overflows, etc. As you might guess, these features are computationally expensive (hurting timing and performance), bloat the binary, and lengthen compile time, among other potentially unwanted effects. For these reasons, I've seen these "safe" compilation options enabled only during development/debugging (much like one would apply a static source code checker). I've rarely seen released code ship with this stuff enabled.
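The same development-only pattern exists in plain standard C through assert, which is compiled out when NDEBUG is defined. This is not one of the compiler features above, only a hand-written analogue; the `checked_memcpy` name and its explicit `dst_size` parameter are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* memcpy with a bounds check that exists only in debug builds.
   Building with -DNDEBUG removes the assert and leaves a plain memcpy. */
void *checked_memcpy(void *dst, size_t dst_size, const void *src, size_t len)
{
    assert(len <= dst_size);   /* aborts in debug builds on overflow */
    return memcpy(dst, src, len);
}
```

The caller has to pass the destination size itself, for example `checked_memcpy(a2, sizeof a2, argv[1], argv1_len)`, which is precisely the bookkeeping the question would like the compiler to do automatically.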
Since I've mentioned static source code analyzers, I recommend taking a look at Coverity, CodeSonar, and others. In my experience, these tools do a much better job of detecting unsafe code than the usual compiler equipped with such checkers.