
Why doesn't C++ define some undefined behaviours?

What are the reasons for C++ not to define some behaviours (something like better error checking)? Why not throw an error and stop?

Some pseudocode, for example:

if (p == NULL && op == deref) {
    return "Invalid operation"
}

For Integer Overflows:

if(size > capacity){
    return "Overflow"
}

I know these are very simple examples, but I'm pretty sure most UB can be caught by the compiler. So why not implement these checks? Is it because they are really time expensive, and not doing error checking is faster? Some UB can be caught with a single if statement, so maybe speed is not the only concern?

asked Oct 16 '25 by merovingian


2 Answers

Because the compiler would have to add these instructions every time you use a pointer. A C++ program uses a lot of pointers. So there would be a lot of these instructions for the computer to run. The C++ philosophy is that you should not pay for features you don't need. If you want a null pointer check, you can write a null pointer check. It's quite easy to make a custom pointer class that checks if it is null.
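
For example, a minimal sketch of such a wrapper (the name checked_ptr is made up here; it's not a standard class) might look like this:

#include <stdexcept>

// Hypothetical wrapper: every dereference pays for an explicit null check
// instead of relying on undefined behaviour / the OS trap.
template <typename T>
class checked_ptr {
    T *p;
public:
    explicit checked_ptr(T *ptr = nullptr) : p(ptr) {}

    T &operator*() const {
        if (p == nullptr) throw std::runtime_error("null pointer dereference");
        return *p;
    }
    T *operator->() const {
        if (p == nullptr) throw std::runtime_error("null pointer dereference");
        return p;
    }
};

Dereferencing a null checked_ptr throws instead of invoking UB - and that check is exactly the cost you pay on every single dereference.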

On most operating systems, dereferencing a null pointer is guaranteed to crash the program anyway, and overflowing an integer simply wraps around. But C++ doesn't only run on those systems - it also runs on platforms where that isn't the case. So the behaviour is not defined by C++, even though it is defined by the operating system or the hardware.


Nonetheless, some compiler writers realized that by un-defining it again, they can make programs faster. A surprising number of optimizations are possible because of this. Sometimes there's a command-line option which tells the compiler not to un-define the behaviour, like -fwrapv.
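
As an illustration (this is the classic example, not something from the question), because signed overflow is UB the compiler is allowed to assume it never happens:

// Overflow of a signed int is UB, so the compiler may assume it never occurs.
bool next_is_bigger(int x) {
    return x + 1 > x;
}

At -O2, GCC and Clang typically compile this to return true unconditionally; with -fwrapv, wrap-around is defined, so the comparison has to actually be evaluated and next_is_bigger(INT_MAX) returns false.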

A common optimization is to simply assume the program never reaches the part with the UB. Just yesterday I saw a question where someone had written an obviously finite loop:

int array[N];
// ...
for(int i = 0; i < N+1; i++) {    // the last iteration reads array[N]
    fprintf(file, "%d ", array[i]);
}

but it was infinite. Why? The person asking the question had turned on optimization. The compiler could see that the last iteration has UB, since array[N] is out of bounds, so it assumed the program magically stops before reaching it. That meant it never needed to check for the last iteration at all, so it deleted the i < N+1 test.
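
Roughly speaking, the optimized loop behaves as if it had been written like this (a sketch of the effect, not actual compiler output):

for(int i = 0; ; i++) {           // the i < N+1 test is gone
    fprintf(file, "%d ", array[i]);
}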

Most of the time this makes sense because of macros or inline functions. Someone writes code like:

int getFoo(struct X *px) { return (px == NULL ? -1 : px->foo); }

void blah(struct X *px) {
    bar(px->f1);                      /* px is dereferenced here ... */
    printf("%s", px->name);
    frobnicate(&px->theFrob);
    count += getFoo(px);              /* ... so the NULL check inside getFoo is redundant */
}

and the compiler can make the quite reasonable assumption that px isn't null, so it deletes the px == NULL check and treats getFoo(px) the same as px->foo. That's often why compiler writers choose to keep it undefined even in cases where it could be easily defined.
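
Roughly, after inlining getFoo, the compiler is looking at something like this (again a sketch, not real compiler output):

void blah(struct X *px) {
    bar(px->f1);                      /* px is dereferenced here ... */
    printf("%s", px->name);
    frobnicate(&px->theFrob);
    /* ... so px can be assumed non-null, and the inlined check folds away: */
    count += (px == NULL ? -1 : px->foo);   /* becomes: count += px->foo; */
}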

answered Oct 17 '25 by user253751


Compilers already have switches to enable those checks (those are called sanitizers, e.g. -fsanitize=address, -fsanitize=undefined in GCC and Clang).
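
For example, a small program like this (the build command and file name are just illustrative) trips -fsanitize=undefined at run time:

#include <climits>
#include <cstdio>

// Build with, e.g.:  g++ -fsanitize=undefined overflow.cpp
int main(int argc, char **) {
    int x = INT_MAX;
    std::printf("%d\n", x + argc);   // signed overflow for any argc >= 1
}

A plain build typically just prints a wrapped value; the sanitized build additionally reports the signed integer overflow at run time - exactly the kind of check the question asks for, but opt-in rather than mandatory.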

It doesn't make sense for the standard to always require those checks, because they harm performance, so you might not want them in release builds.

answered Oct 17 '25 by HolyBlackCat