I'm in the process of optimizing a simple genetic algorithm and neural network, and I'm fiddling with some options in GCC to generate faster executables.
In my code I have some assertions, such as
mat mat_add(mat a, mat b)
{
    assert(a->rows == b->rows);
    assert(a->cols == b->cols);
    mat m = mat_create(a->rows, a->cols);
    for(size_t i = 0; i < a->rows; i++) {
        for(size_t j = 0; j < a->cols; j++)
            mat_set(m, i, j, mat_get(a, i, j) + mat_get(b, i, j));
    }
    return m;
}
I figured that adding -DNDEBUG to disable the assertions would make the executable faster, because it would no longer check the conditions above. However, it is actually slower.
Without -DNDEBUG:
gcc src/*.c -lm -pthread -Iinclude/ -Wall -Ofast 
for i in $(seq 1 5); do time ./a.out; done
Output:
real    0m11.677s
user    1m28.786s
sys     0m0.729s
real    0m11.716s
user    1m29.304s
sys     0m0.723s
real    0m12.217s
user    1m31.707s
sys     0m0.806s
real    0m12.602s
user    1m32.863s
sys     0m0.726s
real    0m12.225s
user    1m30.915s
sys     0m0.736s
With -DNDEBUG:
gcc src/*.c -lm -pthread -Iinclude/ -Wall -Ofast -DNDEBUG 
for i in $(seq 1 5); do time ./a.out; done
Output:
real    0m13.698s
user    1m42.533s
sys     0m0.792s
real    0m13.764s
user    1m43.337s
sys     0m0.709s
real    0m13.655s
user    1m42.986s
sys     0m0.739s
real    0m13.836s
user    1m43.138s
sys     0m0.719s
real    0m14.072s
user    1m43.879s
sys     0m0.712s
It isn't much slower, but it is noticeable.
What could be causing this slowdown?
Do the mat_set and mat_get functions perform their own bounds checks on the indices? With the asserts present, the loop is only reachable if b->rows == a->rows is true. That allows the compiler to optimize out any check i < b->rows in the mat_get for b, because it knows b->rows == a->rows and i < a->rows by the loop condition. With -DNDEBUG the asserts expand to nothing, so the compiler loses that information and has to keep any such check inside the loop.
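To make that concrete, here is a minimal sketch of what such a bounds-checked accessor might look like. The struct layout, the mat_s name, the abort() on failure, and the sum_pairwise function are all assumptions for illustration; the question does not show the real mat_get or mat_set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout; the real struct is not shown in the question. */
typedef struct {
    size_t rows, cols;
    double *data;
} mat_s, *mat;

/* Hypothetical accessor that does its own bounds check. */
static inline double mat_get(mat m, size_t i, size_t j)
{
    if (i >= m->rows || j >= m->cols)
        abort();
    return m->data[i * m->cols + j];
}

/* Illustrative function with the same shape as mat_add's loop. */
double sum_pairwise(mat a, mat b)
{
    /* After these asserts, the compiler may treat b's dimensions
       as equal to a's for the rest of the function. */
    assert(a->rows == b->rows);
    assert(a->cols == b->cols);

    double s = 0.0;
    for (size_t i = 0; i < a->rows; i++)
        for (size_t j = 0; j < a->cols; j++)
            /* The check inside mat_get(b, ...) repeats what the loop
               bounds already guarantee, but only if the asserts are
               compiled in. */
            s += mat_get(a, i, j) + mat_get(b, i, j);
    return s;
}
```

Whether the optimizer actually removes the redundant checks depends on the compiler version and on the accessors being inlined, so comparing the generated assembly of both builds (for example with gcc -S) is the way to confirm it.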
If this ends up being the case, you could achieve the same without assertions, and without any runtime branch, by adding (GNU C feature):
if (a->rows != b->rows || a->cols != b->cols)
    __builtin_unreachable();
A more portable but less reliable way to do this is to write some nonsensical undefined behavior, like 1/0;, in the if body.
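One way to package this, as a rough sketch: a macro that behaves like assert in debug builds and turns into an optimizer hint in release builds. The name ASSUME and the vec_add function are hypothetical, chosen only for this example; the fallback for non-GNU compilers is plain assert, which compiles to nothing under NDEBUG.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUME is a hypothetical name, not a standard or GCC macro. */
#if defined(NDEBUG) && defined(__GNUC__)
/* Release build on a GNU-compatible compiler: no runtime check,
   the condition is only a promise to the optimizer. */
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
/* Debug build (or other compilers): an ordinary assertion. */
#  define ASSUME(cond) assert(cond)
#endif

/* Hypothetical example function: element-wise sum of two vectors. */
void vec_add(double *out, const double *a, size_t na,
             const double *b, size_t nb)
{
    ASSUME(na == nb);
    for (size_t i = 0; i < na; i++)
        out[i] = a[i] + b[i];
}
```

The trade-off is that a violated condition in a release build is undefined behavior rather than a clean abort, so this only suits conditions that debug builds have already exercised. The condition should also be free of side effects, since it is not guaranteed to be evaluated.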