When compiling the benchmark code below with -O3, I was impressed by the difference it made in latency, so I began to wonder whether the compiler is "cheating" by removing code somehow. Is there a way to check for that? Am I safe to benchmark with -O3? Is it realistic to expect 15x gains in speed?
Results without -O3: Average: 239 nanos, Min: 230 nanos (9 million iterations)
Results with -O3: Average: 14 nanos, Min: 12 nanos (9 million iterations)
int iterations = stoi(argv[1]);
int load = stoi(argv[2]);
long long x = 0;
for(int i = 0; i < iterations; i++) {
    long start = get_nano_ts(); // START clock
    for(int j = 0; j < load; j++) {
        if (i % 4 == 0) {
            x += (i % 4) * (i % 8);
        } else {
            x -= (i % 16) * (i % 32);
        }
    }
    long end = get_nano_ts(); // STOP clock
    // (omitted for clarity)
}
cout << "My result: " << x << endl;
Note: I am using clock_gettime to measure:
long get_nano_ts() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000 + ts.tv_nsec;
}
The compiler will certainly be "cheating" by removing unnecessary code when you compile with optimization enabled. It actually goes to great lengths to speed up your code, which almost always leads to impressive speed-ups. If it were somehow able to derive a formula that calculates the result in constant time instead of using this loop, it would. A constant factor of 15 is nothing out of the ordinary.
But this does not mean that you should profile un-optimized builds! Indeed, when using languages like C and C++, the performance of un-optimized builds is pretty much completely meaningless. You need not worry about that at all.
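To make the "derive a formula" idea concrete, here is a minimal sketch of the kind of rewrite an optimizer is allowed to perform on the loop from the question. The function names sum_with_loops and sum_hoisted are hypothetical, and nothing here claims that any particular compiler actually produces the second form at -O3. It only shows that the two forms give the same result, because the loop body does not depend on j.

```cpp
#include <cstdint>

// Direct transcription of the question's loop nest, with the timing calls removed.
std::int64_t sum_with_loops(int iterations, int load) {
    std::int64_t x = 0;
    for (int i = 0; i < iterations; i++) {
        for (int j = 0; j < load; j++) {
            if (i % 4 == 0) {
                x += (i % 4) * (i % 8);
            } else {
                x -= (i % 16) * (i % 32);
            }
        }
    }
    return x;
}

// Hand-simplified equivalent (illustrative only). The inner loop adds the
// same term `load` times, so it collapses into a single multiplication.
std::int64_t sum_hoisted(int iterations, int load) {
    std::int64_t x = 0;
    const std::int64_t reps = load > 0 ? load : 0;  // a non-positive load runs zero times
    for (int i = 0; i < iterations; i++) {
        // When i % 4 == 0 the added term is 0 * (i % 8) == 0, so that branch vanishes.
        if (i % 4 != 0) {
            x -= reps * ((i % 16) * (i % 32));
        }
    }
    return x;
}
```

Calling both functions with the same arguments should return the same value. If the optimizer finds a simplification like this, the timed region shrinks from load iterations to a handful of instructions, which is one plausible way to get a large constant-factor speed-up without any result being wrong.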
Of course, this can interfere with micro-benchmarks such as the one you showed above. Two points on that:
Since your code takes its parameters at run time and prints the result, it has a good chance of being a reasonable micro-benchmark. One thing you should watch out for is whether your compiler moves both calls to get_nano_ts() to the same side of the loop. It is allowed to do this, since "run time" does not count as an observable side effect. (The standard does not even mandate that your machine operates at finite speed.) It has been argued elsewhere that this is usually not a problem, though I cannot really judge whether that argument is valid.
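If you want to reduce the risk of the timed work drifting past the clock calls, one common workaround is an empty inline-assembly statement that acts as a compiler barrier. The sketch below assumes GCC or Clang (the asm syntax is a compiler extension, not standard C++) and a POSIX system for clock_gettime. The names keep_alive and timed_inner_loop are hypothetical. This relies on how these compilers treat volatile asm in practice, not on a guarantee from the language standard.

```cpp
#include <cstdint>
#include <time.h>

// Same clock helper as in the question, with an explicit 64-bit result.
inline std::int64_t get_nano_ts() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper (GCC/Clang extension). The empty asm statement claims
// to read and modify `value` and to clobber memory, so the compiler has to
// have the real value available at this point in the instruction stream.
template <typename T>
inline void keep_alive(T& value) {
    asm volatile("" : "+r"(value) : : "memory");
}

// Times `load` repetitions of the question's loop body for one value of i.
// Returns the elapsed nanoseconds and accumulates the result into x.
std::int64_t timed_inner_loop(int i, int load, std::int64_t& x) {
    const std::int64_t start = get_nano_ts();
    keep_alive(x);  // x is treated as modified here, so updates to it cannot be computed earlier
    for (int j = 0; j < load; j++) {
        if (i % 4 == 0) {
            x += (i % 4) * (i % 8);
        } else {
            x -= (i % 16) * (i % 32);
        }
    }
    keep_alive(x);  // x must hold its final value before the second clock call
    const std::int64_t end = get_nano_ts();
    return end - start;
}
```

Note that the barrier only pins where the value of x has to exist. It does not stop the compiler from simplifying the loop itself, which is what you want: the goal is to measure the optimized code, not to disable optimization.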
If your program does not do anything expensive other than the thing you want to benchmark (which, if possible, it should not do anyway), you can also move the time measurement "outside", e.g. with the time command.