From reading this I came across the next two quotes:
First quote:
A typical case of unpredictable branch behavior is when the comparison result is dependent on data.
Second quote:
No Branches Means No Mispredicts
For my project, I work on a dependent data and I perform many if and switch statements. My project is related to Big Data so it has to be as efficient as possible. So I wanted to test it on the data provided by user, to see if branch prediction actually slows down my program or helps. As from reading here:
misprediction delay is between 10 and 20 clock cycles.
What shocked me most was:
Removing the branches not only improves runtime performance of the code, it also helps the compiler to optimize the code.
Why use branch prediction then ?
Is there a way to force the compiler to generate assembly code without branches ? or to disable branch prediction so that CPU? so I can compare both results ?
I believe the most common way to avoid branching is to leverage bit parallelism in reducing the total jumps present in your code. The longer the basic blocks, the less often the pipeline is flushed.
Once the branch is decided, if the prediction was correct nothing happens but if the prediction was wrong, the pipeline simply switches to processing the correct instruction at the next clock.
When the prediction is right, that is, when the branch is not taken, there is no penalty to be paid. On the other hand, when the prediction is wrong, one bubble is created and the next instruction is fetched from the target address.
One thing you can do in a high-level language is to eliminate branches by expressing the problem in terms of lookups or arithmetic. This helps branch prediction work better on the remaining branches, because there's more "history" available.
to see if branch prediction actually slows down my program or helps
Branch prediction doesn't slow down programs. When people talk about the cost of missed predictions, they're talking about how much more expensive a mispredicted branch is compared to a correctly predicted branch.
If branch prediction didn't exist, all branches would be as expensive as a mispredicted one.
So what "misprediction delay is between 10 and 20 clock cycles" really means is that successful branch prediction saves you 10 to 20 cycles.
Removing the branches not only improves runtime performance of the code, it also helps the compiler to optimize the code.
Why use branch prediction then ?
Why use branch prediction over removing branches? You shouldn't. If a compiler can remove branches, it will (assuming optimizations are enabled), and if programmers can remove branches (assuming it doesn't harm readability or it's a performance-critical piece of code), they should.
That hardly makes branch prediction useless though. Even if you remove as much branches as possible from a program, it will still contain many, many branches. So because of this and because of how expensive unpredicted branches are, branch prediction is essential for good performance.
Is there a way to force the compiler to generate assembly code without branches ?
An optimizing compiler will already remove branches from a program when it can (without changing the semantics of the program), but, unless we're talking about a very simple int main() {return 0;}-type program, it's impossible to remove all branches. Loops require branches (unless they're unrolled, but that only works if you know the number of iterations ahead of time) and so do most if- and switch-statements. If you can minimize the number of ifs, switches and loops in your program, great, but you won't be able to remove all of them.
or to disable branch prediction so that CPU? so I can compare both results ?
To the best of my knowledge it is impossible to disable branch prediction on x86 or x86-64 CPUs. And as I said, this would never improve performance (though it might make it predictable, but that's not usually a requirement in the contexts where these CPUs are used).
Modern processors have pipelines which allow the CPU to work a lot faster than it would be able to otherwise. This is a form of parallelism where it starts processing an instruction a few clock cycles before the instruction is actually needed. See here here for more details.
This works great until we hit a branch. Since we are jumping, the work that is in the pipeline is no longer relevant. The CPU then needs to flush the pipeline and restart again. This causes a delay of a few clock cycles until the pipeline is full again. This is known as a pipeline stall.
Modern CPUs are clever enough when it comes to unconditional jumps to follow the jump when filling the pipeline thus preventing the stall. This does not work when it comes to branching since the CPU does not know exactly where the jump is going to go.
Branch Prediction tries to solve this problem by making a guess as to which branch the CPU will follow before fully evaluating the jump. This (when it works) prevents the stall.
Since almost all programming involves making decisions, branching is unavoidable. But one certainly can write code with fewer branches and thus lessen the delays caused by misprediction. Once we are branching, branch prediction at least allows us a chance of getting things right and not having a CPU pipeline stall.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With