I've always been curious about the cost of jumps in assembly.
cmp ecx, edx
je SOME_LOCATION # What's the cost of this jump?
Does it need to do a search in a lookup table for each jump, or how does it work?
Originally (e.g. on the 8086) the cost of a jump wasn't much different from the cost of a mov.
Later CPUs added caches, which meant some jumps were faster (because the code they jump to is in the cache) and some jumps were slower (because the code they jump to isn't in the cache).
Even later CPUs added "out of order" execution, where conditional branches (e.g. je SOME_LOCATION) would have to wait until the flags from "previous instructions that happen to be executed in parallel" became known.
This means that a sequence like
mov esi, edi
cmp ecx, edx
je SOME_LOCATION
can be slower than rearranging it to
cmp ecx, edx
mov esi, edi
je SOME_LOCATION
to increase the chance that the flags will already be known by the time the je is reached.
Even later CPUs added speculative execution. In this case, for conditional branches the CPU just takes a guess at where it will branch to before it actually knows (e.g. before the flags are known), and if it guesses wrong it'll just pretend that it didn't execute the wrong instructions. More specifically, the speculatively executed instructions are tagged at the start of the pipeline and held at the end of the pipeline (at retirement) until the CPU knows if they can be committed to visible state or if they have to be discarded.
After that things just got more complicated, with fancier methods of doing branch prediction, additional "branch target" buffers, etc.
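To get a feel for what a misprediction costs on your own machine, you can time the same loop twice: once with a branch that always goes the same way, and once with a branch driven by a pseudo-random bit so the predictor is wrong roughly half the time. The sketch below is only a rough illustration (NASM syntax, x86-64 Linux, timed with rdtsc and no serialization, so treat the numbers as approximate); the iteration count, the xorshift32 generator, the register choices and the exit-via-syscall are my own assumptions, not something from this answer, and the random case also pays for the extra generator instructions, so the difference only approximates the misprediction penalty.

; Build/run (assumption): nasm -f elf64 branchy.asm && ld branchy.o -o branchy
global _start
section .text

; time_loop: ecx = iteration count, esi = xorshift32 seed
; (seed 0 means the measured branch is never taken, i.e. trivially predictable).
; Returns elapsed rdtsc ticks in rax.
time_loop:
    rdtsc
    shl     rdx, 32
    or      rax, rdx
    mov     r8, rax                 ; start timestamp
.loop:
    test    esi, esi
    jz      .decide                 ; no seed -> skip the generator
    mov     eax, esi                ; xorshift32: state ^= state << 13
    shl     eax, 13
    xor     esi, eax
    mov     eax, esi                ;            state ^= state >> 17
    shr     eax, 17
    xor     esi, eax
    mov     eax, esi                ;            state ^= state << 5
    shl     eax, 5
    xor     esi, eax
.decide:
    test    esi, 1
    jz      .skip                   ; <-- the branch being measured
    add     r10, 1                  ; trivial work on the taken path
.skip:
    dec     ecx
    jnz     .loop
    rdtsc
    shl     rdx, 32
    or      rax, rdx
    sub     rax, r8                 ; elapsed ticks
    ret

_start:
    mov     ecx, 100000000
    xor     esi, esi                ; predictable: branch never taken
    call    time_loop
    mov     r12, rax

    mov     ecx, 100000000
    mov     esi, 0x12345            ; ~50% taken, hard to predict
    call    time_loop
    mov     r13, rax                ; compare r12 vs r13 in a debugger,
                                    ; or print them as an exercise

    mov     eax, 60                 ; exit(0)
    xor     edi, edi
    syscall

On typical modern x86 CPUs you should see the second measurement come out noticeably higher per iteration, which is the predictor paying for its wrong guesses.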
Far jumps that change the code segment are more expensive. In real mode it's not so bad, because the CPU mostly just does "CS.base = value * 16" when CS is changed. For protected mode it's a table lookup (to find the GDT or LDT entry), decoding the entry, deciding what to do based on what kind of entry it is, then a pile of protection checks. For long mode it's vaguely similar. All of this adds more uncertainty (e.g. will the table entry be in the cache?).
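To make the protected-mode case concrete, here is a rough sketch of the classic real-mode-to-protected-mode switch in a boot-sector style stub (NASM syntax). The GDT layout, the selectors 0x08/0x10 and the build/run commands are conventional choices I've assumed, not anything specific to this answer; the interesting line is jmp 0x08:pm_entry, which is exactly the kind of far jump that forces the CPU to fetch the GDT entry for the new CS, decode it and run its protection checks before execution continues.

; Build/run (assumption): nasm -f bin pmjump.asm -o pmjump.img
;                         qemu-system-i386 -drive format=raw,file=pmjump.img
org 0x7C00
bits 16

start:
    cli                             ; no interrupts while CS/SS are in flux
    xor     ax, ax
    mov     ds, ax
    lgdt    [gdt_descriptor]        ; tell the CPU where the GDT lives
    mov     eax, cr0
    or      eax, 1                  ; set CR0.PE (enter protected mode)
    mov     cr0, eax
    jmp     dword 0x08:pm_entry     ; far jump: selector 0x08 = first code
                                    ; descriptor; this is the "table lookup
                                    ; + protection checks" case

bits 32
pm_entry:
    mov     ax, 0x10                ; flat data selector
    mov     ds, ax
    mov     ss, ax
    mov     esp, 0x7C00
.hang:
    hlt
    jmp     .hang

align 8
gdt_start:
    dq 0                            ; null descriptor
    dq 0x00CF9A000000FFFF           ; 0x08: flat 4 GiB code, ring 0
    dq 0x00CF92000000FFFF           ; 0x10: flat 4 GiB data, ring 0
gdt_descriptor:
    dw gdt_descriptor - gdt_start - 1   ; limit
    dd gdt_start                        ; base

times 510 - ($ - $$) db 0
dw 0xAA55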
On top of all of this there are things like TLB misses. For example, call [indirectAddress] can cause a TLB miss at indirectAddress, then a TLB miss at the top of the stack, then a TLB miss at the new instruction pointer; each of those TLB misses can cost a few hundred cycles.
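As a concrete illustration of those three accesses, here is a minimal indirect call (NASM syntax, x86-64 Linux); the label names func_ptr and do_nothing are made up for the example. Each commented access can land on a different page, and each of those pages can miss in the TLB independently.

; Build/run (assumption): nasm -f elf64 indirect.asm && ld indirect.o -o indirect
global _start

section .data
func_ptr:   dq do_nothing           ; the pointer read by access #1 (data page)

section .text
do_nothing:                         ; access #3 fetched code from this page
    ret                             ; returns via the address pushed by the call

_start:
    call    [rel func_ptr]          ; access #1: load the target from func_ptr
                                    ; access #2: push the return address (stack page)
                                    ; access #3: fetch code at the target (code page)
    mov     eax, 60                 ; exit(0)
    xor     edi, edi
    syscall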
Mostly, the cost of a jump can be anything from 0 cycles (for a correctly predicted jump) to maybe 1000 cycles, depending on which CPU it is, what kind of jump it is, what is in caches, what branch prediction predicts, cache/TLB misses, how fast/slow RAM is, and anything I may have forgotten.