I'm using an Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz and am wondering why multiplying 64-bit numbers is slower than multiplying 32-bit numbers. I ran a test in C, and the 64-bit multiplication takes about twice as long.
I expected both to take the same amount of time, since the CPU has native 64-bit registers and the width of the numbers shouldn't matter (as long as they fit into a 64-bit register).
Can someone explain this?
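The question doesn't include the test program; a minimal sketch of this kind of timing comparison (loop count, constants, and function names are placeholders, not the original test) could look like:

    /* Rough timing sketch: a chain of dependent multiplications, once with
       32-bit operands and once with 64-bit operands. Constants and the loop
       count are arbitrary placeholders. Compile with optimizations, e.g.
       gcc -O2 mul_bench.c */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    #define N 100000000ULL

    static double time32(void)
    {
        uint32_t x = 0xDEADBEEFu, acc = 1u;
        clock_t start = clock();
        for (uint64_t i = 0; i < N; i++)
            acc *= x;                                 /* 32-bit multiply */
        clock_t end = clock();
        printf("32-bit result: %u\n", (unsigned)acc); /* keep the loop alive */
        return (double)(end - start) / CLOCKS_PER_SEC;
    }

    static double time64(void)
    {
        uint64_t x = 0xDEADBEEFCAFEF00DULL, acc = 1u;
        clock_t start = clock();
        for (uint64_t i = 0; i < N; i++)
            acc *= x;                                 /* 64-bit multiply */
        clock_t end = clock();
        printf("64-bit result: %llu\n", (unsigned long long)acc);
        return (double)(end - start) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        printf("32-bit: %.3f s\n", time32());
        printf("64-bit: %.3f s\n", time64());
        return 0;
    }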
There are specialized instructions in the x86-64 instruction set to express that you only want to multiply two 32-bit quantities. Such an instruction may be written IMUL %EBX, %ECX in one dialect of x86-64 assembly, as opposed to the 64-bit multiplication IMUL %RBX, %RCX.
So the processor knows that you only want to multiply 32-bit quantities. This happens often enough that the designers of the processor made sure that the internal circuitry would be optimized to provide a faster answer in this easier case, just as it is easier for you to multiply 3-digit numbers than 6-digit numbers. The difference can be seen in the timings measured by Agner Fog and described in his comprehensive assembly optimization resources.
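As an illustration (the function names here are my own, not from the question), compiling the two functions below for x86-64 and inspecting the assembly, for example with gcc -O2 -S, shows the compiler choosing the 32-bit IMUL for the first and the 64-bit IMUL for the second:

    #include <stdint.h>

    /* For this function an x86-64 compiler typically emits a 32-bit IMUL
       operating on 32-bit registers (EAX/EDI/ESI in the System V calling
       convention). */
    uint32_t mul32(uint32_t a, uint32_t b)
    {
        return a * b;
    }

    /* For this one the same compiler emits a 64-bit IMUL on RAX/RDI/RSI. */
    uint64_t mul64(uint64_t a, uint64_t b)
    {
        return a * b;
    }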
If your compiler is targeting the older 32-bit IA-32 instruction set, the gap between 32-bit and 64-bit multiplication is even wider. The compiler has to implement 64-bit multiplication using only 32-bit multiply instructions, and it needs four of them (three if only the 64 least significant bits of the result are needed). In that case, 64-bit multiplication can be roughly three to four times slower than 32-bit multiplication.
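Here is a sketch of that decomposition (the general schoolbook technique, not the exact code any particular compiler emits). Splitting each 64-bit operand into 32-bit halves, the low 64 bits of the product need one 32x32->64 multiplication plus two 32x32->32 multiplications for the cross terms; the high*high term is dropped because it only affects bits 64 and above.

    #include <stdint.h>

    /* Low 64 bits of a 64-bit product computed from 32-bit multiplications.
       With a = a_hi*2^32 + a_lo and b = b_hi*2^32 + b_lo:
       a*b = a_hi*b_hi*2^64 + (a_hi*b_lo + a_lo*b_hi)*2^32 + a_lo*b_lo,
       and the a_hi*b_hi term never reaches the low 64 bits. */
    uint64_t mul64_low(uint64_t a, uint64_t b)
    {
        uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
        uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

        uint64_t low   = (uint64_t)a_lo * b_lo;        /* 32x32 -> 64 */
        uint32_t cross = a_hi * b_lo + a_lo * b_hi;    /* two 32x32 -> 32 */

        return low + ((uint64_t)cross << 32);
    }

Computing the full 128-bit product would also need the a_hi*b_hi term, which is the fourth multiplication mentioned above.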