int test1(int a, int b) {
    if (__builtin_expect(a < b, 0))
        return a / b;
    return b;
}
was compiled by clang with -O3 -march=native to
test1(int, int):                        # @test1(int, int)
        cmp     edi, esi
        jl      .LBB0_1
        mov     eax, esi
        ret
.LBB0_1:
        mov     eax, edi
        cdq
        idiv    esi
        mov     esi, eax
        mov     eax, esi                # moving eax back and forth
        ret
Why is eax being moved back and forth after the idiv?
gcc shows similar behavior, so this seems to be intended.
gcc with -O3 -march=native compiled the code to
test1(int, int):
        mov     r8d, esi
        cmp     edi, esi
        jl      .L4
        mov     eax, r8d
        ret
.L4:
        mov     eax, edi
        cdq
        idiv    esi
        mov     r8d, eax
        mov     eax, r8d                # back and forth mov
        ret
This is not a complete solution to the puzzle but should give some clues.
Without the __builtin_expect, clang generates:
test2(int, int):                        # @test2(int, int)
        mov     ecx, esi
        cmp     edi, esi
        jge     .LBB1_2
        mov     eax, edi
        cdq
        idiv    ecx
        mov     ecx, eax
.LBB1_2:
        mov     eax, ecx
        ret
While the register allocation is still weird here, it at least makes sense: if the branch is taken, the value of b in ecx is transferred to eax as the return value. If it is not taken, the result of the division (in eax) has to be transferred to ecx so that it sits in the same register as in the other case before the shared mov eax, ecx.
It could be that __builtin_expect convinces the compiler to special-case the taken branch late in the compilation process, orphaning the .LBB1_2 label so that it ends up absent from the final assembly.
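To make the comparison concrete, here is a minimal sketch of the two functions side by side. The source of test2 is not shown above, so its body here is an assumption: test1 with the hint removed. __builtin_expect(exp, c) evaluates to exp, so the hint can only influence code layout, never the returned value.

```c
/* test1 as in the question: the hint marks a < b as the unlikely case. */
int test1(int a, int b) {
    if (__builtin_expect(a < b, 0))
        return a / b;
    return b;
}

/* Assumed source of test2: the same logic without the hint. */
int test2(int a, int b) {
    if (a < b)
        return a / b;
    return b;
}
```

Both functions return the same value for every input where a / b is defined, so any difference in the generated assembly comes from block placement and register allocation alone.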
idiv esi is 32-bit operand-size, so EAX is already zero-extended to fill RAX. Therefore copying to ESI or R8D and back has no effect on the value in EAX. (And the calling convention doesn't require zero-extension or sign-extension to 64-bit anyway; 32-bit types are returned in 32-bit registers with possible garbage in the upper 32.)
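The zero-extension argument can be modelled in plain C. This is only an illustrative sketch: write32, read32, and round_trip are hypothetical helper names, and the uint64_t variables stand in for RAX and RSI. It assumes the x86-64 rule that writing a 32-bit register clears the upper 32 bits of the full 64-bit register.

```c
#include <stdint.h>

/* Writing a 32-bit register on x86-64 zero-extends into the full 64-bit register. */
static void write32(uint64_t *reg, uint32_t value) {
    *reg = (uint64_t)value;
}

/* Reading a 32-bit register sees only the low 32 bits. */
static uint32_t read32(uint64_t reg) {
    return (uint32_t)reg;
}

/* Models `mov esi, eax` followed by `mov eax, esi`, starting from the RAX
   value left behind by a 32-bit idiv. */
static uint64_t round_trip(uint64_t rax_after_idiv) {
    uint64_t rax = rax_after_idiv;
    uint64_t rsi = 0xDEADBEEFCAFEF00DULL;  /* arbitrary stale contents */
    write32(&rsi, read32(rax));            /* mov esi, eax */
    write32(&rax, read32(rsi));            /* mov eax, esi */
    return rax;
}
```

Because the quotient was already written through a 32-bit destination, the upper half of the modelled RAX is zero before the round trip and still zero after it, so the pair of moves changes nothing.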
This looks like purely a missed optimization. (There's no microarchitectural performance reason that this would be a good thing either.)