On most platforms, alloca just boils down to an inline adjustment of the stack pointer (for example, subtracting from rsp on x64, plus a bit of logic to maintain stack alignment).
I was looking at the code that gcc generates for alloca and it is pretty weird. Take the following simple example [1]:
#include <alloca.h>
#include <stddef.h>
volatile void *psink;
void func(size_t x) {
psink = alloca(x);
}
This compiles to the following assembly at -O2:
func(unsigned long):
push rbp
add rdi, 30
and rdi, -16
mov rbp, rsp
sub rsp, rdi
lea rax, [rsp+15]
and rax, -16
mov QWORD PTR psink[rip], rax
leave
ret
There are several confusing things here. First, I understand that gcc needs to round the allocated size up to a multiple of 16 (to maintain stack alignment), and the usual way to do that would be (size + 15) & ~0xF. Instead, it adds 30 (the add rdi, 30 instruction). What's up with that?
Second, I would just expect the result of alloca to be the new rsp value, which is already well-aligned. Instead, gcc does this:
lea rax, [rsp+15]
and rax, -16
This seems to be "realigning" the value of rsp to use as the result of alloca, but we already did the work to align rsp to a 16-byte boundary in the first place.
What's up with that?
You can play with the code on godbolt. It is worth noting that clang and icc do the "expected thing", on x86 at least. With VLAs (as suggested in earlier comments), gcc and clang do fine while icc produces an abomination.
[1] Here, the assignment to psink is just to consume the result of alloca, since otherwise the compiler omits the call entirely.
This is a very old, normal-priority bug. The code works correctly. It's just that (size + 30) & -16 reserves 16 more bytes than (size + 15) & -16 for every size except those of the form 16k + 1, so the extra space is allocated unnecessarily. It's not a correctness bug, just a minor efficiency bug.
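To make the arithmetic concrete, here is a minimal sketch in C that mirrors the two instruction pairs from the listing above. The function names (round_up_16, gcc_reserved, realign_16, print_overallocation) are made up for illustration and are not part of gcc. One plausible reading of the constant 30 is 15 bytes of slack for the later lea/and realignment plus 15 for the usual round-up, but that is an assumption about gcc's intent, not something the listing proves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Conventional round-up of a size to a multiple of 16. */
uint64_t round_up_16(uint64_t size) {
    return (size + 15) & ~(uint64_t)15;
}

/* What the listing computes: add rdi, 30 / and rdi, -16. */
uint64_t gcc_reserved(uint64_t size) {
    return (size + 30) & ~(uint64_t)15;
}

/* The lea rax, [rsp+15] / and rax, -16 pair: round an address up to 16. */
uint64_t realign_16(uint64_t addr) {
    return (addr + 15) & ~(uint64_t)15;
}

/* Hypothetical helper: print both results for sizes 0..max_size. */
void print_overallocation(uint64_t max_size) {
    for (uint64_t size = 0; size <= max_size; size++) {
        uint64_t want = round_up_16(size);
        uint64_t got = gcc_reserved(size);
        assert(got >= want); /* gcc never reserves too little */
        printf("size=%llu expected=%llu gcc=%llu extra=%llu\n",
               (unsigned long long)size, (unsigned long long)want,
               (unsigned long long)got, (unsigned long long)(got - want));
    }
}
```

Calling print_overallocation(33) from a main function shows the extra column at 16 for every size except 1, 17 and 33. realign_16 returns its argument unchanged whenever that argument is already a multiple of 16, which is the case for rsp after the sub in the listing, so the lea/and pair has no effect there.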