
How does stack alignment work?

I never fully understood the difference between stack alignment in a function and "aligned loads/stores" to the stack.

I'm reading some PTX code and I'm seeing this:

 function()

   .local .align 16 .byte stack_memory[200];
   // This should mean the stack memory starts at an address aligned to 16 (why would this be necessary?)

   load_byte_from_stack reg, [stack_memory+1];
   // It seems reading 1 byte is always safe (why?)

   load_float32_from_stack reg, [stack_memory+8];
   // It also seems that reading 32 bit from an address aligned to 32 bit (4 bytes) is also safe (why??)

   load_two_float32_from_stack reg, [stack_memory+12];
   // This should not be right (why?)

My questions are in the code but the point is:

I didn't really understand why a stack allocation should be aligned to an address, and why that should matter if I can read 1 byte from a totally unaligned address and read a float32 from an address that is just a multiple of 4.

asked Dec 14 '25 01:12 by user129506



1 Answer

That's an interesting question. Let me try to explain using your code:

.local .align 16 .byte stack_memory[200]; 

Q: This should mean the stack memory starts at an address aligned to 16 (why would this be necessary?)

A: The reasons are performance and data coherency. Aligning the buffer to 16B ensures that it spans the minimum number of cache lines. Suppose a cache line were 16B (they are usually 64B on current architectures): with an aligned buffer, the first 16B sit in the first line, the next 16B in the next line, and so on. A SIMD operation on 16B of aligned data then needs to access only a single cache line. Without the alignment it would most likely access two cache lines, and what would happen if some other compute unit modified the second line while you were accessing the first?
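To make the line-counting argument concrete, here is a minimal C++ sketch. The helper name `lines_touched` and the line sizes passed to it are assumptions made for illustration; nothing here comes from PTX or from a specific GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of cache lines touched by `size` bytes starting at byte address `addr`.
// `line_size` is an assumed cache-line size in bytes (16 in the thought
// experiment above; real hardware usually uses larger lines).
inline std::size_t lines_touched(std::uint64_t addr, std::size_t size,
                                 std::size_t line_size) {
    assert(size > 0 && line_size > 0);
    const std::uint64_t first = addr / line_size;               // line holding the first byte
    const std::uint64_t last  = (addr + size - 1) / line_size;  // line holding the last byte
    return static_cast<std::size_t>(last - first + 1);
}
```

With a 16B line, `lines_touched(0x1000, 16, 16)` is 1 because the access is aligned, while `lines_touched(0x1004, 16, 16)` is 2 because the same 16B now straddle a line boundary. Likewise, the 200-byte buffer covers 13 lines when it starts on a 16B boundary and 14 when it starts at offset 15.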

load_byte_from_stack reg, [stack_memory+1];

Q: It seems reading 1 byte is always safe (why?)

A: Because a single byte cannot straddle two distinct cache lines.

load_float32_from_stack reg, [stack_memory+8]; 

Q: It also seems that reading 32 bit from an address aligned to 32 bit (4 bytes) is also safe (why??)

A: Same reason here. The buffer starts at a 16B-aligned address and the offset 8 is a multiple of 4, so the address itself is 4B-aligned. A 4B value at a 4B-aligned address can never fall into 2 consecutive cache lines.

load_two_float32_from_stack reg, [stack_memory+12];

Q: This should not be right (why?)

A: Yes, this is problematic, mostly on architectures with a relaxed memory model. If the cache line is only 16B, then with a 16B-aligned buffer a read of 2 x 4B at offset 12 takes the first 4B from line 1 and the next 4B from line 2. That may cause coherency issues if the programmer does not expect that the second 4B might be modified by someone else before it is read (because a single read instruction cannot block 2 cache lines). Note also that the access is 8B wide while offset 12 is not a multiple of 8, so the load is not aligned to its own size, which PTX requires of load and store addresses.
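The three loads from the question can be checked with a small C++ sketch. The helper names `naturally_aligned` and `straddles_line` are hypothetical, and the code assumes the buffer base is aligned to 16B, as the `.align 16` declaration requests. It only models the arithmetic, not any real hardware behaviour.

```cpp
#include <cassert>
#include <cstddef>

// True if an access of `size` bytes at `base + offset` is aligned to its own
// size. Assumes `base` is a multiple of `base_align` and that `size` divides
// `base_align`, so only the offset decides the result.
inline bool naturally_aligned(std::size_t offset, std::size_t size,
                              std::size_t base_align = 16) {
    assert(size > 0 && base_align % size == 0);
    return offset % size == 0;
}

// True if the access crosses a boundary between `line_size`-byte lines.
// Assumes `base` is itself a multiple of `line_size`.
inline bool straddles_line(std::size_t offset, std::size_t size,
                           std::size_t line_size) {
    assert(size > 0 && line_size > 0);
    return offset / line_size != (offset + size - 1) / line_size;
}
```

With these assumptions, the byte load at offset 1 and the float32 load at offset 8 are both naturally aligned and stay within one 16B line. The two-float32 load at offset 12 fails both checks: `naturally_aligned(12, 8)` is false and `straddles_line(12, 8, 16)` is true.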

Hope this helps.

answered Dec 15 '25 14:12 by VAndrei



