I don't clearly understand the LEAVE instruction. It is a condensed form of these two instructions:
MOV ESP, EBP
POP EBP
So MOV ESP, EBP moves ESP to the same position as EBP (the base of the current stack frame).
Then POP EBP loads the value pointed to by ESP into EBP and adds 4 to ESP (one slot).
But I really don't see how these two operations relate to leaving a function (which is the purpose of LEAVE).
Can you help me clarify this, please?
In the 32-bit and 16-bit eras, a common prologue (the sequence of instructions at the start of a routine) was the following (16-bit code used bp and sp instead of ebp and esp):
push ebp
mov ebp, esp
sub esp, <local_var_size>
push <clobbered_reg1>
push <clobbered_reg2>
...
Nothing here is arbitrary; the order of the instructions matters. We end up with:
|parN | <-- EBP + 04h + n*4                par1..parN = Routine parameters
...     ...                                ra = Return address
|par2 | <-- EBP + 0ch                      o ebp = Original (caller) EBP
|par1 | <-- EBP + 08h                      lvar1..lvarM = Local variables
|ra   | <-- EBP + 04h                      creg1..cregK = Clobbered registers
|o ebp| <-- EBP
|lvar1| <-- EBP - 04h
|lvar2| <-- EBP - 08h
...    ...
|lvarM| <-- EBP - m*4
|creg1|
|creg2|
...
|cregK| <-- ESP
Look how easily all the data can be accessed at a fixed offset from ebp (parameters at successive positive offsets of 8 or more, local variables at negative offsets of -4 or less), and how well this model scales to a larger number of local variables or parameters.
For this reason ebp is called the frame pointer.
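To make the layout concrete, here is a minimal Python sketch that models the stack as a dictionary of 4-byte slots and replays the caller's pushes followed by the prologue. It is a toy model, not an emulator: `Stack32`, `build_frame` and all the values (0xCAFE for the caller's EBP, 0x401000 for the return address) are made up for illustration, and it assumes the caller pushes the parameters right to left, as the diagram implies (par1 sits at the lowest address).

```python
class Stack32:
    """Toy model of a downward-growing 32-bit x86 stack (illustration only)."""

    def __init__(self, top=0x1000):
        self.mem = {}      # address -> 32-bit value
        self.esp = top
        self.ebp = 0

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def build_frame(s, params, ret_addr, local_var_size, clobbered):
    for p in reversed(params):    # caller pushes the last parameter first
        s.push(p)
    s.push(ret_addr)              # CALL pushes the return address
    s.push(s.ebp)                 # push ebp
    s.ebp = s.esp                 # mov ebp, esp
    s.esp -= local_var_size       # sub esp, <local_var_size>
    for r in clobbered:           # push <clobbered_reg1> ... push <clobbered_regK>
        s.push(r)


s = Stack32()
s.ebp = 0xCAFE                    # caller's EBP (made-up value)
build_frame(s, params=[11, 22, 33], ret_addr=0x401000,
            local_var_size=8, clobbered=[0xAAAA, 0xBBBB])

# par3, par2, par1, ra and the old EBP, read back through EBP-relative offsets
for off in (0x10, 0x0C, 0x08, 0x04, 0x00):
    print(f"EBP + {off:02x}h -> {s.mem[s.ebp + off]:#x}")
print(f"ESP = EBP - {s.ebp - s.esp:02x}h")
```

Every access goes through `s.ebp` plus a constant, regardless of how many registers were pushed after the locals.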
The epilogue must undo all of this.
One possible variant is:
pop <clobbered_regK>
...
pop <clobbered_reg1>
add esp, <local_var_size>
pop ebp
ret n*4
However, this repeats <local_var_size>, and it is easy to let the two copies get out of sync.
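To see what goes wrong when the two copies drift apart, this hypothetical Python sketch (a toy stack model; `Stack32`, `run` and all values are invented for illustration) allocates `local_var_size` bytes in the prologue but releases `epilogue_size` bytes in the epilogue, with no parameters or clobbered registers to keep it short.

```python
class Stack32:
    """Toy model of a downward-growing 32-bit x86 stack (illustration only)."""

    def __init__(self, top=0x1000):
        self.mem = {}      # address -> 32-bit value
        self.esp = top
        self.ebp = 0

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def run(local_var_size, epilogue_size):
    s = Stack32()
    s.ebp = 0xCAFE                 # caller's EBP (made-up value)
    s.push(0x401000)               # CALL pushes the return address (made-up value)
    s.push(s.ebp)                  # push ebp
    s.ebp = s.esp                  # mov ebp, esp
    s.esp -= local_var_size        # sub esp, <local_var_size>
    for off in range(4, local_var_size + 1, 4):
        s.mem[s.ebp - off] = 0x1111 * (off // 4)   # give each local a value
    s.esp += epilogue_size         # add esp, <epilogue_size> (should equal local_var_size)
    restored_ebp = s.pop()         # pop ebp
    ret_addr = s.pop()             # ret
    return restored_ebp, ret_addr


print([hex(v) for v in run(12, 12)])   # sizes in sync
print([hex(v) for v in run(12, 8)])    # epilogue releases 4 bytes too few
```

With matching sizes the caller's EBP and the return address come back intact; with add esp, 8 after sub esp, 12, pop ebp picks up a local variable and ret jumps to the old EBP value.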
We can instead take advantage of the fact that ebp holds the value esp had right before the local variables were allocated: restoring esp from ebp deallocates all of them at once, whatever their size.
pop <clobbered_regK>
...
pop <clobbered_reg1>
mov esp, ebp
pop ebp
ret n*4
The third-to-last and second-to-last instructions are exactly what the leave instruction does, so:
pop <clobbered_regK>
...
pop <clobbered_reg1>
leave
ret n*4
is the equivalent epilogue.
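Here is a minimal Python sketch of the complete leave-based epilogue, again on a hypothetical toy stack model (`Stack32`, `leave` and `routine` are invented names, and the values are made up). Note that the epilogue never mentions `local_var_size`: `leave` recovers everything it needs from EBP.

```python
class Stack32:
    """Toy model of a downward-growing 32-bit x86 stack (illustration only)."""

    def __init__(self, top=0x1000):
        self.mem = {}      # address -> 32-bit value
        self.esp = top
        self.ebp = 0

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def leave(s):
    s.esp = s.ebp      # mov esp, ebp: drops all the locals in one step
    s.ebp = s.pop()    # pop ebp: restores the caller's frame pointer


def routine(local_var_size, clobbered, params):
    s = Stack32()
    s.ebp = 0xCAFE                          # caller's EBP (made-up value)
    caller_esp = s.esp
    for p in reversed(params):              # caller pushes the arguments
        s.push(p)
    s.push(0x401000)                        # CALL pushes the return address
    # prologue
    s.push(s.ebp)                           # push ebp
    s.ebp = s.esp                           # mov ebp, esp
    s.esp -= local_var_size                 # sub esp, <local_var_size>
    for r in clobbered:                     # push <clobbered_reg1> ... <clobbered_regK>
        s.push(r)
    # epilogue
    restored = [s.pop() for _ in clobbered][::-1]  # pop regK ... reg1, listed in push order
    leave(s)                                # mov esp, ebp / pop ebp
    ret_addr = s.pop()                      # ret n*4: pop the return address...
    s.esp += 4 * len(params)                # ...then discard the n parameters
    return s.esp == caller_esp, s.ebp, ret_addr, restored


print(routine(8, [1, 2], [10, 20]))
```

Whatever `local_var_size` is, the caller gets back its ESP, its EBP and its registers, because nothing in the epilogue depends on that size.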
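enter, the single-instruction counterpart of the push ebp / mov ebp, esp / sub esp, <local_var_size> prologue, is very slow (https://agner.org/optimize), so compilers never use it. leave, on the other hand, can be used to save code space with only a tiny impact on performance (which may be balanced out by the code-size saving). With most -mtune= settings, GCC uses leave when a pop ebp on its own wouldn't be sufficient, i.e. when ESP no longer equals EBP at the end of the function.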
On current Intel CPUs (Skylake, for example), leave costs 3 uops in total, vs. 2 for mov esp, ebp / pop ebp. In a real test case, designed to account for possible differences in stack-sync uops, a repeat loop called an actual tiny function that sets up EBP as a frame pointer, allocates some stack space and then tears it down. HW performance counters measured the leave version as taking 1 more front-end uop per call than the mov/pop version, yet the leave version ran slightly faster for some unknown reason, even with both functions aligned by 32. (@petercordes ran this test.)
But I really don't see how these two operations relate to leaving a function (which is the purpose of LEAVE).
That's not LEAVE's purpose; that's the purpose of RET. LEAVE doesn't do anything except modify the stack (ESP and EBP); it doesn't transfer control anywhere. In fact, you can LEAVE, then set up another stack frame, and still remain in the same function.
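A small Python sketch on the same kind of toy stack model (all names and values are hypothetical) makes this concrete: one routine builds and tears down two frames in a row with `leave`, and after each `leave` ESP points at the return address, which is still waiting for RET.

```python
class Stack32:
    """Toy model of a downward-growing 32-bit x86 stack (illustration only)."""

    def __init__(self, top=0x1000):
        self.mem = {}      # address -> 32-bit value
        self.esp = top
        self.ebp = 0

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def leave(s):
    s.esp = s.ebp      # mov esp, ebp
    s.ebp = s.pop()    # pop ebp


def two_frames_in_one_routine():
    s = Stack32()
    s.ebp = 0xCAFE                       # caller's EBP (made-up value)
    s.push(0x401000)                     # CALL pushes the return address (made-up value)
    top_after_leave = []
    for local_var_size in (16, 32):      # two frames built one after the other
        s.push(s.ebp)                    # push ebp
        s.ebp = s.esp                    # mov ebp, esp
        s.esp -= local_var_size          # sub esp, <local_var_size>
        leave(s)                         # frame gone, but we are still in the routine
        top_after_leave.append(s.mem[s.esp])
    ret_addr = s.pop()                   # only RET consumes the return address
    return top_after_leave, s.ebp, ret_addr


print(two_frames_in_one_routine())
```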