Is it at all possible to temporarily replace the call stack, i.e. the RSP register, while running an x64 application on Windows? I thought I could achieve this by using a frame-pointer register (R14):
push r14
sub rsp,28h
mov r14,rsp
mov rsp,qword ptr [rcx]
Here, "rcx" points to an argument of type "Stack", a struct that contains a "void*" to the memory I want to use. However, this seems to cause issues. Immediately after overwriting RSP, while debugging (in VS using the disassembly window), the entire call stack in the Call Stack window goes missing, except for the address of the current function.

This happens specifically when setting RSP to my own stack's address. This address is 16-byte aligned. The stack is set up via VirtualAlloc with 3 MB of reserved address space, 4 KB of initially committed space, and guard pages on both sides to check for overflow/stack growth. The stack is set up in the correct direction, where "sub" is used to grow it. I also set up the frame pointer in the unwind information correctly (including the correct epilogue), as the following works as expected:
push r14
sub rsp,28h
mov r14,rsp // end of prolog
mov rbx,rcx
sub rsp,80h // callstack still intact
Is there anything else I need to do, or is this simply impossible? It's not only an issue of the debugger display: when executing the call, it will crash somewhere further down the line (hard to pinpoint without a working stack trace).
Background
As for why I need this: I have an IL-based visual scripting language, for which I wrote a native compiler backend. The language itself is used for games and is heavily based on "coroutines"/the ability to yield, with most of the code actually utilizing this feature. To make this as performant as possible, any yielding function gets a smaller side-loaded stack (set up similarly to what I described above, but with less reserved space) and executes using this stack (similar to what I believe a thread has). On yield, the stack is simply preserved until resume; when done, it is returned to a pool where it can be reused.
Yielding can also happen on nested method calls, seamlessly, by simply having their return-addresses stored on the stack.
This worked pretty easily when running in an interpreter. Native code posed additional challenges, but most have been solved. Yield-Resume is implemented with a fake function prolog followed by a jmp to the resume address. However, I still have to use the interpreter's stack manually to manage the return addresses, which is slower and makes it hard to get correct stack traces.
So my solution would be to replace RSP with my own stack pointer for the duration of the yielding call. Then I could simply "call" nested yielding functions normally, using native instructions, with their return addresses now placed on my stack. When I resume, I could simply restore RSP to the point it was at beforehand, and everything should fall into place... except the system seems to not like me replacing RSP, as stated above.
Anyway, this is just to answer the "why" I would need to do this.
Okay, after wasting a whole day on it, I found the answer. Thanks to everyone who commented. I'll try to break down the steps of the fix:
First, I needed to make absolutely sure that there is enough shadow-space available in the new stack for any upcoming function-call, and that it is aligned. That fixes the initial crashes that I had.
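As a rough illustration of the first point, here is a minimal sketch of how the initial RSP value for the new stack could be computed so that it is 16-byte aligned and leaves the 32 bytes of shadow space that the Windows x64 calling convention expects above every call. The helper name `initialStackPointer` and its parameters are made up for this example and are not from my actual code:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original post): computes the value to load
// into RSP before the first `call` that is made on a freshly switched stack.
constexpr std::size_t kShadowSpace = 32;     // 4 register home slots of 8 bytes each
constexpr std::size_t kStackAlignment = 16;  // RSP must be 16-byte aligned at the call

constexpr std::uintptr_t initialStackPointer(std::uintptr_t stackTop,
                                             std::size_t outgoingArgBytes = 0) noexcept
{
    // Round the top down to a 16-byte boundary (the stack grows toward lower addresses).
    std::uintptr_t sp = stackTop & ~static_cast<std::uintptr_t>(kStackAlignment - 1);

    // Reserve the shadow space plus any stack-passed arguments, rounded up to
    // a multiple of 16 so the alignment is preserved.
    std::size_t reserve = kShadowSpace + outgoingArgBytes;
    reserve = (reserve + kStackAlignment - 1) & ~(kStackAlignment - 1);

    return sp - reserve;
}

// A misaligned top is rounded down first, then the shadow space is reserved.
static_assert(initialStackPointer(0x10008) == 0x10000 - 32, "shadow space reserved");
static_assert(initialStackPointer(0x10000, 8) % 16 == 0, "alignment preserved");
```

The value returned would be what gets loaded into RSP in place of the raw pointer read from the "Stack" struct; the subsequent `call` then pushes its return address below the reserved shadow space.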
Second, as RbMm pointed out, we need to set the thread's StackBase/StackLimit fields. This can be done via the TEB:
auto* pX = (NT_TIB*)NtCurrentTeb();
pX->StackBase = state.stack.GetTop();     // StackBase: high end of the stack
pX->StackLimit = state.stack.GetBottom(); // StackLimit: low end of the stack
However, there is a third field that is not part of any structure and that also needs to be assigned. It can be addressed relative to NT_TIB, and it is also set by Windows fibers. Not setting this field will cause a PAGE_GUARD violation when the stack needs to grow, which happens only rarely in real-world scenarios:
// read the old stack's allocation base from here, and assign the new one
[[nodiscard]] inline void*& getTEBAllocationBase(NT_TIB64& teb) noexcept
{
    return *(void**)((char*)&teb + 0x1478);
}
Third, creating the stack with RtlCreateUserStack helps. This is a function that you have to declare yourself and link against ntdll.lib for. There was some additional issue with my own stack creation that made certain calls crash. See "correct context switching under x86-64 / Windows" for details.
Fourth, you must not, I repeat, must not use a frame-pointer register to point to the old stack! I thought I was clever, but I was wrong. At least on Windows, this seems to entirely confuse the unwind handler, to the point where it will not be able to resolve the exception no matter what the state of the thread values is.
Fifth, once you are done, you need to manually restore the stack before the epilogue. You also need to do the same thing in the exception handler, obviously. At least in my case, I need exceptions to actually propagate through this. ... I haven't actually implemented this yet, but I HOPE that I will be able to properly restore the stack in the custom Rtl exception handler that I generated.
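To illustrate how the save and restore of the thread's stack bounds pair up, here is a minimal sketch of a scope guard. It is an assumption-laden illustration, not my generated code: `TibLike` is a stand-in that only mirrors the first three fields of NT_TIB so the example is self-contained (on Windows you would use the real NT_TIB obtained from NtCurrentTeb()), and `StackBoundsGuard` is a hypothetical name. The allocation-base field at offset 0x1478 would need the same save/restore treatment but is left out here:

```cpp
// Stand-in for the leading fields of NT_TIB (assumption: replaced by the real
// NT_TIB from NtCurrentTeb() in actual Windows code).
struct TibLike {
    void* ExceptionList;
    void* StackBase;   // high end of the stack
    void* StackLimit;  // low end of the stack
};

// Hypothetical scope guard: installs the bounds of the new stack on
// construction and restores the previous bounds on destruction.
class StackBoundsGuard {
public:
    StackBoundsGuard(TibLike& tib, void* newBase, void* newLimit) noexcept
        : tib_(tib), oldBase_(tib.StackBase), oldLimit_(tib.StackLimit)
    {
        tib_.StackBase = newBase;
        tib_.StackLimit = newLimit;
    }

    ~StackBoundsGuard()
    {
        // Restore the original bounds, mirroring the manual restore that has
        // to happen before the epilogue.
        tib_.StackBase = oldBase_;
        tib_.StackLimit = oldLimit_;
    }

    StackBoundsGuard(const StackBoundsGuard&) = delete;
    StackBoundsGuard& operator=(const StackBoundsGuard&) = delete;

private:
    TibLike& tib_;
    void* oldBase_;
    void* oldLimit_;
};
```

In generated native code the same pairing has to be emitted by hand, once on the normal exit path and once in the exception handler, since no C++ destructor runs there.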
Unfortunately, point 4 means that, yes, you won't have any call stack back to before the call that replaced the stack. This is consistent with using the WinAPI SwitchToFiber, only there it makes sense, since you are deferring execution similar to a thread. For me, I'd prefer a call stack similar to a coroutine, but as it only affects the Visual Studio window... Well, almost. When I was using the frame pointer, the StackWalker library was still able to show a valid stack trace. Since I want to have at least the full traces in case of errors, I'll have to come up with a custom solution, though I'm confident this is easier than trying to get the unwind handler to execute the exception handler...
Sorry, long post. But in case anyone ever needs to do what I did... that's really hard to figure out on your own. I'm just happy to have made some progress, even though the full implementation is not yet done.
EDIT: As I've found out very painfully, there was one piece missing that needed to be set in order for the stack to function under any circumstance. I've updated part two, or you can read https://stackoverflow.com/a/78317328/13081625 for details.