Windows must do something to parse the PE header, load the executable in memory, and pass command line arguments to main().
Using OllyDbg I have set the debugger to break on main() so I could view the call stack:

Symbols seem to be missing, so we can't get the function name, only its memory address. However, we can see that main is called from kernel32.767262C4, which in turn is called from ntdll.77A90FD9. Towards the bottom of the stack we see RETURN to ntdll.77A90FA4, which I assume belongs to the first function ever called to run an executable. The notable arguments passed to that function seem to be the address of Windows' structured exception handler and the entry point of the executable.
So how exactly do these functions end up in loading the program into memory and getting it ready for the entry point to execute? Is what the debugger shows the entire process executed by the OS before main()?
A DLL can optionally specify an entry-point function. If present, the system calls the entry-point function whenever a process or thread loads or unloads the DLL. It can be used to perform simple initialization and cleanup tasks.
A DLL can have a single entry-point function. The system calls this entry-point function at various times, which I'll discuss shortly. These calls are informational and are usually used by a DLL to perform any per-process or per-thread initialization and cleanup.
When you call CreateProcess, the system internally calls ZwCreateThread[Ex] to create the first thread in the process.
When a thread is created, either you (if you call ZwCreateThread directly) or the system initializes the CONTEXT record for the new thread; Eip (i386) or Rip (amd64) in that record is the thread's entry point. If you do this yourself, you can specify any address. But when you call, say, Create[Remote]Thread[Ex], the system fills in the CONTEXT and sets its own routine as the thread entry point. Your original entry point is saved in the Eax (i386) or Rcx (amd64) register.
The name of this routine depends on the Windows version.
Earlier it was BaseThreadStartThunk, or BaseProcessStartThunk (when called from CreateProcess), from kernel32.dll.
Now the system specifies RtlUserThreadStart from ntdll.dll. RtlUserThreadStart usually calls BaseThreadInitThunk from kernel32.dll (except in native (boot-execute) applications such as smss.exe and chkdsk.exe, which do not have kernel32.dll in their address space at all). BaseThreadInitThunk in turn calls your original thread entry point, and after (if) it returns, RtlExitUserThread is called.
The main purpose of this common thread startup wrapper is to set the top-level SEH filter. This is the only reason the SetUnhandledExceptionFilter function can work. If a thread starts directly at your entry point, without the wrapper, the top-level exception filter functionality is unavailable.
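To make the wrapper idea concrete, here is a toy model in Python. It is not real Windows code: the function names only mirror the routines mentioned above, the signatures are hypothetical, and a Python try/except stands in for SEH. It just shows why an unhandled exception can reach a top-level filter only when the thread starts inside the wrapper.

```python
# Toy model only: the real routines live in ntdll.dll / kernel32.dll and are
# not callable like this. All Python names and signatures are hypothetical.

log = []
top_level_filter = None  # stands in for the filter set by SetUnhandledExceptionFilter


def set_unhandled_exception_filter(new_filter):
    """Install a process-wide filter and return the previous one."""
    global top_level_filter
    previous, top_level_filter = top_level_filter, new_filter
    return previous


def base_thread_init_thunk(entry_point, parameter):
    """Models BaseThreadInitThunk: call the real entry point, then exit the thread."""
    exit_code = entry_point(parameter)
    log.append(f"RtlExitUserThread({exit_code})")
    return exit_code


def rtl_user_thread_start(entry_point, parameter):
    """Models RtlUserThreadStart: wrap the whole thread in a top-level handler."""
    try:
        return base_thread_init_thunk(entry_point, parameter)
    except Exception as exc:
        # Only reachable because the thread started inside this wrapper.
        if top_level_filter is not None:
            log.append(f"filter saw {type(exc).__name__}")
            return top_level_filter(exc)
        raise


def worker(arg):
    log.append(f"entry({arg})")
    return 0


def crasher(arg):
    raise ValueError("boom")


set_unhandled_exception_filter(lambda exc: 1)
rtl_user_thread_start(worker, 42)     # normal return, then the exit routine
rtl_user_thread_start(crasher, None)  # the exception lands in the top-level filter
```

If crasher were called directly instead of through rtl_user_thread_start, nothing would route the exception to the installed filter, which is the point the answer makes about threads started without the wrapper.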
But whatever the thread entry point is, a thread in user space NEVER begins executing from that point!
Earlier, when a user-mode thread began executing, the system inserted an APC into the thread with LdrInitializeThunk as the APC routine. This was done by copying (saving) the thread CONTEXT to the user stack and then calling KiUserApcDispatcher, which called LdrInitializeThunk. When LdrInitializeThunk finished, control returned to KiUserApcDispatcher, which called NtContinue with the saved thread CONTEXT; only after this did the thread entry point begin executing.
Now the system has optimized this process: it copies (saves) the thread CONTEXT to the user stack and calls LdrInitializeThunk directly. At the end of that function NtContinue is called, and the thread entry point begins executing.
So EVERY thread begins executing in user mode from LdrInitializeThunk. (A function with exactly this name exists and is called in all Windows versions from NT4 to Win10.)
What does this function do, and what is it for? You may have heard of the DLL_THREAD_ATTACH notification. When a new thread in a process begins executing (with the exception of special system worker threads, like LdrpWorkCallback), it walks the list of loaded DLLs and calls their entry points with the DLL_THREAD_ATTACH notification (provided, of course, that the DLL has an entry point and DisableThreadLibraryCalls was not called for it). How is this implemented? Through LdrInitializeThunk, which calls LdrpInitialize -> LdrpInitializeThread -> LdrpCallInitRoutine (for the DLLs' entry points).
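The DLL_THREAD_ATTACH walk can be sketched like this. Again, this is a simplified Python model and not the loader's real data structures: the Module class and notify_thread_attach are made-up names, and only the reason-code values are taken from the Windows SDK headers.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Reason codes passed to a DLL entry point (values as defined in the Windows SDK).
DLL_PROCESS_DETACH = 0
DLL_PROCESS_ATTACH = 1
DLL_THREAD_ATTACH = 2
DLL_THREAD_DETACH = 3


@dataclass
class Module:
    """Hypothetical stand-in for one entry in the loader's module list."""
    name: str
    entry_point: Optional[Callable[[str, int], None]] = None
    thread_calls_disabled: bool = False  # models DisableThreadLibraryCalls


def notify_thread_attach(loaded_modules):
    """Models the walk a new thread performs before reaching its entry point."""
    notified = []
    for module in loaded_modules:
        # Skip DLLs without an entry point or with thread notifications disabled.
        if module.entry_point is None or module.thread_calls_disabled:
            continue
        module.entry_point(module.name, DLL_THREAD_ATTACH)
        notified.append(module.name)
    return notified


calls = []


def record(name, reason):
    calls.append((name, reason))


modules = [
    Module("a.dll", record),
    Module("b.dll"),                                      # no entry point
    Module("c.dll", record, thread_calls_disabled=True),  # opted out
    Module("d.dll", record),
]
notified = notify_thread_attach(modules)
```

Only a.dll and d.dll get the notification here; b.dll has no entry point and c.dll opted out, matching the two conditions mentioned above.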
The start of the first thread in a process is a special case: a lot of extra work is needed for process initialization. At this time only two modules are loaded in the process: the EXE and ntdll.dll. LdrInitializeThunk calls LdrpInitializeProcess for this job. Very briefly:
LdrpDoDebuggerBreak is called. This function checks whether a debugger is attached to the process and, if so, executes int 3, so the debugger receives an exception message, STATUS_BREAKPOINT. Most debuggers can begin UI debugging only from this point. However, there are debugger(s) that let us debug the process from LdrInitializeThunk; all my screenshots are from this kind of debugger.
Until this point, only code from ntdll.dll (and maybe from kernel32.dll) has executed in the process; code from other DLLs, or any third-party code, has not executed in the process yet.
The entry points of the loaded DLLs are called with DLL_PROCESS_ATTACH.
TLS initialization is performed and TLS callbacks are called (if any exist).
ZwTestAlert is called. This call checks whether any APCs exist in the thread's queue and executes them. This step exists in all versions from NT4 to Win10. It lets us, for example, create a process in the suspended state and then queue an APC (QueueUserAPC) to its thread (PROCESS_INFORMATION.hThread). As a result, the APC runs in the context of the first thread of the process after the process is fully initialized and all DLL_PROCESS_ATTACH calls have been made, but before the EXE entry point.
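The ordering of these last steps can be shown with a small Python simulation. It only models the sequence as listed here, with DLLs receiving DLL_PROCESS_ATTACH at this stage; run_first_thread and the string placeholders are hypothetical, and no real loader or APC API is involved.

```python
from collections import deque


def run_first_thread(dll_names, tls_callbacks, apc_queue, exe_entry_point):
    """Toy ordering of the final steps of process initialization."""
    trace = []
    # DLL entry points are notified first.
    for dll in dll_names:
        trace.append(f"{dll}: DLL_PROCESS_ATTACH")
    # Then TLS callbacks, if any exist.
    for callback in tls_callbacks:
        trace.append(f"TLS callback {callback}")
    # Models ZwTestAlert: drain every APC queued so far.
    while apc_queue:
        trace.append(f"APC {apc_queue.popleft()}")
    # Models NtContinue with the saved CONTEXT: the entry point finally runs.
    trace.append(f"entry point {exe_entry_point}")
    return trace


apcs = deque()
apcs.append("my_apc")  # queued while the process was still suspended
trace = run_first_thread(["user32.dll"], ["tls0"], apcs, "mainCRTStartup")
```

In this model the queued APC shows up in the trace after the DLL and TLS steps but before the entry point, which is the window the suspended-process trick relies on.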
See also: Flow of CreateProcess