I intend to write my own JIT-interpreter as part of a course on VMs. I have a lot of knowledge about high-level languages, compilers and interpreters, but little or no knowledge about x86 assembly (or C for that matter).
Actually I don't know how a JIT works, but here is my take on it: Read in the program in some intermediate language. Compile that to x86 instructions. Ensure that the last instruction returns to somewhere sane back in the VM code. Store the instructions somewhere in memory. Do an unconditional jump to the first instruction. Voila!
So, with that in mind, I have the following small C program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main() {
    int *m = malloc(sizeof(int));
    *m = 0x90; // NOP instruction code
    asm("jmp *%0"
               : /* outputs:  */ /* none */
               : /* inputs:   */ "d" (m)
               : /* clobbers: */ "eax");
    return 42;
}
Okay, so my intention is for this program to store the NOP instruction somewhere in memory, jump to that location and then probably crash (because I haven't set up any way for the program to return back to main).
Question: Am I on the right path?
Question: Could you show me a modified program that manages to find its way back to somewhere inside main?
Question: Other issues I should beware of?
PS: My goal is to gain understanding, not necessarily do everything the right way.
Thanks for all the feedback. The following code seems to be the place to start and works on my Linux box:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
unsigned char *m;
int main() {
        unsigned int pagesize = getpagesize();
        printf("pagesize: %u\n", pagesize);
        m = malloc(1023+pagesize+1);
        if(m==NULL) return(1);
        printf("%p\n", m);
        m = (unsigned char *)(((long)m + pagesize-1) & ~(pagesize-1));
        printf("%p\n", m);
        if(mprotect(m, 1024, PROT_READ|PROT_EXEC|PROT_WRITE)) {
                printf("mprotect fail...\n");
                return 1;
        }
        m[0] = 0xc9; //leave
        m[1] = 0xc3; //ret
        m[2] = 0x90; //nop
        printf("%p\n", m);
        asm("jmp *%0"
                   : /* outputs:  */ /* none */
                   : /* inputs:   */ "d" (m)
                   : /* clobbers: */ "ebx");
        return 21;
}
Question: Am I on the right path?
I would say yes.
Question: Could you show me a modified program that manages to find its way back to somewhere inside main?
I haven't got any code for you, but a better way to get to the generated code and back is to use a pair of call/ret instructions, as they will manage the return address automatically.
Question: Other issues I should beware of?
Yes - as a security measure, many operating systems would prevent you from executing code on the heap without making special arrangements. Those special arrangements typically amount to you having to mark the relevant memory page(s) as executable.
On Linux this is done using mprotect() with PROT_EXEC.
If your generated code follows the proper calling convention, then you can declare a pointer-to-function type and invoke the function this way:
typedef void (*generated_function)(void);
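void *func = malloc(1024);   // NOTE: this memory must be made executable first (see mprotect above), or the call will fault
unsigned char *o = (unsigned char *)func;
generated_function func_exec = (generated_function)func;   // a function pointer, not a pointer to one
*o++ = 0x90;     // NOP
*o++ = 0xc3;     // RET (near return; 0xcb would be a far return)
func_exec();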