
Converting LLVM IR to Java Bytecode

I am a beginner and want to build a translator that converts LLVM bitcode to Java bytecode.

Can somebody briefly explain, or list the major steps, how to go about it?

Ratnesh Avatar asked Oct 29 '25 07:10



1 Answer

In our company (Altimesh), we did the same thing for CIL. For Java Bytecode, the task is likely very similar.

I can tell you it's quite a long task.

First thing: the LLVM libraries are written in C++.

That means you either have to learn C++ and find a way to generate Java bytecode from C++, or export the symbols you need from the LLVM libraries through JNI. I strongly recommend the second option, as the translator itself can then be written in Java (and you'll soon figure out that you don't need that many symbols from the LLVM API).

Once you have figured that out, you need to:

  1. Parse modules from files

Here is a simple example (using the LLVM 3.9 API, which is quite old now):

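    #include "llvm/IR/Module.h"
    #include "llvm/IRReader/IRReader.h"
    #include "llvm/Support/MemoryBuffer.h"
    #include "llvm/Support/SourceMgr.h"
    llvm::Module* llvm__Module_buildFromFile(llvm::LLVMContext* context, const char* filename)
    {
        llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> buf = llvm::MemoryBuffer::getFile(filename);
        if (!buf)
            return nullptr; // the file could not be read
        llvm::SMDiagnostic diag; // receives the error details if parsing fails
        return llvm::parseIR(buf->get()->getMemBufferRef(), diag, *context).release(); // null on parse failure
    }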
  2. Parse debug info

    void llvm__DebugInfoFinder__processModule(llvm::DebugInfoFinder* self, llvm::Module* M) { self->processModule(*M); }

Debug info, or metadata, is quite a pain with LLVM, as it changes very frequently (compared to instructions). So you either have to stick to one LLVM version (probably a bad choice), or update your code as soon as a new LLVM release comes out.

Once you're there, most of the pain is behind you, and you enter the world of fun.

I strongly recommend starting with something very simple, such as a program that adds two integers.

Then always keep two windows open: godbolt showing the input LLVM IR you need to parse, and a Java window showing the target bytecode (here is an example for MSIL).

Once you're able to transpile your first program (hurrah, I can add two integers :) ), you will soon want to transpile more stuff, and soon you will face two insanities:
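As a rough illustration of that first milestone, the sketch below lowers the body of `define i32 @add(i32 %a, i32 %b)` (one `add` followed by `ret`) to JVM mnemonics. The `Inst` struct and the `lowerToJvm` function are hypothetical stand-ins invented for this example, not part of LLVM; a real translator would walk the instructions of an `llvm::Function` instead. It assumes the target is a static method, so the two arguments live in local slots 0 and 1, and it naively gives every SSA value its own local slot.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for an LLVM instruction.
enum class Op { Add, Ret };

struct Inst {
    Op op;
    int a;       // JVM local slot holding the first operand
    int b;       // JVM local slot holding the second operand (unused for Ret)
    int result;  // JVM local slot receiving the result (unused for Ret)
};

// Emits JVM mnemonics for a straight-line list of i32 instructions.
// LLVM is register based and the JVM is stack based, so each operand is
// pushed with iload and each result is popped back into a local with istore.
std::vector<std::string> lowerToJvm(const std::vector<Inst>& body) {
    std::vector<std::string> out;
    for (const Inst& inst : body) {
        assert(inst.a >= 0);
        switch (inst.op) {
        case Op::Add:
            out.push_back("iload " + std::to_string(inst.a));
            out.push_back("iload " + std::to_string(inst.b));
            out.push_back("iadd");
            out.push_back("istore " + std::to_string(inst.result));
            break;
        case Op::Ret:
            out.push_back("iload " + std::to_string(inst.a));
            out.push_back("ireturn");
            break;
        }
    }
    return out;
}
```

The store followed by an immediate reload of slot 2 is redundant; removing such pairs is an optimization that can wait until the straightforward version works.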

  • getelementptr. This is how the addresses of array elements, structure fields and other memory locations are computed in LLVM. It only computes an address; the actual access is a separate load or store. This is a pretty magic instruction.

  • phi. A crucial instruction in LLVM, as it makes Static Single Assignment (SSA) form possible, which is fairly important for the backend (register allocator and co). I don't know about Java bytecode, but MSIL had no equivalent.
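JVM bytecode has no phi instruction either, so phis have to be lowered. One common approach, sketched here under simplifying assumptions, is to give each phi a local variable and store the incoming value at the end of each predecessor block, just before its terminator. The `Phi` and `Block` types and the textual `store ... -> local(...)` pseudo-instruction are hypothetical, made up for this illustration. The sketch also ignores the case where several phis in the same block read each other's results, which needs temporaries to preserve the "all phis execute at once" semantics.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified IR: instructions are plain strings.
struct Phi {
    std::string result;  // SSA name defined by the phi
    // (incoming value, name of the predecessor block it comes from)
    std::vector<std::pair<std::string, std::string>> incoming;
};

struct Block {
    std::string name;
    std::vector<Phi> phis;
    std::vector<std::string> body;  // the last entry is the terminator
};

// Replaces every phi with stores into a dedicated local variable,
// placed in each predecessor block right before its terminator.
void lowerPhis(std::vector<Block>& blocks) {
    for (Block& block : blocks) {
        for (const Phi& phi : block.phis) {
            for (const auto& in : phi.incoming) {
                for (Block& pred : blocks) {
                    if (pred.name != in.second) continue;
                    assert(!pred.body.empty());  // every block ends in a terminator
                    pred.body.insert(pred.body.end() - 1,
                                     "store " + in.first + " -> local(" + phi.result + ")");
                }
            }
        }
        // Later uses of the phi result now read the local variable instead.
        block.phis.clear();
    }
}
```

For `%x = phi i32 [ 1, %then ], [ 2, %else ]` this places `store 1 -> local(%x)` before the branch in `then` and `store 2 -> local(%x)` before the branch in `else`.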

Once all of that is done, you enter the endless world of pain of special cases, weird C constructs you didn't know about, gcc extensions and so on...

Anyway good luck!

Regis Portalez Avatar answered Oct 31 '25 08:10



