
Not understanding Hopper decompiler output

I know some C and a little bit of assembly and wanted to start learning about reverse engineering, so I downloaded the trial of Hopper Disassembler for Mac. I created a super basic C program:

int main() {
    int a = 5;
    return 0;
}

And compiled it with the -g flag (because I saw this before and wasn't sure if it mattered):

gcc -g simple.c

Then I opened the a.out file in Hopper Disassembler and clicked on the Pseudo Code button and it gave me:

int _main() {
    rax = 0x0;
    var_4 = 0x0;
    var_8 = 0x5;
    rsp = rsp + 0x8;
    rbp = stack[2047];
    return 0x0;
}

The only line I sort of understand here is setting a variable to 0x5. I'm unable to comprehend what all these additional lines are for (such as the rsp = rsp + 0x8;), for such a simple program. Would anyone be willing to explain this to me?

Also if anyone knows of good sources/tutorials for an intro into reverse engineering that'd be very helpful as well. Thanks.

Austin asked Oct 30 '25 03:10



2 Answers

Looks like it is doing a particularly poor job of producing "disassembly pseudocode" (whatever that is -- is it a disassembler or a decompiler? It can't decide).

In this case it looks like it has elided the stack frame setup (the function prologue), but not the cleanup (the function epilogue). So you'll get a much better idea of what is going on by using an actual disassembler to look at the actual disassembly:

$ gcc -c simple.c
$ objdump -d simple.o

simple.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   c7 45 fc 05 00 00 00    movl   $0x5,-0x4(%rbp)
   b:   b8 00 00 00 00          mov    $0x0,%eax
  10:   5d                      pop    %rbp
  11:   c3                      retq   

So what we have here is code to set up a stack frame (addresses 0 and 1), the assignment you wrote (4), setting up the return value (b), tearing down the frame (10), and then returning (11). You might see something different if you use a different version of gcc or a different target.

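In the case of your Hopper output, the first part has been elided (left out as an uninteresting housekeeping task) by the decompiler, but the second-to-last part (which undoes the first part) has not. The lines rsp = rsp + 0x8; and rbp = stack[2047]; appear to be its rendering of pop %rbp, which reads the saved rbp from the top of the stack and moves rsp up by 8 bytes.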

Chris Dodd answered Oct 31 '25 20:10



What you're looking at is decompiled code. Most decompiler output will look something like that, because a decompiler generally can't recover the original variable names: they aren't part of the compiled machine code.

So it gives them placeholder names of the form 'var_??', with a number attached to the end. Once you've learned more about reverse engineering and know the source language very well, you can follow code like this. It's no different from de-obfuscating PHP, JavaScript, and so on.

If you ever get into reverse engineering malware be prepared because nothing is going to be easy. You're going to have different packers, obfuscators, messed-up code, VM detection routines, etc. So buckle down and get ready for a long road ahead if reverse engineering is your goal.

CyberSorcerer answered Oct 31 '25 21:10
