In my Linux program, I need a function that takes an address addr and checks whether a callq instruction placed at addr is calling a specific function func loaded from a shared library. That is, I need to check whether there is something like callq func@PLT at addr.
So, on Linux, how do I get from a callq func@PLT instruction to the real address of the function func?
You can only find out about that at runtime, after the dynamic linker resolves the actual load address.
Warning: What follows is slightly deeper magic ...
To illustrate what's happening, take a small test program and look at it in a debugger:
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hello, World!\n");
    return 0;
}
Compile it (gcc -O8 ...). Running objdump -d on the binary shows the following (notwithstanding the optimization that replaces printf() with puts() for a plain string):
Disassembly of section .init:
[ ... ]
Disassembly of section .plt:

0000000000400408 <__libc_start_main@plt-0x10>:
  400408:       ff 35 a2 04 10 00       pushq  1049762(%rip)        # 5008b0 <_GLOBAL_OFFSET_TABLE_+0x8>
  40040e:       ff 25 a4 04 10 00       jmpq   *1049764(%rip)        # 5008b8 <_GLOBAL_OFFSET_TABLE_+0x10>
[ ... ]
0000000000400428 <puts@plt>:
  400428:       ff 25 9a 04 10 00       jmpq   *1049754(%rip)        # 5008c8 <_GLOBAL_OFFSET_TABLE_+0x20>
  40042e:       68 01 00 00 00          pushq  $0x1
  400433:       e9 d0 ff ff ff          jmpq   400408 <_init+0x18>
[ ... ]
0000000000400500 <main>:
  400500:       48 83 ec 08             sub    $0x8,%rsp
  400504:       bf 0c 06 40 00          mov    $0x40060c,%edi
  400509:       e8 1a ff ff ff          callq  400428 <puts@plt>
  40050e:       31 c0                   xor    %eax,%eax
  400510:       48 83 c4 08             add    $0x8,%rsp
  400514:       c3                      retq
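The addresses objdump prints in its comments can be recomputed from the raw bytes: both callq rel32 and the (%rip)-relative operands are relative to the address of the next instruction. The following is a minimal sketch of that arithmetic, using the bytes from the listing above. The helper names read_disp32, rip_relative and check_against_listing are made up for this illustration and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 32-bit displacement from raw instruction bytes. */
static int32_t read_disp32(const uint8_t *p)
{
    uint32_t u = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (int32_t)u;
}

/* RIP-relative target: address of the NEXT instruction plus the displacement. */
static uint64_t rip_relative(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Recompute the addresses shown in the objdump listing. */
static void check_against_listing(void)
{
    /* 400509: e8 1a ff ff ff      callq 400428 <puts@plt> */
    const uint8_t call_insn[] = { 0xe8, 0x1a, 0xff, 0xff, 0xff };
    /* 400428: ff 25 9a 04 10 00   jmpq *1049754(%rip)  # 5008c8 */
    const uint8_t jmp_insn[]  = { 0xff, 0x25, 0x9a, 0x04, 0x10, 0x00 };

    /* The opcode is 1 byte for the call and 2 bytes for the indirect jmp. */
    assert(rip_relative(0x400509, 5, read_disp32(call_insn + 1)) == 0x400428);
    assert(rip_relative(0x400428, 6, read_disp32(jmp_insn + 2)) == 0x5008c8);
}
```

So the callq at 0x400509 lands on the puts@plt stub at 0x400428, and that stub jumps through the GOT slot at 0x5008c8.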
Now load it into gdb. Then:
$ gdb ./tcc
GNU gdb Red Hat Linux (6.3.0.0-0.30.1rh)
[ ... ]
(gdb) x/3i 0x400428
0x400428:       jmpq   *1049754(%rip)        # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
0x40042e:       pushq  $0x1
0x400433:       jmpq   0x400408
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>:    0x000000000040042e
Notice this value points back to the instruction directly following the first jmpq; this means the puts@plt slot, on first invocation, will simply "fall through" to:
(gdb) x/3i 0x400408
0x400408:       pushq  1049762(%rip)        # 0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>
0x40040e:       jmpq   *1049764(%rip)        # 0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>
0x400414:       nop
(gdb) x/gx 0x5008b0
0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>:     0x0000000000000000
(gdb) x/gx 0x5008b8
0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>:    0x0000000000000000
The function address and argument aren't initialized yet.
This is the state just after program load, but before executing. Now start executing it:
(gdb) break main
Breakpoint 1 at 0x400500
(gdb) run
Starting program: tcc
(no debugging symbols found)
(no debugging symbols found)

Breakpoint 1, 0x0000000000400500 in main ()
(gdb) x/i 0x400428
0x400428:       jmpq   *1049754(%rip)        # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>:    0x000000000040042e
So this hasn't changed yet - but the targets (the GOT contents for the libc initialization) are different now:
(gdb) x/gx 0x5008b0
0x5008b0 <_GLOBAL_OFFSET_TABLE_+8>:     0x0000002a9566b9a8
(gdb) x/gx 0x5008b8
0x5008b8 <_GLOBAL_OFFSET_TABLE_+16>:    0x0000002a955609f0
(gdb) disas 0x0000002a955609f0
Dump of assembler code for function _dl_runtime_resolve:
0x0000002a955609f0 <_dl_runtime_resolve+0>:     sub    $0x38,%rsp
[ ... ]
I.e. at program load time, the dynamic linker will resolve the "init" parts first. It substitutes the GOT references with pointers that redirect into the dynamic linking code.  
Therefore, when first calling an external-to-the-binary function through the .plt reference, it'll jump into the linker again. Let it do that, then inspect the program after that - the state has changed again:
(gdb) break *0x0000000000400514
Breakpoint 2 at 0x400514
(gdb) continue
Continuing.
Hello, World!

Breakpoint 2, 0x0000000000400514 in main ()
(gdb) x/i 0x400428
0x400428:       jmpq   *1049754(%rip)        # 0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>
(gdb) x/gx 0x5008c8
0x5008c8 <_GLOBAL_OFFSET_TABLE_+32>:    0x0000002a956c8870
(gdb) disas 0x0000002a956c8870
Dump of assembler code for function puts:
0x0000002a956c8870 <puts+0>:    mov    %rbx,0xffffffffffffffe0(%rsp)
[ ... ]
So there's your redirect right into libc now - the PLT reference to puts() finally got resolved.
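Put together, the check asked for in the question could look roughly like the sketch below. It follows the same chain as the gdb session: decode the callq, decode the jmpq in the PLT stub, then read the GOT slot. It assumes x86-64, that the stub starts with the ff 25 form of jmpq shown above (newer toolchains may emit different stubs), and that all memory involved is readable. The name call_targets_func and the enum are hypothetical, and how the caller obtains the real address of func is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum plt_check { PLT_UNRESOLVED = -1, PLT_NO_MATCH = 0, PLT_MATCH = 1 };

/* Hypothetical helper: does the callq at addr end up in func? */
static enum plt_check call_targets_func(const void *addr, const void *func)
{
    const uint8_t *insn = addr;
    const uint8_t *stub, *slot;
    const void *resolved;
    int32_t rel;

    assert(addr != NULL && func != NULL);

    if (insn[0] != 0xe8)                      /* not `callq rel32` */
        return PLT_NO_MATCH;
    memcpy(&rel, insn + 1, sizeof rel);
    stub = insn + 5 + rel;                    /* relative to next instruction */

    if ((const void *)stub == func)           /* direct call, no PLT involved */
        return PLT_MATCH;

    if (stub[0] != 0xff || stub[1] != 0x25)   /* not `jmpq *disp32(%rip)` */
        return PLT_NO_MATCH;
    memcpy(&rel, stub + 2, sizeof rel);
    slot = stub + 6 + rel;                    /* address of the GOT slot */

    memcpy(&resolved, slot, sizeof resolved); /* current contents of the slot */
    if (resolved == (const void *)(stub + 6)) /* still points back into the stub */
        return PLT_UNRESOLVED;
    return resolved == func ? PLT_MATCH : PLT_NO_MATCH;
}
```

The PLT_UNRESOLVED case corresponds to the state seen before the first call, where the GOT slot still holds the address of the pushq that follows the jmpq. In that state the slot says nothing yet about which function the call will reach.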
The instructions telling the linker where to insert the actual function load addresses (as we've seen it do for _dl_runtime_resolve) come from special sections in the ELF binary:
$ readelf -a tcc
[ ... ]
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
[ ... ]
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
[ ... ]
Dynamic section at offset 0x700 contains 21 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
[ ... ]
Relocation section '.rela.plt' at offset 0x3c0 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000005008c0  000100000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
0000005008c8  000200000007 R_X86_64_JUMP_SLO 0000000000000000 puts + 0
There's more to ELF than just the above, but these three pieces cover the essentials. The INTERP header tells the kernel's binary format handler that this ELF binary has an interpreter (the dynamic linker) that needs to be loaded and initialized first. The NEEDED entry says that the binary requires libc.so.6. The .rela.plt relocations say that the slots at 0x5008c0 and 0x5008c8 in the program's writable data section must be filled with the load addresses of __libc_start_main and puts, respectively, when dynamic linking is actually performed.
How exactly that happens, from ELF's point of view, is up to the details of the interpreter (aka, the dynamic linker implementation).
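The .rela.plt entries in the readelf output can also be decoded by hand. The Info column packs the dynamic symbol index into the upper 32 bits and the relocation type into the lower 32 bits, and the Offset column is the GOT slot to patch. Here is a small sketch using the definitions from <elf.h>; the function jump_slot_offset and the table example_rela_plt are illustrative names, and the table simply copies the two entries shown above rather than reading them from a real binary.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* The two .rela.plt entries from the readelf output: offset, info, addend. */
static const Elf64_Rela example_rela_plt[] = {
    { 0x5008c0, 0x000100000007, 0 },   /* __libc_start_main + 0 */
    { 0x5008c8, 0x000200000007, 0 },   /* puts + 0 */
};

/* Return the GOT slot (r_offset) of the JUMP_SLOT relocation that refers to
 * dynamic symbol index symidx, or 0 if there is none. */
static uint64_t jump_slot_offset(const Elf64_Rela *rela, size_t count,
                                 uint32_t symidx)
{
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) == R_X86_64_JUMP_SLOT &&
            ELF64_R_SYM(rela[i].r_info) == symidx)
            return rela[i].r_offset;
    }
    return 0;
}

/* Symbol index 2 is puts according to the Info column above. */
static void check_example(void)
{
    assert(jump_slot_offset(example_rela_plt, 2, 2) == 0x5008c8);
}
```

The slot found this way for puts, 0x5008c8, is the same address the puts@plt stub jumps through in the disassembly.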