I was writing some assembly code for a project of mine and noticed something interesting: the linked binary is surprisingly big. I kept experimenting, and even with the smallest possible program the output ELF binary is large. For example:
.section .text
.global _start
_start:
    movl $1,%eax     # syscall number 1 = exit (32-bit int $0x80 ABI)
    movl $0,%ebx     # exit status 0
    int $0x80        # trap into the kernel
After assembling and linking the code above, the resulting binary is more than 4 KB!
The funny thing is that most of the binary is filled with zeroes.
I tried many things to find the cause, without success.
Can someone please explain what the problem is here?
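To put a number on the "mostly zeroes" observation, a small Python sketch like the following reports the file size and how many of its bytes are 0x00. The helper name `zero_stats` is hypothetical; it only reads the file's bytes and does not interpret the ELF structure.

```python
from pathlib import Path


def zero_stats(data: bytes) -> tuple[int, int]:
    """Return (total size in bytes, number of 0x00 bytes) for a blob."""
    # bytes.count accepts an int and counts occurrences of that byte value.
    return len(data), data.count(0)


def report(path: str) -> str:
    """Format a one-line summary for the file at `path`."""
    size, zeros = zero_stats(Path(path).read_bytes())
    share = 100 * zeros / size if size else 0.0
    return f"{size} bytes, {zeros} zero bytes ({share:.1f}%)"
```

For example, `print(report("<ELF_NAME>"))` on the linked binary from the question.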
I simply assemble and link the file:
as -o <OBJ_NAME> <SOURCE NAME>
ld -o <ELF_NAME> <OBJ_NAME>
Recommendations for further reading would be appreciated.
As you may have guessed, I use 64-bit GNU/Linux.
Thanks.
This has to do with alignment. Run readelf -eW <ELF_NAME>. The interesting part is:
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000401000 001000 00000c 00  AX  0   0  1
Note the Off column. This is the offset in the file, and the .text section starts at 0x1000, which is 4K.
You see the same picture in the program headers. The zero-filled space lies between the end of the headers at the start of the file (the ELF header and program headers) and offset 0x1000.
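If you want to inspect the program headers programmatically rather than through readelf, a minimal Python sketch using only `struct` could look like this. It assumes a little-endian ELF64 file (as produced on x86-64 GNU/Linux); the format strings are my own transcription of the `Elf64_Ehdr` (64 bytes) and `Elf64_Phdr` (56 bytes) layouts, and the names `Phdr` and `program_headers` are hypothetical.

```python
import struct
from typing import NamedTuple

PT_LOAD = 1  # p_type value for loadable segments

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes


class Phdr(NamedTuple):
    p_type: int
    p_flags: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_align: int


def program_headers(data: bytes) -> list[Phdr]:
    """Parse the program header table of a little-endian ELF64 file."""
    # e_ident: magic, then EI_CLASS (2 = 64-bit) and EI_DATA (1 = little-endian).
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    fields = EHDR.unpack_from(data, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    return [
        Phdr(*PHDR.unpack_from(data, e_phoff + i * e_phentsize))
        for i in range(e_phnum)
    ]
```

Reading the binary with `open("<ELF_NAME>", "rb").read()` and printing `hex(ph.p_offset)`, `hex(ph.p_vaddr)` and `hex(ph.p_align)` for each entry with `p_type == PT_LOAD` should show the same values as the LOAD lines of readelf.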
Why is this?
First, because the ELF standard dictates that
Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size.
(see man elf). The page size on your system (mine as well) is 4K. This is the value that you see in p_align.
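The congruence rule is plain modular arithmetic. As a small worked check (the function name `congruent` is hypothetical), the default layout from readelf satisfies it with a 4K page, while putting the same virtual address at a small file offset would not:

```python
def congruent(p_vaddr: int, p_offset: int, page_size: int) -> bool:
    """True if p_vaddr and p_offset are congruent modulo page_size."""
    return p_vaddr % page_size == p_offset % page_size


PAGE = 0x1000  # 4K page size, as on the systems discussed here

# Default link: .text at vaddr 0x401000, file offset 0x1000.
print(congruent(0x401000, 0x1000, PAGE))  # True
# Same vaddr at offset 0x78 would violate the rule.
print(congruent(0x401000, 0x78, PAGE))    # False
```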
Second, the virtual address the linker has assigned to the start of the "text" segment (the same as for the .text section, because that section is all the segment contains here) is 0x0000000000401000. Therefore the segment's offset in the file must be congruent to 0x401000 modulo 0x1000, i.e. its hexadecimal representation has to end in 000. But offset 0 is already taken by the read-only segment containing the ELF header (the very beginning of the file), so the next candidate is 0x1000.
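The reasoning above can be sketched as a search for the smallest file offset that is at least some minimum and congruent to the segment's virtual address. This is a simplified model, not what ld actually implements: it assumes the only constraints are the congruence rule and that the segment must start at or after `min_offset` (here, the end of the header segment). The name `first_valid_offset` and the example value 0xb0 are hypothetical.

```python
def first_valid_offset(p_vaddr: int, min_offset: int, page_size: int) -> int:
    """Smallest offset >= min_offset that is congruent to p_vaddr mod page_size."""
    rem = p_vaddr % page_size
    # Start from the page boundary at or below min_offset and add the remainder.
    candidate = (min_offset // page_size) * page_size + rem
    if candidate < min_offset:
        # That slot lies before min_offset; move up one page.
        candidate += page_size
    return candidate


# Headers occupy the start of the file (assume they end at 0xb0):
print(hex(first_valid_offset(0x401000, 0xB0, 0x1000)))  # 0x1000
# A vaddr ending in 0x078 can sit directly after 0x78 bytes of headers:
print(hex(first_valid_offset(0x400078, 0x78, 0x1000)))  # 0x78
```

The second call mirrors the layout that appears once page alignment is relaxed: when the low bits of the virtual address match the end of the headers, no padding is needed.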
Why did the linker choose 0x401000 as the virtual address for the text section? I don't know. I think that if you tweak the linker script a little, you can get a smaller resulting executable.
As Peter and another commenter have pointed out, page-size alignment can be disabled using the -n linker option:
'-n'
'--nmagic'
    Turn off page alignment of sections, and disable linking against
    shared libraries[…]
That way I get
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 1] .text             PROGBITS        0000000000400078 000078 00000c 00  AX  0   0  1
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000078 0x0000000000400078 0x0000000000400078 0x00000c 0x00000c R E 0x1
and the size of the executable is down to 664 bytes (344 after stripping).
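The offset 0x78 in the -n output is consistent with .text being placed directly after a 64-byte ELF64 header and a single 56-byte program header. A quick check of that arithmetic, using `struct` format strings that are my own transcription of `Elf64_Ehdr` and `Elf64_Phdr`:

```python
import struct

ELF64_EHDR_SIZE = struct.calcsize("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr: 64 bytes
ELF64_PHDR_SIZE = struct.calcsize("<IIQQQQQQ")          # Elf64_Phdr: 56 bytes

# One ELF header followed by one program header (the single LOAD segment).
text_offset = ELF64_EHDR_SIZE + 1 * ELF64_PHDR_SIZE
print(hex(text_offset))  # 0x78
```

The virtual address 0x400078 is then 0x400000 + 0x78, so offset and address still agree modulo the page size even though p_align is 0x1.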
With GNU ld, you can use linker scripts to control the layout of the output file in detail. ld.bfd (usually known simply as ld) uses a built-in default linker script if the user doesn't specify one; you can print it with ld --verbose. You can then edit it and supply your version instead of the default with -T <your-script>.
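If you want to script the "print the default, save it to a file" step, here is a minimal Python sketch. It assumes that `ld --verbose` prints the built-in script between two separator lines made only of `=` characters (the format I am used to seeing from GNU ld) and that GNU ld is on PATH; the function names are hypothetical.

```python
import subprocess


def extract_default_script(verbose_output: str) -> str:
    """Return the text between the first two '====' separator lines."""
    lines = verbose_output.splitlines()
    seps = [
        i for i, line in enumerate(lines)
        if line.strip() and set(line.strip()) == {"="}
    ]
    if len(seps) < 2:
        raise ValueError("no linker script delimiters found")
    return "\n".join(lines[seps[0] + 1 : seps[1]]) + "\n"


def default_linker_script() -> str:
    """Run `ld --verbose` (requires GNU ld) and extract the built-in script."""
    out = subprocess.run(
        ["ld", "--verbose"], capture_output=True, text=True, check=True
    ).stdout
    return extract_default_script(out)
```

After writing `default_linker_script()` to a file such as `my.ld` (a hypothetical name) and editing it, link with `ld -T my.ld -o <ELF_NAME> <OBJ_NAME>`.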
I edited out the first occurrence of
. = ALIGN(CONSTANT (MAXPAGESIZE));
(before .text) and got 720 bytes (400 when stripped). This differs from the result of the -n option: you still get 2 loadable segments, and their p_align is still 0x1000.
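The manual edit described above can be sketched in Python as well. This assumes the alignment statement appears on a line by itself in the saved script; it removes only the first such line and leaves later occurrences untouched. The names `ALIGN_LINE` and `drop_first_page_align` are hypothetical.

```python
ALIGN_LINE = ". = ALIGN(CONSTANT (MAXPAGESIZE));"


def drop_first_page_align(script: str) -> str:
    """Remove the first line consisting solely of the page-alignment statement."""
    lines = script.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == ALIGN_LINE:
            del lines[i]
            return "".join(lines)
    raise ValueError("alignment statement not found")
```

Removing just the first occurrence matches the edit in this answer; the later alignment statements still keep the data segment page-aligned, which is consistent with still getting two loadable segments with p_align 0x1000.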
There are efficiency implications of having p_align < MAX_PAGE_SIZE that I don't fully understand. (Perhaps pages can't be loaded as fast because the address computation is harder? I suspect there is a better explanation.) Feel free to edit the answer if you know more about this or where it is explained.