I have a simple debugger (using ptrace : http://pastebin.com/D0um3bUi) to count the number of instructions executed for a given input executable program. It uses ptrace single step execution mode to count instructions.
When the executable for program 1) (a.out from gcc main.c) is given as input to my test debugger, it prints around 100k instructions executed. When I use the -static option it gives 10681 instructions.
In 2) I wrote an assembly program, assembled it with NASM and linked it with ld. When this executable is given to the test debugger, it shows a count of 8 instructions, which is correct.
The number of instructions executed in program 1) is high because of linking the program with system libraries at runtime? I used -static, which reduces the count to about a tenth. How can I ensure that the instruction count is only that of the main function in program 1), which is how program 2) is reported by the debugger?
1)
#include <stdio.h>
int main()
{
    printf("Hello, world!\n");
    return 0;
}    
I use gcc to create the executable.
2)
; 64-bit "Hello World!" in Linux NASM
global _start            ; global entry point export for ld
section .text
_start:
    ; sys_write(stdout, message, length)
    mov    rax, 1        ; sys_write
    mov    rdi, 1        ; stdout
    mov    rsi, message    ; message address
    mov    rdx, length    ; message string length
    syscall
    ; sys_exit(return_code)
    mov    rax, 60        ; sys_exit
    mov    rdi, 0        ; return 0 (success)
    syscall
section .data
    message: db 'Hello, world!',0x0A    ; message and newline
    length:    equ    $-message        ; NASM definition pseudo-                             
I build with:
nasm -f elf64 -o main.o -s main.asm  
ld -o main main.o
The number of instructions executed in program 1) is high because of linking the program with system libraries at runtime?
Yep, dynamic linking plus CRT (C runtime) startup files.
used -static and which reduces the count by a factor of 1/10.
So that just left the CRT start files, which do stuff before calling main, and after.
How can I ensure that the instruction count is only that of the main function in Program 1)
Measure an empty main, then subtract that number from future measurements.
Unless your instruction counter is smarter, and looks at symbols in the executable for the process it's tracing, it won't be able to tell which code came from where.
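As a rough illustration of what a "smarter" counter could look like, here is a minimal sketch of a ptrace single-step loop that also checks the instruction pointer against an address range. It is not the code from the pastebin link. The function name count_steps, its parameters, and the max_steps safety cap are all hypothetical, and it assumes x86-64 Linux (it reads rip via PTRACE_GETREGS).

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: single-step argv[0] and count instructions.
 * [lo, hi) is an address range the caller wants attributed separately
 * (for example the range of main, looked up by the caller).
 * max_steps: stop and kill the child after this many steps (0 = no cap).
 * Returns 0 on success, -1 if the child could not be started or traced.
 * Signals delivered to the child are simply suppressed in this sketch. */
int count_steps(char *const argv[],
                unsigned long long lo, unsigned long long hi,
                unsigned long max_steps,
                unsigned long *total, unsigned long *in_range)
{
    assert(argv != NULL && argv[0] != NULL);
    *total = 0;
    *in_range = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: ask to be traced, then exec the target program. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    int status;
    /* The first stop is the SIGTRAP delivered right after a successful exec. */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    while (WIFSTOPPED(status)) {
        struct user_regs_struct regs;
        if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
            break;
        /* rip points at the instruction that is about to execute. */
        (*total)++;
        if (regs.rip >= lo && regs.rip < hi)
            (*in_range)++;
        if (max_steps != 0 && *total >= max_steps) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            break;
        }
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == -1)
            break;
        if (waitpid(pid, &status, 0) < 0)
            break;
    }
    return 0;
}
```

To use it for main only, you would pass the start address of main as lo and start plus size as hi, for example as printed by nm -S a.out. Those addresses only match the running process if the executable is not position-independent (built with -static or -no-pie), which is an assumption of this sketch. Note that in_range covers only instructions located inside that range, so code that main calls, such as printf, is still counted only in total.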
and which is how Program 2) is reporting for the debugger.
That's because there is no other code in that program. It's not that you somehow helped the debugger ignore some instructions, it's that you made a program without any instructions you didn't put there yourself.
If you want to see what actually happens when you run the gcc output, run gdb a.out, then b _start, r, and single-step. Once you get deep in the call tree, you'll probably want to use fin to finish execution of the current function, since you don't want to single-step through literally 1 million instructions, or even 10k.
related: How do I determine the number of x86 machine instructions executed in a C program? shows perf stat will count 3 user-space instructions total in a NASM program that does mov eax, 231 / syscall, linked into a static executable.
Peter gave a very good answer, and I'm going to follow up with a response that is cringeworthy and might garner some downvotes. When linking directly with LD or indirectly with GCC, the default entry point for ELF executables is the label _start.
Your NASM code uses a global label _start so when your program is run the first code in your program will be the instructions of _start. When using GCC your program's typical entry point is the function main. What is hidden from you is that your C program also has a _start label but it is supplied by the C runtime startup objects.
The question now is - is there a way to bypass the C startup files so that the startup code can be avoided? Technically yes, but this is perilous territory that could yield undefined behaviour. If you are adventurous you can actually tell GCC to change the entry point of your program with the -e command line option. Rather than _start we could make our entry point main bypassing the C startup code. Since we are bypassing the C startup code we can also dispense with linking in the C runtime startup code with the -nostartfiles option.
You could use this command line to compile your C program:
gcc test.c -e main -nostartfiles
Unfortunately, there is a bit of a gotcha that has to be fixed in the C code. Normally when using the C runtime startup objects, after the environment is initialized a CALL is made to main. Normally main does a RET instruction which returns back to the C runtime code. At that point the C runtime gracefully exits your program. RET doesn't have anywhere to return when the -nostartfiles option is used, so it will likely segfault. To get around that we can call the C library _exit function to exit our program.
#include <stdio.h>
#include <unistd.h>  /* declares _exit */
int main()
{
    printf("Hello, world!\n");
    _exit(0);  /* We exit application here, never reaching the return */
    return 0;
}
Unless you omit frame pointers there are a few extra instructions emitted by GCC to setup the stack frame and tear it down, but the overhead is minimal.
The process above doesn't seem to work for static builds (-static option in GCC) with the standard glibc C library. This is discussed in this Stack Overflow answer. The dynamic version works because a shared object can register a function that gets called by the dynamic loader to perform initialization. When building statically this is generally done by the C runtime, but we've skipped that initialization. Because of that, GLIBC functions like printf can fail. There are replacement C libraries that are standards compliant and can operate without C runtime initialization. One such product is MUSL.
On Ubuntu 64-bit these commands should build and install the 64-bit version of MUSL:
git clone git://git.musl-libc.org/musl
cd musl
./configure --prefix=/usr/local/musl/x86-64
make
sudo make install
You can then use the MUSL wrapper for GCC to work with MUSL's C library instead of the default GLIBC library on most Linux distributions. Parameters are just like GCC so you should be able to do:
/usr/local/musl/x86-64/bin/musl-gcc -e main -static -nostartfiles test.c
An ./a.out built this way against GLIBC would likely segfault when run. MUSL doesn't need initialization prior to using most of the C library functions, so it should work even with the -static GCC option.
One of the issues with your comparison is that you call the SYS_WRITE system call directly in NASM, while in C you are using printf. User EOF correctly commented that you might want to make it a fairer comparison by calling the write function in C instead of printf. write has far less overhead to it. You could amend your code to be:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main()
{
    char *str = "Hello, world\n";
    write (STDOUT_FILENO, str, 13);
    _exit(0);
    return 0;
}
This will have more overhead than NASM's direct SYS_WRITE syscall, but far less than what printf would generate.
I'm going to issue the caveat that such code and trickery would likely not be taken well in a code review except for some fringe cases of software development.