In many debates about the inline keyword in function declarations, someone will point that it can actually make your program slower in some cases – mostly due to code explosion, if I am correct. I have never met such an example in practice myself. What is an actual code where the use of inline can be expected to be detrimental to the performance?
Exactly 10 years and one day ago I did this commit in OpenBSD:
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/amd64/include/intr.h.diff?r1=1.3;r2=1.4
The commit message was:
deinline splraise, spllower and setsoftint. Makes the kernel smaller and faster. deraadt@ ok
As far as I remember the kernel binary shrunk by more than 100kB and not a single test case could be produced that became slower and several macro benchmarks (like compiling the kernel) were measurably faster (5-10% if I recall correctly, but don't quote me on that).
Around the same time I went on a quest to actually measure inline functions in the OpenBSD kernel. I found a few that had minimal performance gains, but the majority had 0 measurable impact and several were making things much slower and were killed. At least one more uninlining had a huge impact and that one was the internal malloc macros (where the idea was to inline malloc if it had a size known at compile time) and packet buffer allocators that shrunk the kernel by 150kB and had a significant performance improvement.
One could speculate, although I have no proof, that this is because the kernel is large and we're struggling to stay inside the cache when executing system calls and every little bit helps. So what actually helped in those cases was just the shrinking of the binary, not the number of instructions executed.
Imagine a function that have no parameters, but intensive computation with a consistent number of intermediate values or register usage. Then Inline that function in code having a consistent number of intermediate values or register usage too.
Having no parameters make the call procedure more lightweight because no stack operations, that are time consuming, are required.
When inlined the compiler have to save many registers, and spill other to be used with the new function, reproducing the process of registers and data backup required for a function call possibly in worst way.
If the backup operations are more expansive, in terms of time and machine cycles, compared with the mechanism of function call, especially if the function is extensively called, then you have a detrimental effect.
This seems to be the case of some specific functions largely used in an OS.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With