Why is attribute noinline ignored by gcc-15.1.0 in this example?

Question

I was looking at this benchmark of a custom std::function implementation: https://github.com/PacktPublishing/Hands-On-Design-Patterns-with-CPP-Second-Edition/blob/main/Chapter06/09_function.C

I tried to replicate the example and noticed that, even though I declared this simple function as __attribute__((noinline)) auto function_no_inline(int a, int b, int c, int d) -> int { return a + b + c + d; }, its benchmark ran just as fast as the one for the inline function, whereas calling the same function defined in a different compilation unit took much longer. It seems that the attribute was ignored for some reason. Why? The arguments are obtained from rand().

    Benchmark                             Time             CPU   Iterations
    -----------------------------------------------------------------------
    BM_invoke_function                 1.35 ns         1.35 ns    504544141
    BM_invoke_function_no_inline      0.271 ns        0.271 ns   2584830443
    BM_invoke_function_inline         0.270 ns        0.270 ns   2580073503
    BM_invoke_std_function             2.21 ns         2.17 ns    324669753

This is my code. It links against the google-benchmark library.


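    #include <benchmark/benchmark.h>
    
    #include <cstdlib>     // rand
    #include <functional>  // std::function
    
    // Defined in a different translation unit: only this declaration is visible here.
    auto function(int a, int b, int c, int d) -> int;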
    
    __attribute__((noinline)) auto function_no_inline(int a, int b, int c, int d) -> int { return a + b + c + d; }
    
    inline auto function_inline(int a, int b, int c, int d) { return a + b + c + d; }
    
    template <typename Callable>
    auto invoke(int a, int b, int c, int d, const Callable& callable)
    {
      return callable(a, b, c, d);
    }
    
    // Benchmarks
    void BM_invoke_function(benchmark::State& state)
    {
      int a{rand()};
      int b{rand()};
      int c{rand()};
      int d{rand()};
    
      for (auto _ : state)
      {
        benchmark::DoNotOptimize(invoke(a, b, c, d, function));
        benchmark::ClobberMemory();
      }
    }
    
    void BM_invoke_function_no_inline(benchmark::State& state)
    {
      int a{rand()};
      int b{rand()};
      int c{rand()};
      int d{rand()};
    
      for (auto _ : state)
      {
        benchmark::DoNotOptimize(invoke(a, b, c, d, function_no_inline));
        benchmark::ClobberMemory();
      }
    }
    
    void BM_invoke_function_inline(benchmark::State& state)
    {
      int a{rand()};
      int b{rand()};
      int c{rand()};
      int d{rand()};
    
      for (auto _ : state)
      {
        benchmark::DoNotOptimize(invoke(a, b, c, d, function_inline));
        benchmark::ClobberMemory();
      }
    }
    
    void BM_invoke_std_function(benchmark::State& state)
    {
      int a{rand()};
      int b{rand()};
      int c{rand()};
      int d{rand()};
    
      std::function<int(int, int, int, int)> std_function{function};
    
      for (auto _ : state)
      {
        benchmark::DoNotOptimize(invoke(a, b, c, d, std_function));
        benchmark::ClobberMemory();
      }
    }
    
    BENCHMARK(BM_invoke_function);
    BENCHMARK(BM_invoke_function_no_inline);
    BENCHMARK(BM_invoke_function_inline);
    BENCHMARK(BM_invoke_std_function);
    
    BENCHMARK_MAIN();
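The listing above only declares function; the question says it was defined in a different compilation unit. A minimal sketch of what that second file could look like follows. The file name function.cpp (and bench.cpp for the benchmark file) are assumptions for illustration, not taken from the question.

```cpp
// function.cpp: hypothetical second translation unit (file name assumed)
// holding the definition that the benchmark file only declares.
// Because it is compiled separately, its body is invisible while the
// benchmark file is optimized, so GCC must assume the call may have side
// effects and keep one real call per loop iteration (unless -flto is used).
auto function(int a, int b, int c, int d) -> int
{
  return a + b + c + d;
}
```

Both files would then be built and linked together, for example with g++ -std=c++23 -O3 bench.cpp function.cpp -lbenchmark -lpthread -o bench (again, the file names are hypothetical).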

OLEGSHA · Accepted Answer

I popped your example into Compiler Explorer (link) and I see that function_inline is inlined, but function_no_inline is indeed not:

    BM_invoke_function_inline(benchmark::State&):
            push    r15
            push    r14
    [...]
            lea     edx, [r14+r15]
            add     edx, ebp
            add     edx, DWORD PTR [rsp+12]
    
    BM_invoke_function_no_inline(benchmark::State&):
            push    r15
            push    r14
    [...]
            call    function_no_inline(int, int, int, int)

I'm not sure if I guessed your compilation setup correctly (e.g. -std=c++23 -O3), but either I can't reproduce your results, or the explanation does not involve noinline.

That said, noinline is somewhat outdated: it prevents inlining, but it does not prevent several other kinds of interprocedural optimizations that could be affecting your situation (though apparently not, if we trust my Compiler Explorer results). The more bulletproof method is to use noipa, which explicitly asks GCC to treat the function as a standalone unit. It implies noinline (as well as noclone and no_icf) and also switches off any other interprocedural dark magic.
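A minimal sketch of the noipa suggestion, assuming GCC; function_noipa is a hypothetical name, not something from the question. One plausible reason noinline alone is not enough (an assumption, not verified against the asker's build): GCC can still analyze the body, see that the function only reads its arguments, and, since a, b, c and d never change inside the loop, compute the result once before the loop. A Compiler Explorer listing would then still contain the call, just not one per iteration.

```cpp
// Hypothetical replacement for the question's function_no_inline.
// noipa makes GCC optimize callers as if this body were unavailable, so it
// cannot deduce that the function has no side effects; as a result it
// should neither hoist the call out of the benchmark loop nor drop it.
__attribute__((noipa)) auto function_noipa(int a, int b, int c, int d) -> int
{
  return a + b + c + d;
}
```

In BM_invoke_function_no_inline, function_noipa would be passed to invoke in place of function_no_inline; the rest of the benchmark stays unchanged.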

From the GCC function attribute docs:

noinline

This function attribute prevents a function from being considered for inlining. It also disables some other interprocedural optimizations; it’s preferable to use the more comprehensive noipa attribute instead if that is your goal.

Even if a function is declared with the noinline attribute, there are optimizations other than inlining that can cause calls to be optimized away if it does not have side effects, although the function call is live. To keep such calls from being optimized away, put
asm ("");
(see Extended Asm) in the called function, to serve as a special side effect.

noipa

Disable interprocedural optimizations between the function with this attribute and its callers, as if the body of the function is not available when optimizing callers and the callers are unavailable when optimizing the body. [...]
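The asm ("") workaround from the noinline documentation quoted above can be sketched as follows. This is GCC-specific syntax, and function_no_inline_asm is a hypothetical name chosen for illustration.

```cpp
// Hypothetical variant of the question's function_no_inline that follows the
// noinline documentation: keep noinline and add an empty asm statement.
// A basic asm statement is implicitly volatile, so GCC treats it as a side
// effect and should no longer consider calls to this function removable.
__attribute__((noinline)) auto function_no_inline_asm(int a, int b, int c, int d) -> int
{
  asm("");  // emits no instructions; serves only as an opaque side effect
  return a + b + c + d;
}
```

Of the two approaches, noipa states the intent directly, while the empty asm is the workaround the docs offer when you want to keep plain noinline.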

Tags: c++, gcc, microbenchmark, inline, google-benchmark

luczzz
