When I compile the following code with gcc 6 using -O3 -std=c++14, I get a nice, empty main:
Dump of assembler code for function main():
   0x00000000004003e0 <+0>:     xor    %eax,%eax
   0x00000000004003e2 <+2>:     retq 
But uncommenting the last line in main "breaks" the optimization:
Dump of assembler code for function main():
   0x00000000004005f0 <+0>:     sub    $0x78,%rsp
   0x00000000004005f4 <+4>:     lea    0x40(%rsp),%rdi
   0x00000000004005f9 <+9>:     movq   $0x400838,0x10(%rsp)
   0x0000000000400602 <+18>:    movb   $0x0,0x18(%rsp)
   0x0000000000400607 <+23>:    mov    %fs:0x28,%rax
   0x0000000000400610 <+32>:    mov    %rax,0x68(%rsp)
   0x0000000000400615 <+37>:    xor    %eax,%eax
   0x0000000000400617 <+39>:    movl   $0x0,(%rsp)
   0x000000000040061e <+46>:    movq   $0x400838,0x30(%rsp)
   0x0000000000400627 <+55>:    movb   $0x0,0x38(%rsp)
   0x000000000040062c <+60>:    movl   $0x0,0x20(%rsp)
   0x0000000000400634 <+68>:    movq   $0x400838,0x50(%rsp)
   0x000000000040063d <+77>:    movb   $0x0,0x58(%rsp)
   0x0000000000400642 <+82>:    movl   $0x0,0x40(%rsp)
   0x000000000040064a <+90>:    callq  0x400790 <ErasedObject::~ErasedObject()>
   0x000000000040064f <+95>:    lea    0x20(%rsp),%rdi
   0x0000000000400654 <+100>:   callq  0x400790 <ErasedObject::~ErasedObject()>
   0x0000000000400659 <+105>:   mov    %rsp,%rdi
   0x000000000040065c <+108>:   callq  0x400790 <ErasedObject::~ErasedObject()>
   0x0000000000400661 <+113>:   mov    0x68(%rsp),%rdx
   0x0000000000400666 <+118>:   xor    %fs:0x28,%rdx
   0x000000000040066f <+127>:   jne    0x400678 <main()+136>
   0x0000000000400671 <+129>:   xor    %eax,%eax
   0x0000000000400673 <+131>:   add    $0x78,%rsp
   0x0000000000400677 <+135>:   retq   
   0x0000000000400678 <+136>:   callq  0x4005c0 <__stack_chk_fail@plt>
Code:
#include <type_traits>
#include <new>
#include <utility> // std::forward
namespace
{
struct ErasedTypeVTable
{
   using destructor_t = void (*)(void *obj);
   destructor_t dtor;
};
template <typename T>
void dtor(void *obj)
{
   return static_cast<T *>(obj)->~T();
}
template <typename T>
static const ErasedTypeVTable erasedTypeVTable = {
   &dtor<T>
};
}
struct ErasedObject
{
   std::aligned_storage<sizeof(void *)>::type storage;
   const ErasedTypeVTable& vtbl;
   bool flag = false;
   template <typename T, typename S = typename std::decay<T>::type>
   ErasedObject(T&& obj)
   : vtbl(erasedTypeVTable<S>)
   {
      static_assert(sizeof(T) <= sizeof(storage) && alignof(T) <= alignof(decltype(storage)), "");
      new (object()) S(std::forward<T>(obj));
   }
   ErasedObject(ErasedObject&& other) = default;
   ~ErasedObject()
   {
      if (flag)
      {
         ::operator delete(object());
      }
      else
      {
         vtbl.dtor(object());
      }
   }
   void *object()
   {
      return reinterpret_cast<char *>(&storage);
   }
};
struct myType
{
   int a;
};
int main()
{
   ErasedObject c1(myType{});
   ErasedObject c2(myType{});
   //ErasedObject c3(myType{});
}
Clang optimizes out both versions.
Any ideas what's going on? Am I hitting some optimization limit? If so, is it configurable?
With -O2, GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code.
With -Os, the compiler optimizes to reduce the size of the binary instead of execution speed. If you do not specify an optimization option, gcc attempts to reduce the compilation time and to make debugging always yield the result expected from reading the source code.
Use the command-line option -O0 (capital O, zero) to disable optimization, and -S to get an assembly file. See the GCC documentation for more command-line options.
I ran g++ with -fdump-ipa-inline to get more information about why functions are or are not inlined.
For the test case with a main() function that creates three objects, I got:
  (...)
  Deciding on inlining of small functions.  Starting with size 35.
  Enqueueing calls in void {anonymous}::dtor(void*) [with T = myType]/40.
  Enqueueing calls in int main()/35.
    not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
    not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
    not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
  (...)
This error code is set in gcc/gcc/ipa-inline.c:
  else if (!e->maybe_hot_p ()
           && (growth >= MAX_INLINE_INSNS_SINGLE
               || growth_likely_positive (callee, growth)))
    {
      e->inline_failed = CIF_UNLIKELY_CALL;
      want_inline = false;
    }
Then I discovered that the smallest change that makes g++ inline these functions is to add a declaration:
int main() __attribute__((hot));
I wasn't able to find in the code why int main() isn't considered hot, but that should probably be left for another question.
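As a rough illustration of where such a declaration goes, here is a minimal sketch. It assumes GCC or Clang, since `__attribute__((hot))` is a compiler extension. Everything in the `hot_sketch` namespace is hypothetical and not part of the question: `Erased` is a cut-down stand-in for ErasedObject, `Counted` is a helper type that tracks live objects, and `buildThree()` plays the role of main() so that it can be called from a test. Whether GCC then actually inlines the destructor calls still depends on the compiler version and its heuristics.

```cpp
#include <cassert>
#include <new>
#include <type_traits>
#include <utility>

namespace hot_sketch
{
// Hypothetical helper: tracks how many Counted objects are currently alive.
static int liveObjects = 0;

struct Counted
{
   int a = 0;
   Counted() { ++liveObjects; }
   Counted(Counted &&other) : a(other.a) { ++liveObjects; }
   ~Counted() { --liveObjects; }
};

struct VTable
{
   void (*dtor)(void *obj);
};

template <typename T>
void destroy(void *obj)
{
   static_cast<T *>(obj)->~T();
}

template <typename T>
const VTable vtableFor = { &destroy<T> };

// Cut-down stand-in for the question's ErasedObject (no flag, no move).
struct Erased
{
   alignas(void *) unsigned char storage[sizeof(void *)];
   const VTable &vtbl;

   template <typename T, typename S = typename std::decay<T>::type>
   Erased(T &&obj) : vtbl(vtableFor<S>)
   {
      static_assert(sizeof(S) <= sizeof(storage) && alignof(S) <= alignof(void *),
                    "object must fit in storage");
      new (storage) S(std::forward<T>(obj));
   }
   Erased(const Erased &) = delete;
   ~Erased() { vtbl.dtor(storage); }
};

// The attribute is attached to a separate declaration, in the same form as
// the declaration of main() shown in the answer.
int buildThree() __attribute__((hot));

int buildThree()
{
   Erased c1(Counted{});
   Erased c2(Counted{});
   Erased c3(Counted{});
   assert(liveObjects == 3); // all three erased objects are alive here
   return liveObjects;
}
}
```

Rerunning g++ with -fdump-ipa-inline on the annotated version is the way to check whether the "call is unlikely" entries are gone for a given compiler version.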
More interesting is the second part of the conditional I pasted above. The intent is to avoid inlining when the code would grow, yet you produced an example where the code shrinks after complete inlining.
I think this deserves to be reported on GCC's Bugzilla, but I'm not sure you can call it a bug: estimating the impact of inlining is a heuristic, and as such it is expected to work correctly in most cases, not all of them.
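To make the logic of the conditional quoted from ipa-inline.c easier to follow, it can be read as a small decision function. The sketch below is a toy model of that single branch only, not GCC's implementation: `CallEdge`, `Verdict`, and `decide()` are hypothetical names, the three fields stand in for `e->maybe_hot_p ()`, `growth`, and `growth_likely_positive (callee, growth)`, and the threshold is passed in as a parameter rather than taken from GCC.

```cpp
#include <cassert>

namespace inline_model
{
// Hypothetical model of the inputs to the quoted branch.
struct CallEdge
{
   bool maybeHot;             // stands in for e->maybe_hot_p ()
   int growth;                // estimated size change if the call is inlined
   bool growthLikelyPositive; // stands in for growth_likely_positive (callee, growth)
};

enum class Verdict
{
   Inline,      // this branch does not reject the call
   UnlikelyCall // corresponds to CIF_UNLIKELY_CALL
};

// Models only the quoted branch; the real function has other checks as well.
Verdict decide(const CallEdge &e, int maxInlineInsnsSingle)
{
   assert(maxInlineInsnsSingle > 0);
   if (!e.maybeHot
       && (e.growth >= maxInlineInsnsSingle || e.growthLikelyPositive))
   {
      return Verdict::UnlikelyCall;
   }
   return Verdict::Inline;
}
}
```

In this model a call from a function that is not considered hot is rejected as soon as the growth estimate is judged likely positive, which matches the "call is unlikely and code size would grow" lines in the dump. Marking the caller hot bypasses the branch regardless of the growth estimate.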