Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does a type being trivially default constructible improve performance in some situations?

Tags:

c++

c++20

Consider these three classes:

struct Foo
{
    // causes default ctor to be deleted
    constexpr explicit Foo(int i) noexcept : _i(i) {} 
private:
    int _i;
};

// same as Foo but default ctor is brought back and explicitly defaulted
struct Bar
{
    constexpr Bar() noexcept = default;
    constexpr explicit Bar(int i) noexcept : _i(i) {}

private:
    int _i;
};

// same as Bar but member variable uses brace-or-equals initializer (braces in this case)
struct Baz
{
    constexpr Baz() noexcept = default;
    constexpr explicit Baz(int i) noexcept : _i(i) {}

private:
    int _i{};
};

The following static_asserts evaluate to true (C++20):

static_assert(not std::is_trivially_default_constructible_v<Foo>);
static_assert(std::is_trivially_default_constructible_v<Bar>);
static_assert(not std::is_trivially_default_constructible_v<Baz>);

meaning that only Bar is considered to be trivially default constructible.

I understand how Foo and Baz don't meet the condition as far the standard's definition goes, but what I don't understand is why does this mean that certain algorithms can optimize operations on Bar in ways that they can't do so on Foo or Baz.

Example of runtime test showcasing the benefits of being trivially default constructible: https://quick-bench.com/q/t1W4ItmCoJ60U88_ED9s_7I9Cl0

The test populates a vector with 1000 randomly generated objects and measures the runtime of doing so. Ran with int, Foo, Bar, Baz. My guess is the vector reallocations and copying/moving of the objects is where the difference in performance manifests itself.

What is it about being trivially default constructible that enables optimizations?

Why was the compiler (or the std::vector implementation) unable to apply the same optimization on Foo and Baz?

like image 454
supernun Avatar asked Sep 03 '25 16:09

supernun


1 Answers

This is a missed optimization for gcc.

Basically, the question is: when vector has to reallocate, how do you transfer the elements from the old storage to the new storage? gcc's implementation currently tries to do this (I removed some irrelevant blocks of code for brevity):

  // This class may be specialized for specific types.
  // Also known as is_trivially_relocatable.
  template<typename _Tp, typename = void>
    struct __is_bitwise_relocatable
    : is_trivial<_Tp> { };

  template <typename _InputIterator, typename _ForwardIterator,
        typename _Allocator>
    _GLIBCXX20_CONSTEXPR
    inline _ForwardIterator
    __relocate_a_1(_InputIterator __first, _InputIterator __last,
           _ForwardIterator __result, _Allocator& __alloc)
    noexcept(noexcept(std::__relocate_object_a(std::addressof(*__result),
                           std::addressof(*__first),
                           __alloc)))
    {
      _ForwardIterator __cur = __result;
      for (; __first != __last; ++__first, (void)++__cur)
    std::__relocate_object_a(std::__addressof(*__cur),
                 std::__addressof(*__first), __alloc);
      return __cur;
    }

  template <typename _Tp, typename _Up>
    _GLIBCXX20_CONSTEXPR
    inline __enable_if_t<std::__is_bitwise_relocatable<_Tp>::value, _Tp*>
    __relocate_a_1(_Tp* __first, _Tp* __last,
           _Tp* __result,
           [[__maybe_unused__]] allocator<_Up>& __alloc) noexcept
    {
      ptrdiff_t __count = __last - __first;
      if (__count > 0)
    {
      __builtin_memmove(__result, __first, __count * sizeof(_Tp));
    }
      return __result + __count;
    }

The first overload here does a member-wise copy, the second overload does a single memmove - but only if the type satisfies __is_bitwise_relocatable<_Tp>, which as you can see defaults to std::is_trivial. std::is_trivial requires a trivial default constructor, which isn't actually relevant for this particular optimization (see #68350, but that's what's causing the code path to do the slow element-wise copy instead of the single memmove.

You can verify this is the case by specializing __is_bitwise_relocatable<Foo> and seeing that the performance now lines up.

like image 68
Barry Avatar answered Sep 05 '25 08:09

Barry