Consider these three classes:
struct Foo
{
// causes default ctor to be deleted
constexpr explicit Foo(int i) noexcept : _i(i) {}
private:
int _i;
};
// same as Foo but default ctor is brought back and explicitly defaulted
struct Bar
{
constexpr Bar() noexcept = default;
constexpr explicit Bar(int i) noexcept : _i(i) {}
private:
int _i;
};
// same as Bar but member variable uses brace-or-equals initializer (braces in this case)
struct Baz
{
constexpr Baz() noexcept = default;
constexpr explicit Baz(int i) noexcept : _i(i) {}
private:
int _i{};
};
The following static_assert
s evaluate to true (C++20):
static_assert(not std::is_trivially_default_constructible_v<Foo>);
static_assert(std::is_trivially_default_constructible_v<Bar>);
static_assert(not std::is_trivially_default_constructible_v<Baz>);
meaning that only Bar is considered to be trivially default constructible.
I understand how Foo
and Baz
don't meet the condition as far the standard's definition goes, but what I don't understand is why does this mean that certain algorithms can optimize operations on Bar
in ways that they can't do so on Foo
or Baz
.
Example of runtime test showcasing the benefits of being trivially default constructible: https://quick-bench.com/q/t1W4ItmCoJ60U88_ED9s_7I9Cl0
The test populates a vector with 1000 randomly generated objects and measures the runtime of doing so. Ran with int
, Foo
, Bar
, Baz
. My guess is the vector reallocations and copying/moving of the objects is where the difference in performance manifests itself.
What is it about being trivially default constructible that enables optimizations?
Why was the compiler (or the std::vector implementation) unable to apply the same optimization on Foo
and Baz
?
This is a missed optimization for gcc.
Basically, the question is: when vector
has to reallocate, how do you transfer the elements from the old storage to the new storage?
gcc's implementation currently tries to do this (I removed some irrelevant blocks of code for brevity):
// This class may be specialized for specific types.
// Also known as is_trivially_relocatable.
template<typename _Tp, typename = void>
struct __is_bitwise_relocatable
: is_trivial<_Tp> { };
template <typename _InputIterator, typename _ForwardIterator,
typename _Allocator>
_GLIBCXX20_CONSTEXPR
inline _ForwardIterator
__relocate_a_1(_InputIterator __first, _InputIterator __last,
_ForwardIterator __result, _Allocator& __alloc)
noexcept(noexcept(std::__relocate_object_a(std::addressof(*__result),
std::addressof(*__first),
__alloc)))
{
_ForwardIterator __cur = __result;
for (; __first != __last; ++__first, (void)++__cur)
std::__relocate_object_a(std::__addressof(*__cur),
std::__addressof(*__first), __alloc);
return __cur;
}
template <typename _Tp, typename _Up>
_GLIBCXX20_CONSTEXPR
inline __enable_if_t<std::__is_bitwise_relocatable<_Tp>::value, _Tp*>
__relocate_a_1(_Tp* __first, _Tp* __last,
_Tp* __result,
[[__maybe_unused__]] allocator<_Up>& __alloc) noexcept
{
ptrdiff_t __count = __last - __first;
if (__count > 0)
{
__builtin_memmove(__result, __first, __count * sizeof(_Tp));
}
return __result + __count;
}
The first overload here does a member-wise copy, the second overload does a single memmove
- but only if the type satisfies __is_bitwise_relocatable<_Tp>
, which as you can see defaults to std::is_trivial
. std::is_trivial
requires a trivial default constructor, which isn't actually relevant for this particular optimization (see #68350, but that's what's causing the code path to do the slow element-wise copy instead of the single memmove.
You can verify this is the case by specializing __is_bitwise_relocatable<Foo>
and seeing that the performance now lines up.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With