Initializing an array of trivially_copyable but not default_constructible objects from bytes. Confusion in [intro.object]

We are initializing (large) arrays of trivially_copyable objects from secondary storage, and questions such as this or this leave us with little confidence in the approach we implemented.

Below is a minimal example to try to illustrate the "worrying" parts in the code. Please also find it on Godbolt.

Example

Let's have a trivially_copyable but not default_constructible user type:

struct Foo
{
    Foo(double a, double b) :
        alpha{a}, 
        beta{b}
    {}

    double alpha;
    double beta;
};

Trusting cppreference:

Objects of trivially-copyable types that are not potentially-overlapping subobjects are the only C++ objects that may be safely copied with std::memcpy or serialized to/from binary files with std::ofstream::write()/std::ifstream::read().
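The two properties this question relies on can be checked at compile time. A minimal sketch, assuming C++17 for the `_v` trait aliases; the struct is repeated from above only so the block compiles on its own:

```cpp
#include <type_traits>

// Same definition as the Foo shown above.
struct Foo
{
    Foo(double a, double b) :
        alpha{a},
        beta{b}
    {}

    double alpha;
    double beta;
};

// Foo may be copied byte-wise (memcpy, read/write)...
static_assert(std::is_trivially_copyable_v<Foo>,
              "Foo is expected to be trivially copyable");

// ...but the user-provided two-argument constructor suppresses the
// implicit default constructor, so `new Foo[n]` does not compile.
static_assert(!std::is_default_constructible_v<Foo>,
              "Foo is expected to have no default constructor");
```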

Now, we want to read a binary file into a dynamic array of Foo. Since Foo is not default constructible, we cannot simply write:

std::unique_ptr<Foo[]> invalid{new Foo[dynamicSize]}; // Error, no default ctor

Alternative (A)

Using uninitialized unsigned char array as storage.

std::unique_ptr<unsigned char[]> storage{
    new unsigned char[dynamicSize * sizeof(Foo)] };

input.read(reinterpret_cast<char *>(storage.get()), dynamicSize * sizeof(Foo));

std::cout << reinterpret_cast<Foo *>(storage.get())[index].alpha << "\n";

Is this UB because objects of type Foo are never explicitly created in storage?

Alternative (B)

The storage is explicitly typed as an array of Foo.

std::unique_ptr<Foo[]> storage{
    static_cast<Foo *>(::operator new[](dynamicSize * sizeof(Foo))) };

input.read(reinterpret_cast<char *>(storage.get()), dynamicSize * sizeof(Foo));

std::cout << storage[index].alpha << "\n";

This alternative was inspired by this post. Yet, is it any better defined? There still seems to be no explicit creation of objects of type Foo.

It notably gets rid of the reinterpret_cast when accessing the Foo data member (a cast that might have violated the type aliasing rule).

Overall Questions

  • Are any of these alternatives defined by the standard? Are they actually different?

    • If not, is there a correct way to implement this (without first initializing all Foo instances to values that will be discarded immediately after)?
  • Is there any difference in undefined behaviours between versions of the C++ standard? (In particular, please see this comment with regard to C++20)

Ad N asked Dec 31 '25 17:12


1 Answer

What you're trying to do ultimately is create an array of some type T by memcpying bytes from elsewhere without default constructing the Ts in the array first.

Before C++20, this cannot be done without provoking UB at some point.

The problem ultimately comes down to [intro.object]/1, which defines the ways objects get created:

An object is created by a definition, by a new-expression, when implicitly changing the active member of a union, or when a temporary object is created ([conv.rval], [class.temporary]).

If you have a pointer of type T*, but no T object has been created at that address, you can't just pretend that the pointer points to an actual T. You have to cause that T to come into being, and that requires doing one of the above operations. The only one available for your purposes is the new-expression, which, for an array whose size is known only at runtime, requires that T be default constructible.

If you want to memcpy into such objects, they must exist first. So you have to create them. And for arrays of such objects, that means they need to be default constructible.

So if it is at all possible, you need a (likely defaulted) default constructor.
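A minimal sketch of that route, assuming the type can be changed. `FooDC` and `load_from_bytes` are hypothetical names introduced for illustration; they are not part of the question. Because the defaulted default constructor is trivial, `new FooDC[count]` creates the objects without writing any values, so nothing is initialized only to be discarded:

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical variant of Foo with a defaulted (trivial) default constructor.
struct FooDC
{
    FooDC() = default;
    FooDC(double a, double b) :
        alpha{a},
        beta{b}
    {}

    double alpha;
    double beta;
};

// Hypothetical helper: creates the array with a new-expression, so the
// objects exist, then overwrites their bytes with the serialized data.
// `bytes` must hold count * sizeof(FooDC) bytes that were originally
// copied out of FooDC objects.
std::unique_ptr<FooDC[]> load_from_bytes(const unsigned char *bytes,
                                         std::size_t count)
{
    std::unique_ptr<FooDC[]> result{new FooDC[count]}; // default-initialized: no values written
    std::memcpy(result.get(), bytes, count * sizeof(FooDC));
    return result;
}
```

The same pattern applies when the bytes come from `std::istream::read` instead of `std::memcpy`.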


In C++20, certain operations can implicitly create objects (known as "implicit object creation", or IOC). IOC only works on implicit-lifetime types, which for classes means:

A class S is an implicit-lifetime class if it is an aggregate or has at least one trivial eligible constructor and a trivial, non-deleted destructor.

Your class qualifies, as it has a trivial copy constructor (which is "eligible") and a trivial destructor.
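The non-aggregate half of that definition can be approximated with standard type traits. This is only a sketch: `looks_implicit_lifetime` is a hypothetical name, it ignores the aggregate case, and it tests only the default, copy, and move constructors. C++23 adds a dedicated `std::is_implicit_lifetime` trait, which is preferable where the toolchain supports it.

```cpp
#include <string>
#include <type_traits>

// Same definition as the Foo from the question.
struct Foo
{
    Foo(double a, double b) :
        alpha{a},
        beta{b}
    {}

    double alpha;
    double beta;
};

// Hypothetical approximation: a trivial destructor plus at least one
// trivial default, copy, or move constructor. Aggregates are not handled.
template <typename T>
constexpr bool looks_implicit_lifetime =
    std::is_trivially_destructible_v<T> &&
    (std::is_trivially_default_constructible_v<T> ||
     std::is_trivially_copy_constructible_v<T> ||
     std::is_trivially_move_constructible_v<T>);

// Foo passes through its trivial copy constructor and trivial destructor.
static_assert(looks_implicit_lifetime<Foo>, "Foo should qualify");

// std::string has a non-trivial destructor, so it does not qualify.
static_assert(!looks_implicit_lifetime<std::string>, "std::string should not qualify");
```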

If you create an array of a byte-wise type (unsigned char, std::byte, or char), this is said to "implicitly create objects" in that storage. This property also applies to the memory returned by malloc and operator new. This means that if you perform operations on pointers to that storage that would otherwise be undefined behavior because no suitable object exists there, the system will automatically create objects (at the point where the array was created) that make that behavior well-defined.

So if you allocate such storage, cast a pointer to it to a T*, and then start using it as though it pointed to a T, the system will automatically create Ts in that storage, so long as it was appropriately aligned.

Therefore, your alternative A works just fine:

When you apply [index] to your casted pointer, C++ will retroactively create an array of Foo in that storage. That is, because you used the memory like an array of Foo exists there, C++20 will make an array of Foo exist there, exactly as if you had created it back at the new unsigned char statement.

However, alternative B will not work as is. You did not use a new Foo[n] expression to create the array, so the default deleter of unique_ptr<Foo[]>, which applies delete[] to the pointer, must not be used to delete it. You can still use unique_ptr, but you'll have to supply a deleter that explicitly calls operator delete[] on the pointer:

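// Deleter that releases raw storage obtained from ::operator new[].
struct mem_delete
{
  template<typename T>
  void operator()(T *ptr) const
  {
    // Matches the ::operator new[] call used for the allocation.
    ::operator delete[](ptr);
  }
};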

std::unique_ptr<Foo[], mem_delete> storage{
    static_cast<Foo *>(::operator new[](dynamicSize * sizeof(Foo))) };

input.read(reinterpret_cast<char *>(storage.get()), dynamicSize * sizeof(Foo));

std::cout << storage[index].alpha << "\n";

Again, storage[index] causes an array of Foo to be created as if it had been created at the time the memory was allocated.

Nicol Bolas answered Jan 02 '26 06:01


