
Is it undefined behaviour to const_cast a const view of a non-const socket received byte buffer?

I have code that receives (non-const) UDP bytes via the recv() system call. The byte buffer is then passed along to business-level code that reads the bytes by doing something like:

const auto* msg = reinterpret_cast<const BusinessMessage*>(buffer); // BusinessMessage is a packed struct corresponding exactly to the bytes sent by the sender
... // through some framework code so I'm forced to keep the msg as const...
DoBusinessLogic(msg->GetField1(), msg->GetField2(), ...);

Note: This kind of reinterpret_cast is basically everywhere in the code and is battle-tested for more than a decade, even though I know some purists might raise objections.

Now, due to the restrictions of the business-level frameworks I must deal with, I ended up needing to mutate the msg. Let's say Field2 needs to be changed to some other value. Normally, the correct way to do this is to use a mutable member variable. I cannot do this, because of framework constraints (see sample code below). So, I do this by adding a const member function to BusinessMessage, which actually mutates Field2, something like this:

struct BusinessMessage : Field1, Field2, ... // sadly, I cannot change this impl so as to allow mutable member variables.
{
    void NonConstSetField2(int value) const
    {
        const Field2* field = static_cast<const Field2*>(this);
        Field2* mutableField = const_cast<Field2*>(field);
        mutableField->Set(value);
    }
};

I tested this under UB-Sanitizer, and it doesn't complain.

A colleague says this is Undefined Behavior. He may be wrong, but I don't know definitively that I'm right, either.

Is this UB according to the Standard? Will this be safe with a modern (e.g. >=12) GCC compiler, even if it is technically UB?

Anton asked Oct 16 '25 16:10


2 Answers

This is tricky to answer. I'm afraid I do not have standard references to back up the statements below; this is based only on my existing knowledge.

The reinterpret_cast is already undefined behavior. Probably. It depends on the exact details of the implementation of the class, and the wording around lifetime starts is not entirely clear, with some papers currently in the committee that try to clear it up. But given the complicated inheritance-based way the class is built, I'm guessing that it does not qualify as trivially copyable, or "plain old data" as older versions of C++ put it.

Also, your char buffer needs to be correctly aligned.
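To make those two preconditions concrete, here is a minimal sketch. DemoMessage is a hypothetical stand-in (the real BusinessMessage layout isn't shown in the question), and is_aligned_for and copy_from_bytes are illustrative helper names I made up, not part of any framework. It checks trivial copyability at compile time, checks alignment at run time, and shows the memcpy-based copy that sidesteps both the alignment and the lifetime question whenever working on a copy is acceptable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the real message type.
struct DemoMessage {
    std::uint32_t field1;
    std::uint32_t field2;
};

// Compile-time check: byte-wise copying is only meaningful for
// trivially copyable types.
static_assert(std::is_trivially_copyable_v<DemoMessage>,
              "DemoMessage must be trivially copyable");

// Returns true when `p` satisfies the alignment requirement of T.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Copies sizeof(T) bytes into a real T object. This works for any
// source alignment and never reads the buffer through a T pointer.
template <typename T>
T copy_from_bytes(const unsigned char* bytes) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "T must be trivially copyable");
    T out;
    std::memcpy(&out, bytes, sizeof(T));
    return out;
}
```

The copy costs a few bytes of stack, but compilers typically optimize a fixed-size memcpy like this into plain loads.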

But it works in the real world, and will continue to work, for the simple reason that lots of code like this exists, so compilers are compelled to keep it working (at least with -fno-strict-aliasing or similar switches).

So you have code that is UB under the strict semantics of the abstract machine, but works in the real world. Now you ask whether something else done on top of it is UB in the abstract machine. The question isn't really meaningful: your code is already UB, and UB, once invoked, affects the whole program (including "time travel": code that executes before the offending operation may be affected too), so nothing can be said about the rest of the program anymore.

But let's look at the problem in isolation.

It is defined behavior to modify a mutable storage location. It is undefined behavior to modify an immutable storage location.

Let's take the simple case:

void fn1() {
  BusinessMessage msg;
  msg.NonConstSetField2(42);
}

This is clearly defined behavior. Although the this pointer inside the member function is const, and thus you need the const_cast to call Field2::Set, the actual object is mutable.

void fn2() {
  const BusinessMessage msg;
  msg.NonConstSetField2(42);
}

This is clearly undefined behavior. msg is const, you're not allowed to modify it. The compiler can assume it doesn't change across any function calls and thus cache values. It could place it in read-only memory so that modifications trigger faults. You just can't do it.
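Here is a self-contained sketch of the defined case, for anyone who wants to try it. Field1, Field2 and DemoMessage are hypothetical stand-ins I made up to mirror the shape of the question's class; the real types are not shown there. The const case is only described in a comment, since executing it would be undefined.

```cpp
#include <cassert>

// Hypothetical stand-ins for the base classes in the question.
struct Field1 {
    int a = 0;
};

struct Field2 {
    int b = 0;
    void Set(int v) { b = v; }
    int Get() const { return b; }
};

struct DemoMessage : Field1, Field2 {
    // Const member function that mutates the Field2 base subobject,
    // using the same casts as in the question.
    void NonConstSetField2(int v) const {
        const Field2* field = static_cast<const Field2*>(this);
        const_cast<Field2*>(field)->Set(v);
    }
};

// Defined: the object itself is non-const, only the access path is const.
int set_through_const_view() {
    DemoMessage msg;                // non-const object
    const DemoMessage& view = msg;  // const access path to the same object
    view.NonConstSetField2(42);     // const_cast only removes the view's constness
    return msg.Get();
}

// Not shown as code: declaring `const DemoMessage msg;` and calling
// msg.NonConstSetField2(42) would modify a const object, which is undefined.
```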

But your case is this:

void fn3() {
  alignas(BusinessMessage) char buffer[sizeof(BusinessMessage)];
  const BusinessMessage* msg = reinterpret_cast<const BusinessMessage*>(buffer);
  msg->NonConstSetField2(42);
}

And here we get into the whole issue of "When and in what manner does the lifetime of an object start when you cast a char buffer?"

If the cast starts the lifetime of an object of the cast-to type, does it start the lifetime of a BusinessMessage or a const BusinessMessage? And if the latter, does this mean the storage location is considered const, because the object that lives there is const, or is it considered mutable because the underlying char buffer is mutable?

I don't think this question is entirely settled.

That said, I also think the question is irrelevant. As I said in the beginning, you're already pretty much relying on the compiler doing what you expect it to do, and in light of this, I don't think wondering about the exact standard semantics of your code matters. Realistically, the code will work. It passes UBsan, which means it won't interfere with your UBsan usage. Write a unit test that tests that it really works (i.e. the changed value can be observed after the function call) and call it a day.

Sebastian Redl answered Oct 18 '25 05:10


There are situations where the initial cast to const BusinessMessage* is defined behaviour. This is because of implicit object creation. When that is the case, msg can still point to a mutable BusinessMessage.

Objects of implicit-lifetime types can also be implicitly created by

  • operations that begin lifetime of an array of type unsigned char or std::byte, in which case such objects are created in the array,

  • call to following object representation copying functions, in which case such objects are created in the destination region of storage or the result:

    • std::memcpy

    • std::memmove

  • call to following specific functions, in which case such objects are created in the specified region of storage:

    • std::start_lifetime_as

    • std::start_lifetime_as_array

  • [Other cases not relevant here]

Zero or more objects may be created in the same region of storage, as long as doing so would give the program defined behavior. If such creation is impossible, e.g. due to conflicting operations, the behavior of the program is undefined. If multiple such sets of implicitly created objects would give the program defined behavior, it is unspecified which such set of objects is created. In other words, implicitly created objects are not required to be uniquely defined.

After implicitly creating objects within a specified region of storage, some operations produce a pointer to a suitable created object. The suitable created object has the same address as the region of storage. Likewise, the behavior is undefined if and only if no such pointer value can give the program defined behavior, and it is unspecified which pointer value is produced if there are multiple values giving the program defined behavior.

Because several different sets of objects can be implicitly created in the same region of storage, and the set that gives the program defined behavior is the one created, msg can point to an object that is either const or mutable in code that never modifies it, and to a mutable object in code that does modify it.

It's very likely that one of the above bullet points happens within the network code, but to be completely sure you could change the cast to a call to std::start_lifetime_as, if you have a compiler that supports features introduced in C++23.
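A minimal sketch of what that change could look like, with some assumptions stated up front: DemoMessage is a hypothetical stand-in for the real message type, and begin_lifetime_as is a wrapper name I made up. It calls std::start_lifetime_as when the standard library advertises it through the __cpp_lib_start_lifetime_as feature-test macro. Otherwise it falls back to the memmove-plus-std::launder idiom that is often suggested as a pre-C++23 approximation; treat that fallback as an assumption rather than an exact equivalent.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical stand-in for the real message type.
struct DemoMessage {
    int field1;
    int field2;
};

// Starts the lifetime of a T in `storage`, keeping the bytes already there.
// Precondition: `storage` is suitably aligned for T and at least sizeof(T) bytes.
template <typename T>
T* begin_lifetime_as(void* storage) noexcept {
    // Trivial copyability is used here as a rough proxy for
    // "implicit-lifetime type"; the two notions are not identical.
    static_assert(std::is_trivially_copyable_v<T>,
                  "T must be trivially copyable");
#if defined(__cpp_lib_start_lifetime_as)
    return std::start_lifetime_as<T>(storage);
#else
    // Fallback (assumption): memmove implicitly creates objects in its
    // destination and preserves the bytes; launder yields a usable pointer.
    return std::launder(
        static_cast<T*>(std::memmove(storage, storage, sizeof(T))));
#endif
}
```

The returned pointer is a plain T*, so the mutation no longer needs a const_cast at the point where the buffer is first interpreted; whether the framework then lets you keep a non-const pointer is a separate problem.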

Caleth answered Oct 18 '25 06:10


