An article called The joys and perils of C and C++ aliasing, Part 1 offers this example:
int foo(int *a, long *b)
{
int t = *a;
*b = 0; // cannot change *a
return *a - t; // can be folded to zero
}
With the commentary:
Because
a
andb
are declared to be pointers to incompatible types, and because C and C++ require that objects have their stored value accessed only by an lvalue of a compatible type, the store to*b
cannot affect the value of*a
cached in the variablet
, which is typically a register. Therefore, the operands in the subtraction expression must be equal, and the result must be zero.
But in Part 2, we see a similar example:
int bar(int *a, long *b)
{
int t = *a;
for (int i = 0; i != sizeof *b; ++i)
((unsigned char*)b)[i] = 0;
return *a - t; // must not be folded
}
where folding is not possible:
In this case, the compiler cannot fold the
return
expression because the function would be valid if called withb
equal toa
. However, usingrestrict
when declaring thea
andb
pointers would make callingf
with overlapping objects invalid. Doing that would re-enable the optimization opportunity.
I understand that an unsigned char *
can be used for aliasing the same way that a char *
could. But my understanding is that a
and b
already signal that they will not alias in both signatures, because one is a int *
and one is a long *
.
My mental model is that the function's argument types are subject to strict aliasing rules, so that when either...
a
and b
occupy overlapping memory (e.g. via reinterpret_cast) and we're in UB territorya
and b
have non-overlapping representations in memory...type-aware optimizations (e.g. folding of return statement) can still be applied. I assume this applies equally for foo
and bar
.
Why does the use of type punning inside of bar
change whether the compiler can fold the *a - t
expression?
The type aliasing rule has a specific exception for accessing an object of any type through a glvalue of type unsigned char
(as well as char
and std::byte
). For these types and only these type there is no undefined behavior under the aliasing rule that the compiler could exploit for optimization. C has similar exceptions to the aliasing rule, but it is specified differently in terms of the object model.
If a
and b
represent the same address, then either *b = 0
or int t = *a;
must have undefined behavior because there can only either be an int
or a long
object alive at that address and only that one can be accessed. The other access would either attempt to access an out-of-lifetime object or attempt to access an object of one type through a glvalue of a different type, violating the type aliasing rule. One or the other would have to cause undefined behavior and the compiler can therefore assume that this situation, i.e. that a
and b
represent the same address, can't happen. (Neither can the memory ranges for the int
and long
objects pointed to overlap for that matter, because an int
object and a long
object can't both be alive in overlapping storage at the same time.)
In the second example *b = 0
is replaced by ((unsigned char*)b)[i] = 0
. The pointer b
is never accessed as long
type, only as unsigned char
. Because of the exception mentioned above, this is not UB per the aliasing rules, if a
and b
both point to the same object of type int
that is the (only) object alive at the address they both represent. Therefore the compiler cannot assume that a
and b
represent different addresses.
That being said, the C++ standard lacks any actual definition of how the access through the unsigned char
glvalue or the pointer arithmetic on the unsigned char*
pointer should behave, so that reading the standard pedantically it is still UB regardless.
However, the usual assumption is that it should behave as if the unsigned char*
pointer result of the cast was a pointer into the object representation of the original object, considered as an array of unsigned char
. This still leaves a lot of things unclear in alot of edge cases, but explains how the common usage usually works as intended.
With that understanding ((unsigned char*)b)[i] = 0
would directly modify the object representation of the int
object that a
points to in the valid scenario mentioned above. Changing the object representation of the int
object means changing its value and therefore the compiler cannot assume that the latter read of *a
will produce the same value as the one before the write and must instead reload it again.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With