I believe 6.5p7 in the C standard defines the so-called strict aliasing rule as follows.
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
Here's a simple example that shows GCC optimizing on the assumption that this rule holds.
int IF(int *i, float *f) {
    *i = -1;
    *f = 0;
    return *i;
}
IF:
mov DWORD PTR [rdi], -1
mov eax, -1
mov DWORD PTR [rsi], 0x00000000
ret
The load for return *i is omitted because GCC assumes that an int and a float cannot alias.
Now consider the sixth item in the list, which says an object may be accessed through an lvalue of character type (here, via a char *).
int IC(int *i, char *c) {
    *i = -1;
    *c = 0;
    return *i;
}
IC:
mov DWORD PTR [rdi], -1
mov BYTE PTR [rsi], 0
mov eax, DWORD PTR [rdi]
ret
Now there is a load for return *i because i and c could overlap according to the rules, and *c = 0 could change what's in *i.
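For reference, here is a minimal sketch of the character-type exception being used on purpose rather than just tolerated by the optimizer. The helper name nonzero_bytes is made up for illustration; it inspects the object representation of an int through an unsigned char *, which the last item in the list permits.

```c
#include <stddef.h>

/* Hypothetical helper: counts the nonzero bytes in the object
   representation of an int. Reading an int object through an
   unsigned char * is allowed by the character-type exception. */
size_t nonzero_bytes(int value) {
    const unsigned char *bytes = (const unsigned char *)&value;
    size_t count = 0;
    for (size_t k = 0; k < sizeof value; k++) {
        if (bytes[k] != 0) {
            count++;
        }
    }
    return count;
}
```

Because such byte-wise access is legal, the compiler has to assume a char store may hit any int it is tracking, which is why the reload appears in IC.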
Then can we also modify a char through an int *? Should the compiler care that such a thing might happen?
char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}
CI: #GCC
mov BYTE PTR [rdi], -1
mov DWORD PTR [rsi], 0
movzx eax, BYTE PTR [rdi]
ret
CI: #Clang
mov byte ptr [rdi], -1
mov dword ptr [rsi], 0
mov al, byte ptr [rdi]
ret
Looking at the assembly output, both GCC and Clang seem to think a char can be modified by access through int *.
Maybe it's obvious that A and B overlapping means A overlaps B and B overlaps A. However, I found this detailed answer which emphasizes in boldface that,
Note that may_alias, like the char* aliasing rule, only goes one way: it is not guaranteed to be safe to use int32_t* to read a __m256. It might not even be safe to use float* to read a __m256. Just like it's not safe to do char buf[1024]; int *p = (int*)buf;.
Now I'm really confused. That answer is also about GCC vector types, which have a may_alias attribute so that they can alias other types much as char can.
At least in the following example, GCC seems to think overlapping access can happen in both directions.
#include <immintrin.h>  /* __m128i, _mm_setzero_si128, _mm_set1_epi32 */

int IV(int *i, __m128i *v) {
    *i = -1;
    *v = _mm_setzero_si128();
    return *i;
}
__m128i VI(int *i, __m128i *v) {
    *v = _mm_set1_epi32(-1);
    *i = 0;
    return *v;
}
IV:
pxor xmm0, xmm0
mov DWORD PTR [rdi], -1
movaps XMMWORD PTR [rsi], xmm0
mov eax, DWORD PTR [rdi]
ret
VI:
pcmpeqd xmm0, xmm0
movaps XMMWORD PTR [rsi], xmm0
mov DWORD PTR [rdi], 0
movdqa xmm0, XMMWORD PTR [rsi]
ret
https://godbolt.org/z/ab5EMx3bb
But am I missing something? Is strict aliasing one-way?
Additionally, after reading the current answers and comments, I thought maybe this code is not allowed by the standard.
typedef struct {int i;} S;
S s;
int *p = (int *)&s;
*p = 1;
Note that (int *)&s is different from &s.i. My current interpretation is that an object of type S is being accessed by an lvalue expression of type int, and this case is not listed in 6.5p7.
Yes, it only goes one way, but from inside the function the compiler can't tell which of the two pointers was derived from the other.
Given this:
char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}
It could have been called like this:
int a;
char *p = ((char *)&a) + 1;
char b = CI(p,&a);
This is a valid use of aliasing. From inside the function, *i = 0 correctly sets a in the caller, and *c = -1 correctly sets one byte inside a.
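To make that concrete, here is a small sketch that wraps the call above in a function so the result can be observed. The wrapper name ci_overlap_result is hypothetical, and it assumes sizeof(int) is at least 2 so that the second byte exists.

```c
/* Same function as in the question: the char store comes first,
   then the int store, then the char is read back. */
char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}

/* Hypothetical wrapper: passes a char pointer that points into the
   very int that is also passed. Assumes sizeof(int) >= 2. */
int ci_overlap_result(void) {
    int a = 12345;
    char *p = ((char *)&a) + 1;  /* second byte of a */
    char b = CI(p, &a);
    return b;
}
```

The later *i = 0 overwrites the byte that *c = -1 just wrote, so the function has to return 0. If the compiler had kept the -1 in a register instead of reloading *c, this perfectly legal call would give the wrong answer.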
You can take a pointer to any object, cast it to a char* and use that to access the bit patterns underlying said object. You can also cast a char* obtained this way back to its original type.
So when the compiler sees int *i and char *c it cannot exclude the possibility that c was created by casting from i. So they may point to the same raw memory, and changing one may change the other. In that sense it goes both ways. But that is not what the text is about.
What this is about is casting from A* to char* and then to B*. The object pointed to doesn't magically become a B, and accessing it through a B* is undefined behavior. Maybe one-way is the wrong word; I don't know a better name for it. But for every object there is a train with only 2 stops: A* and char* (unsigned char*, signed char*, const char*, ... and all its variants). You can go back and forth as many times as you like, but you can never change tracks and go to B*.
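Here is a rough sketch of both halves of that picture. The function names are invented for the example. The first function rides the train from int* to unsigned char* and back, which is fine. The second shows the usual way to get at a float's bits as a uint32_t without changing tracks: copy the bytes with memcpy instead of dereferencing a uint32_t* that points at a float. It assumes float and uint32_t have the same size.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float is 32 bits wide. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Allowed: int* -> unsigned char* -> int*. The pointer still
   refers to an int object when it is finally dereferenced. */
int round_trip_through_char(int value) {
    int a = value;
    unsigned char *bytes = (unsigned char *)&a;
    int *back = (int *)bytes;
    return *back;
}

/* Not allowed: *(uint32_t *)&f. Copying the bytes into a real
   uint32_t object avoids accessing the float through a B*. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On a platform with IEEE 754 single-precision floats, float_bits(1.0f) gives 0x3F800000. Compilers typically turn a fixed-size memcpy like this into a plain register move, so it costs nothing compared to the cast.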
Does that help?
The may_alias attribute sets up another such rail system. It allows aliasing between int[4] and __m128i* because that is exactly the overlap the compiler needs for vectorization. But that's something you have to look up in the compiler's documentation.
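As an illustration of the attribute outside of vector types, here is a sketch using a may_alias typedef. This is a GCC extension that Clang also accepts, not standard C. The names short_a and zero_one_half are chosen for the example, and it assumes an int is exactly two shorts wide.

```c
/* GCC/Clang extension: accesses through short_a are treated like
   char accesses for aliasing purposes. */
typedef short __attribute__((__may_alias__)) short_a;

/* Assumption for this sketch: an int is exactly two shorts wide. */
_Static_assert(sizeof(int) == 2 * sizeof(short), "int must be two shorts");

/* Hypothetical helper: zeroes one 16-bit half of an int through
   the may_alias type and returns the modified int. */
int zero_one_half(int value) {
    int a = value;
    short_a *halves = (short_a *)&a;
    halves[1] = 0;  /* high half on little-endian, low half on big-endian */
    return a;
}
```

With a plain short * the compiler would be entitled to assume the store cannot touch a and return the original value. The attribute tells it that the store may alias anything, the same way a char store does.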