int main(){
    int v = 1;
    char* ptr = reinterpret_cast<char*>(&v);
    char r = *ptr; //#1
}
In this snippet, the expression ptr points to an object of type int, as per expr.static.cast#13:
Otherwise, the pointer value is unchanged by the conversion.
Indirection through ptr results in a glvalue that denotes the object ptr points to, as per expr.unary#op-1:
the result is an lvalue referring to the object or function to which the expression points.
Accessing an object through a glvalue of a permitted type does not result in UB, as per basic.lval#11:
If a program attempts to access ([defns.access]) the stored value of an object through a glvalue whose type is not similar ([conv.qual]) to one of the following types the behavior is undefined:
- a char, unsigned char, or std::byte type.
It also does not seem to violate the following rule, expr#pre-4:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
Assume the width of char in the test environment is 8 bits, so its range is [-128, 127]. The value of v is 1. So, does this mean that the snippet at #1 does not result in UB?
By contrast, consider the following example:
int main(){
    int v = 2147483647; // or any value greater than 127
    char* ptr = reinterpret_cast<char*>(&v);
    char r = *ptr; //#2
}
#2 would be UB, right?
Does it mean the snippet at #1 does not result in UB?
Yes, the quoted rules mean that #1 is well defined.
#2 would be UB, Right?
No, as per the quoted rules, the behaviour of #2 is also well defined.
The type of ptr is char*, therefore the type of the expression *ptr is char, whose value cannot exceed the values representable by char; thus expr#pre-4 does not apply.
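The claim about the type of *ptr can be checked at compile time. The following is only a minimal sketch: the helper name deref_is_char is hypothetical and not part of the question or the standard, and the check says nothing about which value the access produces.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical helper (name invented for illustration): reports whether
// applying unary * to a char* yields an lvalue of type char.
constexpr bool deref_is_char() {
    // decltype of the lvalue expression *p is char&, whatever p points to.
    return std::is_same<decltype(*std::declval<char*>()), char&>::value;
}

static_assert(deref_is_char(), "*ptr has type char when ptr is char*");
```

Since the expression has type char regardless of what the pointer was cast from, its result is necessarily a char value.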
Assume the width of char in the test circumstance is 8 bits, its range is [-128, 127].
This assumption is not necessary in order for #1 to be well defined.
The value of v is 1
This does not follow from the above assumption alone. It is likely true in practice on a little-endian CPU (given the previous assumptions), although the standard does not specify the representation exactly.
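To see how the byte at the lowest address depends on the representation, the bytes of an int can be copied out with std::memcpy and inspected. This is a sketch under assumptions: the helper names first_byte and low_byte_first are hypothetical, and the result of low_byte_first is a property of the platform, not something the standard fixes.

```cpp
#include <cstring>

// Hypothetical helper: returns the byte stored at the lowest address of v,
// obtained by copying the object representation into a byte buffer.
unsigned char first_byte(int v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    return bytes[0];
}

// Hypothetical helper: true when the low-order byte of an int is stored
// first, as on typical little-endian platforms.
bool low_byte_first() {
    return first_byte(1) == 1;
}
```

On a typical big-endian platform first_byte(1) would be 0 instead, which is why the value read at #1 cannot be derived from the width of char alone.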
It is the intention of the language that both snippets be implementation-defined. I believe they were, until C++17, which broke support for that language feature. See the defect report here. As far as I know, this has not been fixed in C++20.
Currently, the portable workaround for accessing the memory representation is to use std::memcpy (example):
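#include <cstring>

// Reads the first byte of v through a reinterpret_cast'ed pointer.
char foo(int v){
    return *reinterpret_cast<char*>(&v);
}

// Copies the bytes of v into a char buffer, then reads the first one.
char bar(int v)
{
    char buffer[sizeof(v)];
    std::memcpy(buffer, &v, sizeof(v));
    return *buffer;
}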
foo is technically UB while bar is well defined. foo is UB by omission: anything the standard fails to define is by definition UB, and the standard, in its current state, fails to define the behavior of this code.
bar produces the same assembly as foo with gcc 10. For simple cases, the actual copy is optimized out.
Regarding your rationale, the reasoning seems sound except that, in my opinion, the rules defining unary operator* (expr.static.cast#13) don't have the effect you expect in this case. The pointer must point to the underlying representation, which is poorly defined, as the linked defect describes. The fact that the pointer's value doesn't change does not alter the fact that it points to a different object. C++ allows objects to have the same address if their types are different, such as the first member of a standard-layout class sharing its address with the owning instance.
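The point about distinct objects sharing one address can be illustrated with a standard-layout class. This is a minimal sketch; the type Wrapper and the function first_member_shares_address are hypothetical names chosen for illustration.

```cpp
#include <type_traits>

// Hypothetical standard-layout class used only for illustration.
struct Wrapper {
    int first;
    double second;
};

static_assert(std::is_standard_layout<Wrapper>::value,
              "Wrapper is expected to be standard-layout");

// Compares the address of the whole object with the address of its first
// member. They are two different objects of different types.
bool first_member_shares_address(const Wrapper& w) {
    return static_cast<const void*>(&w) == static_cast<const void*>(&w.first);
}
```

For a standard-layout class the comparison holds, yet &w and &w.first still designate different objects, so equal addresses alone do not settle which object a pointer points to.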
Note that the author of the defect report came to the same conclusion as you regarding snippet #1, but I disagree. Because we are dealing with a language defect, and one that conflicts with stated intentions, it is hard to definitively prove one behavior correct. The fundamental rules these arguments would be based on are known to be flawed in this particular case.