
Is it legal to cast a repr(C) struct pointer to pointer to its first field?

In C, this is legal as far as I know (see "In C, does a pointer to a structure always point to its first member?").

#include <stdio.h>

typedef struct {
    char *name;
    int age;
} A;

typedef struct  {
    int id;
    float x;
} B;

typedef struct {
    A a;
    B b;
} Compound;

int main() {
    Compound c;
    c.a.name = "1234";
    c.a.age = 100;
    c.b.id = 10;
    c.b.x = 100.;

    A *a = (A *)&c; /* explicit cast: &c has type Compound *, not A * */
    a->age = 50;

    printf("%s %d %d %f\n", c.a.name, c.a.age, c.b.id, c.b.x);
}

But is it legal in Rust?

struct A {
    name: String,
    age: usize,
}

struct B {
    id: usize,
    x: f32,
}

#[repr(C)]
struct Compound {
    a: A,
    b: B,
}

pub fn run() {
    let a = A {
        name: "1234".to_string(),
        age: 100,
    };

    let b = B { id: 10, x: 100. };

    let mut c = Compound { a, b };

    let a_ = &mut c as *mut Compound as *mut A;
    unsafe {
        (*a_).age = 50; // is this UB?
    }

    println!("{} {} {} {}", c.a.name, c.a.age, c.b.id, c.b.x);
}
Malyutin Egor asked Nov 14 '25 15:11


1 Answer

(tl;dr: Yes, this particular case is legal. The rest of the answer goes into why, and what the exact requirements are. Note that going the other way is undefined behaviour if done naively, although it is possible to make it work if you take sufficient precautions.)

As you hint in the question title, this sort of struct-to-field transmute is only sound on #[repr(C)] structs, because you have to ensure that the field you're reading happens to be placed right at the start of the struct with no padding. It doesn't work on structs without an explicit #[repr(C)], because Rust's default layout rules don't guarantee any particular order for the fields and the padding and thus the field might be in the wrong place; and it doesn't work on enums even with #[repr(C)] because those enums have an extra field that tracks which variant the enum has, which messes up the layout. (The Nomicon entry on transmuting, while a bit out of date, goes over some of these points.)
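As a rough sanity check of the layout requirement, the sketch below reuses the structs from the question and asks the compiler where the first field of the #[repr(C)] struct lives. The function names first_field_offset and cast_points_at_first_field are made up for illustration, and the code assumes a toolchain recent enough to have std::mem::offset_of! and the &raw syntax.

```rust
use std::mem::offset_of;

struct A {
    name: String,
    age: usize,
}

struct B {
    id: usize,
    x: f32,
}

#[repr(C)]
struct Compound {
    a: A,
    b: B,
}

// Hypothetical helper: byte offset of the first field of the repr(C) struct.
pub fn first_field_offset() -> usize {
    offset_of!(Compound, a)
}

// Hypothetical helper: compares the cast struct pointer with a pointer
// taken directly to the field.
pub fn cast_points_at_first_field() -> bool {
    let c = Compound {
        a: A { name: "1234".to_string(), age: 100 },
        b: B { id: 10, x: 100. },
    };
    let via_cast = &raw const c as *const A;
    let via_field = &raw const c.a;
    std::ptr::eq(via_cast, via_field)
}
```

Without #[repr(C)] on Compound the same check might still pass on a given compiler version, but nothing guarantees it, which is why the attribute matters.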

Although C and C++ have a "strict aliasing rule" which prevents you accessing the same memory using two different types (and thus need a specific exception to allow you to access a struct field using a pointer to the struct itself), Rust's aliasing rules are different, and don't generally disallow accessing memory based on the type of the reference/pointer used. That means that this isn't a special case for Rust – the access is sound as long as a) the pointer or reference you use is actually allowed to access that piece of memory, and b) the memory that is referenced contains a valid bit pattern and provenance for the type you're accessing it as. I'll go into more detail on both these points later.

That means that there are some things you can do in Rust which would be illegal in C or C++:

// legal in Rust, illegal in C and C++
let mut f = 12345_f64;
let f_ = &raw mut f as *mut u64;
// Safety: `f_` points to 8 bytes of aligned mutable memory that it can mutate,
// and the set of legal `f64` and `u64` bit patterns is the same
unsafe { *f_ |= 1; };
println!("{}", f);

In practice you wouldn't actually write it like this (f64::to_bits and f64::from_bits are safe wrappers around the same reinterpretation), but it shows the principle.
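For comparison, here is a small sketch of the same bit manipulation done both ways. The function names set_lowest_bit and set_lowest_bit_raw are illustrative only; f64::to_bits and f64::from_bits are the standard library methods.

```rust
// Safe version: convert to the raw bits, modify them, convert back.
pub fn set_lowest_bit(f: f64) -> f64 {
    f64::from_bits(f.to_bits() | 1)
}

// Raw-pointer version, mirroring the snippet above.
pub fn set_lowest_bit_raw(mut f: f64) -> f64 {
    let f_ = &raw mut f as *mut u64;
    // Safety: `f_` points to 8 bytes of aligned, mutable memory owned by this
    // function, and every `u64` bit pattern is also a valid `f64` bit pattern.
    unsafe { *f_ |= 1 };
    f
}
```

Both functions should produce the same bit pattern for any input, so the safe one is the better choice outside of demonstrations.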

There are also some rules that are the same in Rust as in C and C++, e.g. each reference and pointer has a range of memory it's allowed to access and can't access memory outside that range.

The important thing to check in order to answer your question is the set of restrictions that exist in Rust but have no direct counterpart in C or C++. There are two of them, so it's worth looking at each in detail to make sure it doesn't affect this case:

If a reference exists to memory, it prevents certain accesses to that memory

The strict aliasing rule from C and C++ primarily exists for performance reasons: being able to do reasoning like "these two memory accesses must access different addresses, so I can do them in the opposite order" is very beneficial for compilers. The reason is that if you have a loop where each iteration reads from memory and then writes to memory (which is a very common way to write a loop), the compiler would like to be able to batch up the loop iterations, by doing, e.g., 8 reads, then 8 calculations, then 8 writes, then repeating (this can give massive performance gains because most modern processors have instructions for doing several simultaneous reads, writes, or calculations on independent addresses). Sometimes the compiler will therefore go to the lengths of generating code that checks whether or not the addresses clash and chooses between two different versions of the loop as a consequence, but ideally it would prefer to be able to statically prove that the loop iterations can be interleaved like that and avoid the overhead of the check (as well as the overhead of needing to generate two versions of the loop).

Rust doesn't have a type-based alias analysis, so it needs to use a different rule instead in order to be able to do the same optimisation. This is done by placing aliasing restrictions on references, which are sufficient to allow for most of the same optimisations that C and C++ allow (via a different method), plus a bunch more:

  • If a shared reference to a piece of memory exists, that memory cannot be changed for as long as that reference exists, with the exception of locations within the memory that the shared reference considers to have a cell type (Cell, RefCell, UnsafeCell, etc.).
  • If a mutable reference to a piece of memory exists, that memory cannot be read or written for as long as that reference exists, unless the read or write goes via that reference, or a reference or pointer based on it (via casting, sharing or reborrowing).

This means that people writing unsafe Rust are often terrified of creating references unless they really have to: every reference that points at the memory you manipulate with your unsafe code is one more requirement that you have to check! As such, it's unidiomatic to write &mut c as *mut Compound as *mut A like you did in your example; this would be better written as &raw mut c as *mut A, which is shorter and avoids creating a &mut c reference that you have to check for aliasing violations. (That said, your original code doesn't violate this rule, because the pointer a_ is based on the &mut c reference.)
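A minimal sketch of the question's example rewritten with &raw mut, so that no intermediate &mut c reference is created. The function name run_without_reference and the tuple return value are just for illustration; the structs are the ones from the question.

```rust
struct A {
    name: String,
    age: usize,
}

struct B {
    id: usize,
    x: f32,
}

#[repr(C)]
struct Compound {
    a: A,
    b: B,
}

pub fn run_without_reference() -> (String, usize, usize, f32) {
    let mut c = Compound {
        a: A { name: "1234".to_string(), age: 100 },
        b: B { id: 10, x: 100. },
    };

    // `&raw mut` produces a raw pointer directly; no `&mut c` ever exists.
    let a_ = &raw mut c as *mut A;
    // Safety: `Compound` is `#[repr(C)]`, so `a` sits at offset 0, and `a_`
    // was derived from a pointer to the whole of `c`.
    unsafe {
        (*a_).age = 50;
    }

    (c.a.name, c.a.age, c.b.id, c.b.x)
}
```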

It can be verified that the struct-pointer-to-field-pointer conversion you asked about can't possibly break this rule: if an access using the struct pointer is legal, an access using the field pointer must be legal too, because it's based on the same references (if any) due to being based on the struct pointer, and accesses a subset of the memory. (Going the other way is more dangerous because it can't use the "accesses a subset of the memory" argument as a soundness proof, so you need to prove it sound some other way instead.)
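To illustrate the "other way" caveat, here is a sketch of one case that should be fine and one that is not, as far as I understand the aliasing models that Miri currently checks. The function name round_trip is made up; the unsound variant is left commented out on purpose.

```rust
struct A {
    name: String,
    age: usize,
}

struct B {
    id: usize,
    x: f32,
}

#[repr(C)]
struct Compound {
    a: A,
    b: B,
}

pub fn round_trip() -> usize {
    let mut c = Compound {
        a: A { name: "1234".to_string(), age: 100 },
        b: B { id: 10, x: 100. },
    };

    // This field pointer was derived from a pointer to the whole struct,
    // so it is still allowed to access all of `c`.
    let a_ptr = &raw mut c as *mut A;
    let c_ptr = a_ptr as *mut Compound;
    // Safety: `c_ptr` has the same address and permissions as `&raw mut c`.
    unsafe {
        (*c_ptr).b.id = 11;
    }

    // NOT sound (do not do this): the reference only covers `c.a`, so a
    // pointer based on it may not touch `c.b`.
    // let a_ref = &mut c.a;
    // let bad = a_ref as *mut A as *mut Compound;
    // unsafe { (*bad).b.id = 11; }

    c.b.id
}
```

The difference between the two variants is not visible in the address, only in what the pointer was derived from.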

You can only reinterpret memory containing one type as containing another if it has a valid bit pattern and a valid provenance for that type

Sometimes people think of Rust's "memory is untyped" rule as meaning "memory is just bits", but that's not actually true. As seen in the previous section, when dealing with pointers and references, Rust cares about "where the pointer value came from", i.e. which pointers are based on which other pointers. That information isn't, of course, generally stored in the hardware at runtime (although there are a few platforms on which Rust runs that store an approximation of it and will trap if their approximation is sufficient to notice an aliasing rule being broken). But from the compiler's point of view, it's conceptually a piece of extra information that is stored in the same memory that's storing the value's bit pattern, and it optimises on that basis. (You can think of it like putting labels on the bits to track where they came from.) This information is called "provenance" in Rust's documentation. (Provenance exists in C and C++ too, although it works a little differently from the Rust version.)

This means that although Rust normally allows you to read memory using a different type than you used to write it, doing so does not work if the types use provenance differently. In current Rust, only pointers and references have provenance, and the way they use provenance is compatible with each other (e.g. if you store a reference into memory and read it back as a pointer, the pointer will have the same provenance that the reference had, and can be used to access the same memory). But other types don't have provenance, meaning that some information can get lost along the way if you repeatedly type-pun memory. The classic example is that if you try to read an integer from memory as a pointer, you end up with a pointer with no provenance (because the integer had no provenance), so attempting to dereference it is undefined behaviour. (A good way to remember this is "to convert between integers and pointers, use the library functions, don't transmute": all the library functions that handle the conversion will explain how to handle the provenance correctly.)
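As an example of "use the library functions", the sketch below tags and untags a pointer through its address while keeping its provenance. It assumes a toolchain where the pointer methods addr and map_addr are stable (I believe that means Rust 1.84 or later); the function name tag_and_untag is made up.

```rust
pub fn tag_and_untag() -> u32 {
    let mut value = 7_u32;
    let p = &raw mut value;

    // `u32` is 4-byte aligned, so the low bit of the address is free to use.
    debug_assert_eq!(p.addr() & 1, 0);

    // `map_addr` changes the address but carries the provenance of `p` along.
    // The tagged pointer is misaligned, so it is never dereferenced.
    let tagged = p.map_addr(|addr| addr | 1);
    let untagged = tagged.map_addr(|addr| addr & !1);

    // Safety: `untagged` has the original address and the provenance of `p`.
    unsafe {
        *untagged += 1;
    }

    // By contrast, transmuting a bare `usize` into `*mut u32` would give a
    // pointer with no provenance, and dereferencing that would be UB.
    value
}
```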

What about reading a #[repr(C)] struct as though it were its first field? It's easy to see that, in this case, the provenance has to match, because a struct stores all the provenance that its fields need in the same place as those fields. So this rule can't end up making this sort of cast illegal either.
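A small sketch of that point, using a hypothetical #[repr(C)] struct Holder whose first field is itself a reference. Reading the struct as its first field gives back a reference that can still be dereferenced, because its provenance was stored along with it.

```rust
// Hypothetical struct for illustration; not from the question.
#[repr(C)]
struct Holder<'a> {
    first: &'a u32,
    second: u64,
}

pub fn read_first_through_cast() -> u32 {
    let n = 42_u32;
    let h = Holder { first: &n, second: 0 };

    // Reinterpret a pointer to the struct as a pointer to its first field.
    let p = &raw const h as *const &u32;

    // Safety: `Holder` is `#[repr(C)]`, so `first` is at offset 0; the bytes
    // read hold a valid `&u32` together with its provenance.
    unsafe { **p }
}
```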

In other words, this is legal. Rust mostly allows you to read memory written as one type as though it had a different type, except when it doesn't. But this isn't one of the exceptions. The rules are substantially different from C and C++: in C and C++ this is only legal because there's an exception specifically allowing it to work, whereas in Rust it's legal because none of the exceptions happen to prevent it, so the knowledge from one language doesn't really apply to the other. But in this case, it happens to give the same result.

ais523 answered Nov 17 '25 09:11


