Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are constexpr device variables accessible from host?

My colleague came across this situation where global __device__ constexpr variables are accessible from both the host and the device.

#include <array>
#include <cstdio>

__device__ constexpr std::array<int, 2> a = {15, 1};

__host__ __device__ void my_print(const char* c_str, const int& i) {
    std::printf(c_str);
    std::printf(" i=%d &i=%p\n", i, &i);
}

__global__ void display() {
    my_print("device", a[0]);
}

int main() {
    display<<<1, 1>>>();
    cudaDeviceSynchronize();
    my_print("host", a[0]);
    return 0;
}

Here is the output from a run, which looks reasonable:

device i=15 &i=0x7e97e9200000
host i=15 &i=0x482008

Our situation is related to this question. However, unlike that case, we know the variable's value at compile time. Is the current situation well-defined?

Update: Thanks for the discussion in the comments. We understand that passing a constexpr device variable as const ref to a host should be treated as a bug by the CUDA compiler and it has been reported. We also understand that a global constexpr host variable is enough as a variable that should be accessed by both host and device. It is somewhat unfortunate that constexpr host variables cannot be passed as const ref to the device. Most likely nothing could go wrong.

like image 444
Hari Avatar asked Sep 15 '25 12:09

Hari


1 Answers

Indeed, as @Johan says in a comment, this is a bug.

The bug has two aspects, actually:

  • A __device__-side global variable exists on every device on which the code with that variable is used (or more exactly, in every context into which the variable's module is loaded; see §18.2 of the Programming Guide). In other words - when a device runs your code, a is well-defined; but on the host side - there may be many a's, so - which a's address could you take, at all?

  • Even supposing there were just one device (and one context), and the choice of a on the host was somehow made reasonably (e.g. the primary context on the current device) - you should still not be getting different addresses: CUDA, since version 6.0, uses a single unified memory space for all addresses on GPUs and on the host. That means that the address of an entity (like a variable or function) is the same for device-side code and host-side code - regardless of whether it's visible from both host and device.

I tried try your code with a non-constexpr variable, and of a simpler type (an int x); but I changed it to print the value on all of my (two) CUDA-capable GPUs. The resulting output is:

device 0: int x at address 0x7f826ba00000 has value 123
device 1: int x at address 0x7f826b000000 has value 123
host: int x at address 0x560032e54568 has value 0
host: int y at address 0x560032e544b0 has value 456

so, two variables, on two devices; and yet we get a third address; and it looks very much like the address of another global host-side variable that I also defined.

Here's the program I used:

#include <cstdio>

__device__ int x = 123;
int y = 456;

__host__ __device__ void display_x(int const* dev_idx, int const& x) {
    if (dev_idx) { std::printf("device %d: ", *dev_idx); }
    else { std::printf("host: "); }
    std::printf("int x at address %p has value %d\n", &x, x);
}

// Type your code here, or load an example.
__global__ void display_x_kernel(int dev_idx) {
    display_x(&dev_idx, x);
}

int main() {
    int num_devices = 0;
    cudaGetDeviceCount(&num_devices);
    for(int dev_idx = 0; dev_idx < num_devices; dev_idx++) {
        cudaSetDevice(dev_idx);
        display_x_kernel<<<1, 1>>>(dev_idx);
        cudaDeviceSynchronize();
    }
    display_x(nullptr, x);
    std::printf("host: int y at address %p has value %d\n", &y, y);
}

I'll also take this opportunity to suggest that you avoid __device__ global variables altogether - except for rare case I can't think of right now off the top of my head. (Non-constant) globals are to be avoided generally, and device globals are kind of the lame brothers of host-side-code globals - considering that your program will have to copy the values of these globals, at run-time, to the GPU, anyway. And if you want constexpr values which "disappear", due to only being used at compile-time - well, I would suggest making them local to a consteval function. Maybe it won't make a difference, but frankly - I don't want to have to prove to myself that NVCC or NVRTC or clang will actually get rid of them.

Thanks goes to @paleonix for setting me straight regarding the most salient aspect of this bug.

like image 180
einpoklum Avatar answered Sep 17 '25 01:09

einpoklum