
Why is it not possible to overload a host/device member function of a CUDA C++ class? [duplicate]

I have a 3D vector class whose member functions are marked as both __host__ and __device__. Below is a snippet of one of the member functions:

__host__ __device__
double Vector::GetMagReciprocal()
{
    double result = 1/sqrt(x*x + y*y + z*z);
    return result;
}

What I want is separate definitions for host and device, so that when the code runs on the device I can use the CUDA math function rsqrt for better performance. The way I would do it is to overload this member function for host and device:

__host__
double Vector::GetMagReciprocal()
{
    double result = 1/sqrt(x*x + y*y + z*z);
    return result;
}

__device__
double Vector::GetMagReciprocal()
{
    double result = rsqrt(x*x + y*y + z*z);
    return result;
}

When I compile Vector.cpp with nvcc (using the -x cu flag), I get the following error:

function "Vector::GetMagReciprocal" has already been defined

I wonder why NVIDIA doesn't support this sort of overloading.

I can think of alternative ways to achieve the separation, but they have their own issues:

  • Create separate member functions for host and device in the Vector class, say GetMagReciprocalHost and GetMagReciprocalDevice, and call the appropriate one from host or device code.
  • Have a single member function GetMagReciprocal, but pass it a flag that chooses between the host code and the device code.
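A minimal sketch of the first alternative, assuming a plain struct with public x, y, z members. The HOST_ONLY macro is a hypothetical portability shim (not from the question) that expands to __host__ under nvcc and to nothing under a plain C++ compiler; the device version is only declared when nvcc is compiling, because rsqrt is a CUDA math function rather than standard C++.

```cpp
#include <cmath>

// Hypothetical shim: real CUDA qualifier under nvcc, empty otherwise,
// so the host-only part of this sketch also builds with g++/clang++.
#ifdef __CUDACC__
#define HOST_ONLY __host__
#else
#define HOST_ONLY
#endif

struct Vector {
    double x, y, z;

    // Host-only version: reciprocal magnitude via std::sqrt from <cmath>.
    HOST_ONLY double GetMagReciprocalHost() const {
        return 1.0 / std::sqrt(x * x + y * y + z * z);
    }

#ifdef __CUDACC__
    // Device-only version: uses the CUDA math function rsqrt().
    // Only declared when nvcc is compiling this file.
    __device__ double GetMagReciprocalDevice() const {
        return rsqrt(x * x + y * y + z * z);
    }
#endif
};
```

With this layout every caller has to pick the right name, and a caller that is itself __host__ __device__ would still need some form of conditional compilation to choose between the two.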

Maybe there is an easier way to achieve this. Any suggestions would be appreciated.

EDIT: I had not mentioned the possibility of conditional compilation using the __CUDA_ARCH__ macro to generate separate host and device code. This was actually the first thing I tried when modifying the member function, but I convinced myself it would not work. I was probably wrong about how this macro is used, so the answer suggested by sgarizvi is the right one.

nurabha asked Sep 12 '25 10:09


1 Answer

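nvcc does not treat the __host__ and __device__ qualifiers as part of a function's signature, so the two definitions in the question are seen as a redefinition of the same function. What you can do instead is use the __CUDA_ARCH__ macro for conditional compilation, generating different code for host and device inside a single __host__ __device__ function.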

__CUDA_ARCH__ is defined only while nvcc compiles the device code, so to create different implementations for host and device, you can do the following:

__host__ __device__
double Vector::GetMagReciprocal()
{
    double result;
    #ifdef __CUDA_ARCH__
    result = rsqrt(x*x + y*y + z*z);
    #else
    result = 1/sqrt(x*x + y*y + z*z);
    #endif
    return result;
}
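A self-contained sketch of the same idea, assuming a plain struct with public x, y, z members. The HOST_DEVICE macro and the magReciprocalKernel kernel are hypothetical names added for illustration: the macro lets the same file build with nvcc or with a plain C++ compiler (where __CUDA_ARCH__ is never defined, so the host branch is always taken), and the kernel shows the same member being called from device code, where the rsqrt branch is compiled.

```cpp
#include <cmath>

// Hypothetical helper: CUDA execution-space qualifiers under nvcc,
// nothing under a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

struct Vector {
    double x, y, z;

    HOST_DEVICE double GetMagReciprocal() const {
        const double magSq = x * x + y * y + z * z;
#ifdef __CUDA_ARCH__
        // Device compilation pass: CUDA math function rsqrt().
        return rsqrt(magSq);
#else
        // Host compilation pass (nvcc's host pass or a plain C++ compiler).
        return 1.0 / std::sqrt(magSq);
#endif
    }
};

#ifdef __CUDACC__
// Hypothetical kernel: each thread calls the same member function,
// which takes the rsqrt branch because this is device code.
__global__ void magReciprocalKernel(const Vector* in, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i].GetMagReciprocal();
    }
}
#endif
```

Because the host and device branches use different functions, results computed on the GPU may differ from the host results in the last bits.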
sgarizvi answered Sep 15 '25 02:09