Recently I have been using thrust a lot. I have noticed that in order to use thrust, one must always copy the data from the cpu memory to the gpu memory.
 Let's see the following example :  
int foo(int *foo)
{
     host_vector<int> m(foo, foo+ 100000);
     device_vector<int> s = m;
}
I'm not quite sure how the host_vector constructor works, but it seems like I'm copying the initial data, coming from *foo, twice - once to the host_vector when it is initialized, and another time when device_vector is initialized. Is there a better way of copying from cpu to gpu without making an intermediate data copies? I know I can use device_ptras a wrapper, but that still doesn't fix my problem.
thanks!
One of device_vector's constructors takes a range of elements specified by two iterators. It's smart enough to understand the raw pointer in your example, so you can construct a device_vector directly and avoid the temporary host_vector:
void my_function_taking_host_ptr(int *raw_ptr, size_t n)
{
  // device_vector assumes raw_ptrs point to system memory
  thrust::device_vector<int> vec(raw_ptr, raw_ptr + n);
  ...
}
If your raw pointer points to CUDA memory, introduce a device_ptr:
void my_function_taking_cuda_ptr(int *raw_ptr, size_t n)
{
  // wrap raw_ptr before passing to device_vector
  thrust::device_ptr<int> d_ptr(raw_ptr);
  thrust::device_vector<int> vec(d_ptr, d_ptr + n);
  ...
}
Using a device_ptr doesn't allocate any storage; it just encodes the location of the pointer in the type system.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With