
How do I map multiple threads to array elements with a different number of threads for different elements using CUDA?

I have some arrays (actually thrust::device_vector):

MyClass1 a[N];
int b[N];

that I have to use to fill

MyClass2 c[M];

a[0] generates the first b[0] elements in c[]. a[1] generates the next b[1] elements in c[], and so on. To be clear, M = b[0] + b[1] + ... + b[N-1]. Any b[i] can be zero. I need to create c[] and then process all its M elements in parallel using M threads.

Is there a way of writing a kernel that uses M threads, where each thread knows (or finds out in O(1) time) which MyClass1 in a[] it is associated with?

For example, if I have N=4 and

int b[4] = {2,0,3,5};

Then M = 2+0+3+5 = 10, so c[] is of size 10 and I need 10 threads (tid=0...9) to process c[] in parallel, where tid=i creates c[i]. Here,

  • c[0] and c[1] will be created using a[0], because b[0] == 2.
  • No elements will be created using a[1], because b[1] == 0.
  • c[2] to c[4] will be created using a[2].
  • c[5] to c[9] will be created using a[3].

The thread tid=3 knows it's supposed to create c[3], but how would it know it's supposed to use a[2] and generate its second element to do so?

Chaitanya Andhare asked Jan 23 '26 21:01


1 Answer

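I would suggest computing the running (inclusive prefix) sum of b[] into B[], whose elements are

B[i] = sum(b[0] ... b[i])

Then M = B[N-1]. Launch M threads, and in each thread binary-search B[] for the first index i with B[i] > tid. This is the search std::upper_bound performs; std::lower_bound would pick the wrong entry whenever tid equals some B[i]. That i is the index of the element of a[] you want to use. Note that the search costs O(log N) per thread rather than O(1).

I would probably do the first step on the CPU; you are probably doing something similar already to determine M.

tid - B[i-1] (taking B[-1] as 0) gives the "local" index of the element you are constructing. For the example in the question, B = {2,2,5,10}; tid=3 finds i=2 and local index 3 - B[1] = 1, i.e. the second element generated by a[2].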
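A minimal CUDA/Thrust sketch of the prefix-sum-plus-binary-search idea, using the example b = {2,0,3,5} from the question. The names find_owner, map_threads, owner and local are hypothetical and only illustrate the mapping; writing owner[tid] and local[tid] stands in for the real "build c[tid] from a[i]" step, which depends on MyClass1 and MyClass2. The search looks for the first B[i] strictly greater than tid (upper-bound semantics), which is what makes entries with b[i] == 0 get skipped.

```cuda
#include <cstdio>
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/scan.h>

// Returns the first index i in [0, n) with B[i] > tid (an upper bound).
// Entries with b[i] == 0 repeat the previous value of B and are never chosen.
__host__ __device__ inline int find_owner(const int* B, int n, int tid) {
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (B[mid] <= tid) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

// One thread per element of c[]. Assumption: owner[] and local[] are
// placeholders for the actual construction of c[tid] from a[owner].
__global__ void map_threads(const int* B, int n, int m, int* owner, int* local) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= m) return;
    int i = find_owner(B, n, tid);
    int start = (i == 0) ? 0 : B[i - 1];  // first c[] index produced by a[i]
    owner[tid] = i;
    local[tid] = tid - start;
}

int main() {
    std::vector<int> hb = {2, 0, 3, 5};
    thrust::device_vector<int> b(hb.begin(), hb.end());
    thrust::device_vector<int> B(b.size());
    thrust::inclusive_scan(b.begin(), b.end(), B.begin());  // B = {2, 2, 5, 10}

    int n = static_cast<int>(b.size());
    int m = B.back();  // M = 10
    if (m == 0) return 0;  // nothing to generate, avoid an empty launch

    thrust::device_vector<int> owner(m), local(m);
    int block = 128;
    int grid = (m + block - 1) / block;
    map_threads<<<grid, block>>>(thrust::raw_pointer_cast(B.data()), n, m,
                                 thrust::raw_pointer_cast(owner.data()),
                                 thrust::raw_pointer_cast(local.data()));
    cudaDeviceSynchronize();

    thrust::host_vector<int> ho = owner, hl = local;
    for (int t = 0; t < m; ++t)
        std::printf("tid=%d -> a[%d], local index %d\n", t, ho[t], hl[t]);
    return 0;
}
```

For this input the owners come out as 0,0,2,2,2,3,3,3,3,3 and the local indices as 0,1,0,1,2,0,1,2,3,4, so tid=3 maps to a[2] with local index 1. If a strict O(1) lookup inside the kernel matters, one option is to precompute the owner array once (for example with the vectorized thrust::upper_bound over a counting iterator) and have each thread simply read owner[tid].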

hhergeth answered Jan 26 '26 12:01