 

GPGPU: an effective way to handle 'irregular' transform?

Tags: cuda, gpgpu, opencl

In a regular transform, every GPU thread is expected to have the same time complexity. For example:

for i=0 to 10: c[i] = a[i]*b[i]

In an irregular transform, that is not the case:

for i=0 to len(arr)
    for k=0 to random()%100
        arr[i] += 1

which results in an array like [2,50,32,77,1,5,66, ...], where each element roughly indicates a computational cost.
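To make the imbalance concrete, here is a minimal CUDA sketch of such an irregular transform (the kernel name and the precomputed work array of per-element costs are only illustrative): each thread loops a data-dependent number of times, so threads in the same warp finish at different times.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread loops a data-dependent number of times, so the lanes of a warp
// finish at different times and the whole warp waits on its slowest lane.
// 'work' holds the per-element cost, e.g. [2, 50, 32, 77, 1, 5, 66, ...].
__global__ void irregularTransform(int *arr, const int *work, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int k = 0; k < work[i]; ++k)
        arr[i] += 1;
}

int main()
{
    const int n = 1 << 20;
    int *arr, *work;
    cudaMallocManaged(&arr,  n * sizeof(int));
    cudaMallocManaged(&work, n * sizeof(int));
    for (int i = 0; i < n; ++i) { arr[i] = 0; work[i] = rand() % 100; }

    irregularTransform<<<(n + 255) / 256, 256>>>(arr, work, n);
    cudaDeviceSynchronize();
    printf("arr[0] = %d (cost %d)\n", arr[0], work[0]);
    cudaFree(arr);
    cudaFree(work);
    return 0;
}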

GPGPU programming is well suited to regular transforms like element-wise addition, matrix multiplication, convolution, ... But what about irregular transforms? How do you distribute GPU threads well? How do you design a good kernel? Is there a common methodology?

asked Feb 02 '26 by Laie


1 Answer

If your hardware is not Vega or Volta (both can execute commands nearly independently per work item), then your best bet is to regroup work items of suspect cost together. For example, a Mandelbrot image generator (different amount of work per item) can be faster with 2D tiled generation, since all items in the same group do roughly the same amount of work as their neighbouring work items, which is more balanced than 1D (scanline) generation, where each group ends up more divergent. Either re-order elements depending on their last iteration count or use a spatial grouping.
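Here is a rough sketch of the 2D-tiled idea (the kernel name, coordinate mapping and launch configuration are illustrative, not from the answer): launching with square 16x16 blocks makes each warp cover a small image tile, so its lanes tend to hit similar iteration counts, instead of one warp spanning a scanline that crosses both cheap and expensive regions.

#include <cstdio>
#include <cuda_runtime.h>

// Tiled Mandelbrot: a 16x16 block covers a compact image region, so the
// data-dependent escape loop runs a similar number of iterations for the
// neighbouring pixels handled by one warp.
__global__ void mandelbrotTiled(int *iters, int width, int height, int maxIter)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float cr = -2.0f + 3.0f * x / width;    // map pixel to the complex plane
    float ci = -1.5f + 3.0f * y / height;
    float zr = 0.0f, zi = 0.0f;
    int k = 0;
    while (k < maxIter && zr * zr + zi * zi < 4.0f) {   // data-dependent loop
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        ++k;
    }
    iters[y * width + x] = k;
}

int main()
{
    const int w = 1024, h = 1024;
    int *iters;
    cudaMallocManaged(&iters, w * h * sizeof(int));

    dim3 block(16, 16);                     // 2D tiles instead of 1D scanlines
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    mandelbrotTiled<<<grid, block>>>(iters, w, h, 1000);
    cudaDeviceSynchronize();

    printf("iterations at centre pixel: %d\n", iters[(h / 2) * w + w / 2]);
    cudaFree(iters);
    return 0;
}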

In the worst case, the maximum number of cycles per compute unit (each having 8, 64, 128 or 192 cores) determines the resulting performance, which improves with more compute units. But the work of all the other work items will still be hidden behind those maximum cycles, so it remains more efficient than a CPU.

answered Feb 03 '26 by huseyin tugrul buyukisik