 

GPGPU: an effective way to handle 'irregular' transform?

Tags: cuda, gpgpu, opencl

In a regular transform, every GPU thread is expected to have the same time complexity. For example:

for i=0 to 10: c[i] = a[i]*b[i]

In an irregular transform, that is not the case:

for i=0 to len(arr)
    for k=0 to random()%100
        arr[i] += 1

which results in an array like [2,50,32,77,1,5,66, ...], where each element roughly indicates a computational cost.
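To make the imbalance concrete, here is a minimal CUDA sketch of such an irregular transform (the kernel name and the precomputed work array of per-element costs are only illustrative): each thread loops a data-dependent number of times, so threads in the same warp finish at different times.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread loops a data-dependent number of times, so the lanes of a warp
// finish at different times and the whole warp waits on its slowest lane.
// 'work' holds the per-element cost, e.g. [2, 50, 32, 77, 1, 5, 66, ...].
__global__ void irregularTransform(int *arr, const int *work, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int k = 0; k < work[i]; ++k)
        arr[i] += 1;
}

int main()
{
    const int n = 1 << 20;
    int *arr, *work;
    cudaMallocManaged(&arr,  n * sizeof(int));
    cudaMallocManaged(&work, n * sizeof(int));
    for (int i = 0; i < n; ++i) { arr[i] = 0; work[i] = rand() % 100; }

    irregularTransform<<<(n + 255) / 256, 256>>>(arr, work, n);
    cudaDeviceSynchronize();
    printf("arr[0] = %d (cost %d)\n", arr[0], work[0]);
    cudaFree(arr);
    cudaFree(work);
    return 0;
}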

GPGPU programming is well suited to regular transforms like element-wise addition, matrix multiplication, convolution, ... But what about irregular transforms? How do you distribute GPU threads well? How do you design a good kernel? Is there a common methodology?

asked Feb 02 '26 by Laie


1 Answer

If your hardware is not Vega or Volta (both can execute commands nearly independently per work item), then your best bet is to regroup work items of suspect cost together. For example, a Mandelbrot image generator (different amount of work per item) can be faster with 2D tiled generation, since all items in the same group do roughly the same amount of work as their neighbouring work items, which is more balanced than 1D (scanline) generation, where each group ends up more divergent. Either re-order elements depending on their last iteration count or use a spatial grouping.
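Here is a rough sketch of the 2D-tiled idea (the kernel name, coordinate mapping and launch configuration are illustrative, not from the answer): launching with square 16x16 blocks makes each warp cover a small image tile, so its lanes tend to hit similar iteration counts, instead of one warp spanning a scanline that crosses both cheap and expensive regions.

#include <cstdio>
#include <cuda_runtime.h>

// Tiled Mandelbrot: a 16x16 block covers a compact image region, so the
// data-dependent escape loop runs a similar number of iterations for the
// neighbouring pixels handled by one warp.
__global__ void mandelbrotTiled(int *iters, int width, int height, int maxIter)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float cr = -2.0f + 3.0f * x / width;    // map pixel to the complex plane
    float ci = -1.5f + 3.0f * y / height;
    float zr = 0.0f, zi = 0.0f;
    int k = 0;
    while (k < maxIter && zr * zr + zi * zi < 4.0f) {   // data-dependent loop
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
        ++k;
    }
    iters[y * width + x] = k;
}

int main()
{
    const int w = 1024, h = 1024;
    int *iters;
    cudaMallocManaged(&iters, w * h * sizeof(int));

    dim3 block(16, 16);                     // 2D tiles instead of 1D scanlines
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    mandelbrotTiled<<<grid, block>>>(iters, w, h, 1000);
    cudaDeviceSynchronize();

    printf("iterations at centre pixel: %d\n", iters[(h / 2) * w + w / 2]);
    cudaFree(iters);
    return 0;
}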

In the worst case, the maximum number of cycles per compute unit (each having 8, 64, 128 or 192 cores) determines the resulting performance, which improves with more compute units. But the work of all the other work items will still be hidden behind those maximum cycles, so it remains more efficient than a CPU.

answered Feb 03 '26 by huseyin tugrul buyukisik