Why do we need stride in CUDA kernel?

Question

I was wondering why one needs to use a grid-stride loop like the following:

for (int i = index; i < ITERATIONS; i += stride)
{
  C[i] = A[i] + B[i];
}

Where we set stride and index to:

index = blockIdx.x * blockDim.x + threadIdx.x;
stride = blockDim.x * gridDim.x;

When launching the kernel we have this:

int blockSize = 5;
int ITERATIONS = 20;
int numBlocks = (ITERATIONS + blockSize - 1) / blockSize;
bench<<<numBlocks, blockSize>>>(A, B, C);

So when we launch the kernel we will have blockDim.x = 5 and gridDim.x = 4, and therefore stride will be equal to 20.

My point is that whenever one uses this approach, stride will always be equal to or bigger than the number of elements in the calculation, so the loop ends as soon as it reaches the first increment.

And here is the question: why does one need the loop or the stride at all? Why not just use index, like this:

index = blockIdx.x * blockDim.x + threadIdx.x;
C[index] = A[index] + B[index];

Another question: in this particular case, how can I know how many threads are running on my GPU simultaneously before they “jump” to another portion of a very big array (e.g. 2000000 elements)?

2 revs, talonmies · Accepted Answer

My point is that whenever one uses this approach, stride will always be equal to or bigger than the number of elements in the calculation, so the loop ends as soon as it reaches the first increment.

There lies the problem with your understanding. To use that kernel effectively, you only need to run as many blocks as will achieve maximal device-wide occupancy for your device, not as many blocks as are required to process all your data. Those fewer blocks then become "resident", and each thread processes more than one input/output pair. For example, with 20 elements and 2 blocks of 5 threads, stride is 10, so the thread with index 3 handles elements 3 and 13. The grid stride also preserves whatever memory coalescing and cache coherency properties the kernel might have.
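To make this concrete, here is a minimal sketch of a launch sized from the device instead of from the data. The kernel is the loop from the question with the array length passed in as n. The helper name run_grid_stride_add, the block size of 256, and the use of managed memory are assumptions chosen for illustration, not part of the original code.

```cuda
#include <cuda_runtime.h>

// Grid-stride kernel: each thread handles elements index, index + stride, ...
__global__ void add(const float *A, const float *B, float *C, int n)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;   // total number of threads in the grid
    for (int i = index; i < n; i += stride)
        C[i] = A[i] + B[i];
}

// Hypothetical helper: sizes the grid from the device, not from n.
// Returns true if every element of C equals A + B.
bool run_grid_stride_add(int n)
{
    const int blockSize = 256;             // assumed block size
    int device = 0, numSMs = 0, blocksPerSM = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return false;
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device);
    // How many blocks of this kernel can be resident on one SM at once.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, add, blockSize, 0);
    int numBlocks = numSMs * blocksPerSM;  // just enough blocks to fill the device
    if (numBlocks < 1) return false;

    float *A = nullptr, *B = nullptr, *C = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMallocManaged(&A, bytes) != cudaSuccess ||
        cudaMallocManaged(&B, bytes) != cudaSuccess ||
        cudaMallocManaged(&C, bytes) != cudaSuccess) {
        cudaFree(A); cudaFree(B); cudaFree(C);
        return false;
    }
    for (int i = 0; i < n; ++i) { A[i] = float(i); B[i] = 2.0f * float(i); }

    add<<<numBlocks, blockSize>>>(A, B, C, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    // Verify on the host; the values stay exactly representable for n up to a few million.
    for (int i = 0; ok && i < n; ++i)
        ok = (C[i] == 3.0f * float(i));

    cudaFree(A); cudaFree(B); cudaFree(C);
    return ok;
}
```

Calling run_grid_stride_add(2000000) launches the same number of blocks as run_grid_stride_add(20). In this sketch, numBlocks * blockSize is the number of threads that can be resident on the device at once, and that product is also the stride by which each thread jumps to its next element.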

By doing this, you eliminate overhead from scheduling and retiring blocks. There can be considerable efficiency gains in simple kernels by doing so. There is no other reason for this design pattern.

Tags:

cuda

Igor Kolesnikov
