Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Where does the kernel parameter data resides?

As in title, in cuda programs, where does the kernel parameter resides after kernel launch, in local memory or global memory of GPU?

For example, in LLVM IR of a cuda program:

__global__ kernel(int param1):

%0 = alloca int

store param1, %0

So, in this case, where does %0 point to? local memory or global memory?

Also, I saw sometimes kernel parameters are held and use directly in registers instead of storing it in any memory. How this decision is made?

like image 255
cache Avatar asked Jun 22 '26 23:06

cache


1 Answers

As Robert Corvella in his comment pointed out: parameters are stored in contant memory of the GPU.

However, doing an alloca and a store of param1 to the allocated space moves copies the parameter from the constant memory to local memory. alloca instructions are lowerd to stack allocations in PTX code. In clang, this is the canonical way to handle function parameters during code generation. However, on GPUs this can (since PTX is optimized during lowering to SASS just saying: can) lead to a performance penalty because local memory goes through all cache levels down to global memory and is much slower than constant memory.

In LLVM you have the mem2reg optimizer pass. This pass promotes all memory allocations on the stack to registers. In the case of kernel parameters you most likely want this optimization. The alloca and the store instructions disappear form your IR and the parameter will be used directly instead of an unnecessary copy.

like image 120
Michael Haidl Avatar answered Jun 24 '26 16:06

Michael Haidl