On the host side, I can save the amount of dynamic shared memory I intend to launch a kernel with, and use it. I can even pass that as an argument to the kernel. But - is there a way to get it directly from device code, without help from the host side? That is, have the code for a kernel determine, as it runs, how much dynamic shared memory it has available?
Yes, there's a special register holding that value, named %dynamic_smem_size
. You can obtain this register's value in your CUDA C/C++ code by wrapping some inline PTX with a getter function:
__device__ unsigned dynamic_smem_size()
{
unsigned ret;
asm volatile ("mov.u32 %0, %dynamic_smem_size;" : "=r"(ret));
return ret;
}
You can similarly obtain the total size of allocated shared memory (static + dynamic) from the register %total_smem_size
.
Note: reading a special register does cost a few cycles. If you can compute this value apriori, you'll save that read, plus you might squeeze something out of the optimzier knowing that value.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With