I need to know something about CUDA shared memory. Let's say I launch 50 blocks with 10 threads per block on a G80 card. Each SM of a G80 can handle up to 8 blocks simultaneously. Assume that, after doing some calculations, shared memory is fully occupied.
What will the values in shared memory be when the next 8 blocks arrive? Will the previous values still reside there? Or will the previous values be copied to global memory and the shared memory refreshed for the next 8 blocks?
The CUDA documentation states the following about the variable type qualifiers:

- `__device__ __shared__` — the variable lives in shared memory, has the scope of a block, and exists only for the lifetime of the kernel
- `__device__` — the variable lives in global memory, has the scope of the grid, and persists until the application exits
- `__device__ __constant__` — the variable lives in constant memory, has the scope of the grid, and persists until the application exits

So, going by this reference, the answer to your question is that shared memory is refreshed for the next 8 blocks: the previous blocks' values are not preserved, and they are not automatically copied to global memory either.
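To make the lifetime rule concrete, here is a minimal sketch (kernel and variable names are my own) in which each block accumulates into shared memory and must explicitly write its result to global memory before the kernel ends, because the shared array does not survive the block:

```cuda
// Each block sums its 10 inputs in shared memory, then persists the
// result to global memory. Anything left only in `partial` is lost
// once this block retires and its SM slot is reused by a new block.
__global__ void blockSum(const float *in, float *blockResults, int n)
{
    __shared__ float partial[10];      // one slot per thread (10 threads/block)

    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;
    partial[tid] = (idx < n) ? in[idx] : 0.0f;
    __syncthreads();                   // all writes to `partial` visible block-wide

    if (tid == 0) {
        float sum = 0.0f;
        for (int i = 0; i < blockDim.x; ++i)
            sum += partial[i];
        blockResults[blockIdx.x] = sum;   // copy out to global memory
    }
}
```

If thread 0 skipped the final write to `blockResults`, the sums would simply vanish with the block.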
Kernel blocks are assigned to SMs in no guaranteed order. So even if an old value happened to remain at the same shared-memory address, there would be no reliable way to keep track of it; I doubt there is any way to do so. Communication between blocks is done via off-chip memory. The latency associated with off-chip memory is the performance killer that makes GPU programming tricky. On Fermi cards, blocks share some L2 cache, but the programmer cannot alter the behavior of these caches.
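Since blocks cannot reliably exchange data through shared memory, the usual pattern is to split the work into two kernel launches with global memory in between. A host-side sketch, with hypothetical kernel names, assuming a two-pass reduction:

```cuda
// Blocks communicate only through global memory, here across two
// kernel launches. pass1 writes one result per block; pass2 combines
// those per-block results into a single output.
__global__ void pass1(const float *in, float *perBlock, int n);
__global__ void pass2(const float *perBlock, float *out, int m);

void reduce(const float *d_in, float *d_out, int n,
            int numBlocks, int threadsPerBlock)
{
    float *d_perBlock;
    cudaMalloc(&d_perBlock, numBlocks * sizeof(float));

    pass1<<<numBlocks, threadsPerBlock>>>(d_in, d_perBlock, n);
    // The kernel boundary guarantees pass1's global-memory writes are
    // visible to pass2; shared memory cannot cross this boundary.
    pass2<<<1, threadsPerBlock>>>(d_perBlock, d_out, numBlocks);

    cudaFree(d_perBlock);
}
```

The global-memory round trip is exactly the off-chip latency mentioned above, which is why such inter-block communication should be minimized.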