How can I know the amount of shared memory available on my GPU?
I'm interested in how large an array I can store in shared memory. My GPU is an Nvidia GeForce 650 Ti, and I am using VS2013 with the CUDA toolkit.
I would really appreciate it if someone could explain how I can figure this out myself, rather than just giving a raw number.
Two ways:
1. Read the documentation (the programming guide). Your GeForce 650 Ti is a cc3.0 (compute capability 3.0) GPU. (If you want to learn how to discover that, check the documentation or see item 2.) For a cc3.0 GPU, the maximum is 48KB per threadblock.
2. Programmatically, by calling cudaGetDeviceProperties (see its documentation). The CUDA sample app deviceQuery demonstrates this.
EDIT: responding to the question below.
The 48KB limit per threadblock is a logical limit as seen from the perspective of kernel code. There are at least two other numbers:
1. Total amount of shared memory per SM. This is also listed in the documentation (same as above) and available via cudaGetDeviceProperties (same as above). For a cc3.0 GPU this is again 48KB. This is one limit to occupancy: the number of threadblocks that can be resident on an SM is at most the total available per SM divided by the amount used by one threadblock. If your threadblock uses 40KB of shared memory, you can have at most 1 threadblock resident per SM at a time on a cc3.0 GPU. If your threadblock uses 20KB of shared memory, you could have 2 threadblocks resident per SM, ignoring other limits to occupancy.
2. Total amount per device/GPU. This is equal to the number of SMs on your GPU multiplied by the total amount per SM. I consider it a less useful number: it communicates nothing beyond the number of SMs on your GPU, and I can't think of a use for it at the moment.
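The arithmetic behind both numbers can be sketched in a few lines of C++. The function names are made up for this example, and the calculation is deliberately simplified: it considers shared memory only and ignores allocation granularity and every other limit to occupancy.

```cpp
#include <cstddef>
#include <limits>

// Upper bound on resident threadblocks per SM imposed by shared memory alone.
// Simplification: ignores allocation granularity and all other occupancy
// limits (registers, threads per SM, maximum blocks per SM).
// (Hypothetical helper name, not part of CUDA.)
std::size_t maxBlocksPerSmBySharedMem(std::size_t perSmBytes,
                                      std::size_t perBlockBytes)
{
    if (perBlockBytes == 0) {
        // A block that uses no shared memory is not limited by it.
        return std::numeric_limits<std::size_t>::max();
    }
    // Integer division rounds down; a result of 0 means one block
    // needs more shared memory than the SM has.
    return perSmBytes / perBlockBytes;
}

// Total shared memory across the whole device: SM count times per-SM amount.
std::size_t totalSharedMemPerDevice(std::size_t perSmBytes,
                                    std::size_t smCount)
{
    return perSmBytes * smCount;
}
```

With 48KB per SM, maxBlocksPerSmBySharedMem(48 * 1024, 40 * 1024) gives 1 and maxBlocksPerSmBySharedMem(48 * 1024, 20 * 1024) gives 2, matching the two cases described above.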
SM, as used above, means "streaming multiprocessor". The documentation also refers to it simply as a "multiprocessor", for example in table 12 of the documentation mentioned above.
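Various newer GPUs can exceed the 48KB-per-threadblock limit, but only for dynamically allocated shared memory and only when the kernel explicitly opts in; see the programming guide for the architectures that support this.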