In an OpenCL kernel, is it slower to call get_local_id(0) several times than to call it once at the top and then read the value from a local variable?
Conversely, do I save a register by not storing get_local_id(0) in a local variable and calling it each time instead?
On most GPU architectures, the local work-item ID is held in a special set of registers that the hardware initializes as it dispatches threads. Calling get_local_id(0) therefore amounts to reading a register, so there is no speed penalty for "calling" it repeatedly.
The compiler should notice when you assign get_local_id(0) to a variable for later use and generate the same code as if you had called get_local_id(0) every time. In my experience, however, this is not always the case, and when it isn't you will see exactly one register saved by not storing the local ID in a variable.
If you are tight on registers, I would suggest calling get_local_id(0) every time. You could even define a macro that hides whether get_local_id(0) is called or a cached variable is used. If you have registers to spare (your profiler will tell you) or prefer the readability of a variable, use a variable.
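A minimal sketch of that macro idea follows. The names LID, LID_DECL, CACHE_LOCAL_ID and the kernel scale_and_bias are hypothetical, chosen only for illustration. The #ifndef __OPENCL_VERSION__ block is also an assumption added purely so the file can be compiled and exercised as ordinary host C: an OpenCL C compiler predefines __OPENCL_VERSION__ and skips that block, so on the device the real built-ins are used.

```c
/* Sketch: one macro (LID) hides whether get_local_id(0) is re-called at
 * every use or read once into a variable. All names are hypothetical. */

#ifndef __OPENCL_VERSION__
/* Host-side shim (assumption, for testing only). An OpenCL C compiler
 * predefines __OPENCL_VERSION__, so this block is skipped on the device.
 * On the host it emulates the two built-ins used below so the kernel body
 * can be compiled and run one work-item at a time as plain C. */
#include <stddef.h>
#define __kernel
#define __global
static size_t host_local_id;  /* simulated local id of the current work-item */
static size_t host_global_id; /* simulated global id of the current work-item */
static size_t get_local_id(unsigned int dim) { (void)dim; return host_local_id; }
static size_t get_global_id(unsigned int dim) { (void)dim; return host_global_id; }
#endif

/* Build with -D CACHE_LOCAL_ID to read the id once into a variable;
 * without it, every use of LID is a fresh get_local_id(0) call. */
#ifdef CACHE_LOCAL_ID
#define LID_DECL const size_t lid = get_local_id(0);
#define LID lid
#else
#define LID_DECL
#define LID get_local_id(0)
#endif

/* scale and bias hold one entry per work-item of a work-group, so they are
 * indexed by the local id; data is indexed by the global id. */
__kernel void scale_and_bias(__global float *data,
                             __global const float *scale,
                             __global const float *bias)
{
    LID_DECL
    const size_t gid = get_global_id(0);
    data[gid] = data[gid] * scale[LID] + bias[LID];
}
```

To switch variants on the device you could pass "-D CACHE_LOCAL_ID" in the options string given to clBuildProgram, then compare the register counts your profiler or compiler output reports for each build. The kernel source itself does not change between the two.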