I am profiling my CUDA kernel using Visual Profiler 6.0 and on nearly every line there is a bar which shows percentages of Inactive threads and Predicated off threads.
I was wondering what exactly those two values mean and how 'bad' are they?
As far as I know, the Inactive threads (shown in red) are threads that diverged and are inactive (due to some if statement) and Predicated off threads (shown in blue) are correctly predicated by compiler to be inactive. Is that correct?
If that is true, I do not understand why following bunch of lines in my kernel has 95% of inactive threads, the only ifs are the loops:

The TFloat is template for either float or double type. What is causing the thread inactivity there?
I am using CUDA 6.0 and the code is running on Tesla K40c under compute capability 3.5.
NVIDIA GPUs execute warps of 32 parallel threads using SIMT, which enables each thread to access its own registers, to load and store from divergent addresses, and to follow divergent control flow paths.
A warp is a set of 32 threads within a thread block such that all the threads in a warp execute the same instruction. These threads are selected serially by the SM. Once a thread block is launched on a multiprocessor (SM), all of its warps are resident until their execution finishes.
The kernel is a function executed on the GPU. Every CUDA kernel starts with a __global__ declaration specifier. Programmers provide a unique global ID to each thread by using built-in variables. Figure 2. CUDA kernels are subdivided into blocks.
In order to launch a CUDA kernel we need to specify the block dimension and the grid dimension from the host code. I'll consider the same Hello World! code considered in the previous article. In the above code, to launch the CUDA kernel two 1's are initialised between the angle brackets.
From the following link:
There are two reasons threads within a warp can be disabled: being inactive, and being predicated off. If the block size is not a multiple of the warp size, the last warp in the block will have inactive threads. When some threads within a warp exit the kernel while others continue, the exiting threads become inactive. Threads become predicated off when divergent branches occur, because the separate paths taken by the threads must be serialized, and threads are disabled for paths they do not take.
So it looks like your dimensionsCount is zero (or close) on most of the threads, and they exit before a few other threads are still computing.
On the other hand, "predicated off" may be recorded when the the actual branching condition is hit - some thread jump to exit (but still active!), others jump to looping. This is also suggested by the SASS code on the right at your snapshot: the only blue bar appears at BRA instruction.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With