Disassembling CUDA code appears to be a very useful tool, if not the only one in some cases, to understand the behaviour of the compiler as well as of performance metrics.
I would say that unfortunately the documentation available through the CUDA BINARY UTILITIES Application Note does not provide the user with all the tools needed to interpret CUDA assembly instructions or at least I'm not able to infer all the needed information from that document. The "CUDA Handbook" book does not provide any more information than the CUDA BINARY UTILITIES guide. For example, how should I interpret the instructions
ISETP.LT.AND P0, PT, R3, RZ, PT;
and
PSETP.AND.AND P0, PT, !P0, PT, PT;
What does @P0
do before an instruction? Is it an instruction label so that the execution jumps to that label if the predicate register P0
is true? Is there any general way to interpret the CUDA assembly instructions?
Thank you very much.
EDIT FOLLOWING NJUFFA'S COMMENT
I have compiled the following simple kernel
__global__ void test_kernel(float *a, float *b)
{
int tid = threadIdx.x + blockIdx.x * blockDim.x;
if ((tid > 5) && (tid < 10)) a[tid] = tid;
else b[tid] = tid;
}
which resulted in
/*0000*/ MOV R1, c[0x1][0x100]; /* 0x2800440400005de4 */
/*0008*/ S2R R0, SR_CTAID.X; /* 0x2c00000094001c04 */
/*0010*/ S2R R2, SR_TID.X; /* 0x2c00000084009c04 */
/*0018*/ IMAD R2, R0, c[0x0][0x8], R2; /* 0x2004400020009ca3 */
/*0020*/ IADD R0, R2, -0x6; /* 0x4800ffffe8201c03 */
/*0028*/ ISETP.LT.U32.AND P0, PT, R0, 0x4, PT; /* 0x188ec0001001dc03 */
/*0030*/ I2F.F32.S32 R0, R2; /* 0x1800000009201e04 */
/*0038*/ @!P0 ISCADD R3, R2, c[0x0][0x24], 0x2; /* 0x400040009020e043 */
/*0040*/ @P0 ISCADD R2, R2, c[0x0][0x20], 0x2; /* 0x4000400080208043 */
/*0048*/ @!P0 ST [R3], R0; /* 0x9000000000302085 */
/*0050*/ @P0 ST [R2], R0; /* 0x9000000000200085 */
/*0058*/ EXIT ; /* 0x8000000000001de7 */
The compiler has recast the condition ((tid > 5) && (tid < 10))
into ((i < 4) && (i >= 0))
, with i = tid - 6
, so that the involved instruction is now
ISETP.LT.U32.AND P0, PT, R0, 0x4, PT;
@P0
before an instruction conditionally executes the instruction if predicate register 0 is true. Likewise, @!P0
before an instructions means the instruction is conditionally executed if predicate register 0 is false. When disassembling more complex machine code, you will see that there are typically multiple predicate registers in use. This predication mechanism is also used for conditional branching, by predicating the BRA
instruction.
ISETP
is an integer comparison (here: LT
= less than) with the result written to a predicate register. It allows for chaining of predicates which is useful for compound branches. In your example chaining is not used since the compiler used a clever transformation that allows the compound condition to be evaluated with a single ISETP
. Here, the chaining operator is AND
, and the predicate generated by ISETP
is chained with PT
(= true). I am not sure what the significance of the second instance of PT
is, you could probably find out by examining additional usage examples.
PSETP
works along similar lines as ISETP
, but works on predicates rather than integers. I have not had a need to look at this instruction closely, as it does not seem to occur very often. From what I can tell, PSETP
combines two predicate registers and stores the result to a predicate register. Here it combines !P0
and PT
(= true) via AND
. It seems that this instruction also supports chaining, in this case using AND
to chain with PT
. Your interpretation that your example represents a logical negation P0 = !P0
seems correct. As in the case of ISETP
, I am not sure what the meaning of the third PT
could be.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With