CUDA assembly instructions

Question

Disassembling CUDA code appears to be a very useful tool, if not the only one in some cases, to understand the behaviour of the compiler as well as of performance metrics.

I would say that unfortunately the documentation available through the CUDA BINARY UTILITIES Application Note does not provide the user with all the tools needed to interpret CUDA assembly instructions or at least I'm not able to infer all the needed information from that document. The "CUDA Handbook" book does not provide any more information than the CUDA BINARY UTILITIES guide. For example, how should I interpret the instructions

ISETP.LT.AND P0, PT, R3, RZ, PT;

and

PSETP.AND.AND P0, PT, !P0, PT, PT;

What does @P0 do before an instruction? Is it an instruction label so that the execution jumps to that label if the predicate register P0 is true? Is there any general way to interpret the CUDA assembly instructions?

Thank you very much.

EDIT FOLLOWING NJUFFA'S COMMENT

I have compiled the following simple kernel

__global__ void test_kernel(float *a, float *b)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if ((tid > 5) && (tid < 10)) a[tid] = tid;
    else b[tid] = tid;
}

which resulted in

/*0000*/        MOV R1, c[0x1][0x100];                 /* 0x2800440400005de4 */
/*0008*/        S2R R0, SR_CTAID.X;                    /* 0x2c00000094001c04 */
/*0010*/        S2R R2, SR_TID.X;                      /* 0x2c00000084009c04 */
/*0018*/        IMAD R2, R0, c[0x0][0x8], R2;          /* 0x2004400020009ca3 */
/*0020*/        IADD R0, R2, -0x6;                     /* 0x4800ffffe8201c03 */
/*0028*/        ISETP.LT.U32.AND P0, PT, R0, 0x4, PT;  /* 0x188ec0001001dc03 */
/*0030*/        I2F.F32.S32 R0, R2;                    /* 0x1800000009201e04 */
/*0038*/   @!P0 ISCADD R3, R2, c[0x0][0x24], 0x2;      /* 0x400040009020e043 */
/*0040*/    @P0 ISCADD R2, R2, c[0x0][0x20], 0x2;      /* 0x4000400080208043 */
/*0048*/   @!P0 ST [R3], R0;                           /* 0x9000000000302085 */
/*0050*/    @P0 ST [R2], R0;                           /* 0x9000000000200085 */
/*0058*/        EXIT ;                                 /* 0x8000000000001de7 */

The compiler has recast the condition ((tid > 5) && (tid < 10)) into ((i < 4) && (i >= 0)), with i = tid - 6, so that the involved instruction is now

ISETP.LT.U32.AND P0, PT, R0, 0x4, PT;

njuffa · Accepted Answer

@P0 before an instruction conditionally executes the instruction if predicate register 0 is true. Likewise, @!P0 before an instructions means the instruction is conditionally executed if predicate register 0 is false. When disassembling more complex machine code, you will see that there are typically multiple predicate registers in use. This predication mechanism is also used for conditional branching, by predicating the BRA instruction.

ISETP is an integer comparison (here: LT = less than) with the result written to a predicate register. It allows for chaining of predicates which is useful for compound branches. In your example chaining is not used since the compiler used a clever transformation that allows the compound condition to be evaluated with a single ISETP. Here, the chaining operator is AND, and the predicate generated by ISETP is chained with PT (= true). I am not sure what the significance of the second instance of PT is, you could probably find out by examining additional usage examples.

PSETP works along similar lines as ISETP, but works on predicates rather than integers. I have not had a need to look at this instruction closely, as it does not seem to occur very often. From what I can tell, PSETP combines two predicate registers and stores the result to a predicate register. Here it combines !P0 and PT (= true) via AND. It seems that this instruction also supports chaining, in this case using AND to chain with PT. Your interpretation that your example represents a logical negation P0 = !P0 seems correct. As in the case of ISETP, I am not sure what the meaning of the third PT could be.

CUDA assembly instructions

Tags:

assembly

cuda

Vitality

1 Answers

njuffa

Recent Activity

Donate For Us

CUDA assembly instructions

Tags:

assembly

cuda

Vitality

1 Answers

njuffa

Related questions

Recent Activity

Donate For Us