Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PTX variable length buffer in shared memory

I am trying to implement a global reduction kernel in PTX which uses shared memory for reduction within a thread block (like all the CUDA C examples out there). In CUDA C on has the possibility to define an variable length array in shared memory with

extern __shared__ float sdata[];

How can I get the equivalent in PTX ?

What doesn't seem appropriate is a fixed length array like

.shared .f32 sdata[ LENGTH ];

Since I want the kernel to be reusable for different input array lengths.

What I could do is define one variable

.shared .f32 sdata;

and use it as the base address of the array. In the hope that it is allocated at the beginning of shared memory. I could then access array element like

ld.shared.f32 %r4,[sdata + <offset>]

Also this looks a bit funny because sdata is defined as a float. But what it really is is the address of a float. In this sense the above line is indeed correct. However I am not sure if this is guaranteed to be correct, say as long as the offset is not greater than the shared memory size specified at kernel launch.

The PTX manual doesn't explain variable length buffers in shared memory.

Anyone knows how to implement a variable length buffer in PTX?

like image 743
ritter Avatar asked Apr 24 '26 13:04

ritter


1 Answers

This works. However it is not the perfect solution because it introduces an extern linkage variable.

.version 2.3
.target sm_20
.extern .shared .align 4 .b8 sdata[];
.entry func (.param .s32 param0,...)
{
 //
 // Base addresses
 mov.u64 w2,sdata;  // shared memory
 ld.shared.s32 i9,[w2+0];
}
like image 105
ritter Avatar answered Apr 26 '26 03:04

ritter



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!