
Julia CUDA - Reduce matrix columns

Tags: cuda, gpu, julia

Consider the following kernel, which sums each row of a 2-D matrix (one thread per row):

using CUDA

function row_sum!(x, ncol, out)
    # out = sum(x, dims=2)
    row_idx = (blockIdx().x-1) * blockDim().x + threadIdx().x
    for i = 1:ncol
        @inbounds out[row_idx] += x[row_idx, i]
    end
    return
end

N = 1024
x = CUDA.rand(Float64, N, 2*N)
out = CUDA.zeros(Float64, N)
@cuda threads=256 blocks=4 row_sum!(x, size(x)[2], out)
isapprox(out, sum(x, dims=2))  # true

How do I write a similar kernel that reduces along the columns of a 2-D matrix instead? In particular, how do I get the index of each column, the way row_idx gives the index of each row?

asked Sep 05 '25 03:09 by A is for Ambition


1 Answer

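The index is computed exactly as before; it now identifies a column, and the loop runs over the rows. Here is the code: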

using CUDA

function col_sum!(x, nrow, out)
    # out = sum(x, dims=1)
    col_idx = (blockIdx().x-1) * blockDim().x + threadIdx().x
    for i = 1:nrow
        @inbounds out[col_idx] += x[i, col_idx]
    end
    return
end

N = 1024
x = CUDA.rand(Float64, N, 2N)
out = CUDA.zeros(Float64, 2N)
@cuda threads=256 blocks=8 col_sum!(x, size(x, 1), out)

And here is the test:

julia> isapprox(out, vec(sum(x, dims=1)))
true

The result vector now has 2N elements instead of N, so the launch needs twice as many threads. With 256 threads per block, that means 8 blocks instead of 4.
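To see why the block count scales with the number of columns, the launch arithmetic can be emulated on the CPU. The sketch below is illustrative only: global_index and emulate_col_sum are hypothetical helper names, not part of CUDA.jl, and the nested loops merely stand in for GPU blocks and threads. It assumes a column count that is not a multiple of the block size, which is why it rounds the block count up with cld and skips the surplus threads.

```julia
# CPU-only emulation of the launch arithmetic; no GPU or CUDA.jl required.
# `global_index` and `emulate_col_sum` are hypothetical names for illustration.

# Same formula as col_idx in the kernel, with 1-based block and thread ids.
global_index(block, thread, blockdim) = (block - 1) * blockdim + thread

function emulate_col_sum(x; threads=256)
    nrow, ncol = size(x)
    blocks = cld(ncol, threads)        # round up so every column gets a thread
    out = zeros(eltype(x), ncol)
    for block in 1:blocks, thread in 1:threads
        col_idx = global_index(block, thread, threads)
        col_idx <= ncol || continue    # guard: surplus threads do nothing
        for i in 1:nrow
            out[col_idx] += x[i, col_idx]
        end
    end
    return out, blocks
end

x = rand(Float64, 16, 600)             # 600 is not a multiple of 256
out, blocks = emulate_col_sum(x)
```

For the 2N = 2048 columns above, cld(2048, 256) gives the same 8 blocks. In a real kernel, the equivalent guard would presumably be an early return when col_idx exceeds size(x, 2).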

More material can be found in the CUDA.jl introductory tutorial: https://juliagpu.gitlab.io/CUDA.jl/tutorials/introduction/

answered Sep 07 '25 22:09 by Przemyslaw Szufel