
Multithreaded iteration over groups for Julia GroupedDataFrame

I have a GroupedDataFrame in Julia 1.4 (DataFrames 0.22.1). I want to iterate over the groups of rows to compute some statistics. Because there are many groups and the computations are slow, I want to do this multithreaded.

The code

grouped_rows = groupby(data, by_index)
for group in grouped_rows
    # do something with `group`
end

works, but

grouped_rows = groupby(data, by_index)
Threads.@threads for group in grouped_rows
    # do something with `group`
end

results in MethodError: no method matching firstindex(::GroupedDataFrame{DataFrame}). Is there a way to parallelize the iteration over groups of DataFrame rows?

Miklós Koren asked Oct 27 '25 10:10


1 Answer

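Threads.@threads needs an indexable collection, in practice an AbstractVector, because it splits the work across threads by index. A GroupedDataFrame can be iterated but does not define firstindex, which is what the MethodError reports.

Hence collect your grouped_rows into a vector of SubDataFrame first: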

Threads.@threads for group in collect(SubDataFrame, grouped_rows)
    # do something with `group`
end
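
If each group produces a value you want to keep, one option is to loop over indices and write into a preallocated vector, so every thread touches only its own slot. The following is a minimal sketch under stated assumptions: the toy `data`, the column names `id` and `x`, and the use of `mean` as the "slow" statistic are all made up for illustration and stand in for your own data and computation.

```julia
using DataFrames
using Statistics

# Hypothetical toy data: 4 groups of 3 rows each (assumption, not from the question)
data = DataFrame(id = repeat(1:4, inner = 3), x = collect(1.0:12.0))
grouped_rows = groupby(data, :id)

# Materialize the groups as a Vector so they can be indexed
groups = collect(SubDataFrame, grouped_rows)

# One result slot per group; each iteration writes only to its own index
means = Vector{Float64}(undef, length(groups))

Threads.@threads for i in eachindex(groups)
    # placeholder for the expensive per-group computation
    means[i] = mean(groups[i].x)
end
```

Note that Julia only uses more than one thread if it was started with several, for example by setting the JULIA_NUM_THREADS environment variable before launch; you can check with Threads.nthreads().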
Przemyslaw Szufel answered Oct 30 '25 12:10


