I'm relatively new to Julia. How can I select columns in DataFrames.jl based on a condition, e.g., all columns with an average greater than 0?
One way to select columns based on a column-wise condition is to map that condition over the columns using eachcol, then use the resulting Bool vector as a column selector on the DataFrame:
julia> using DataFrames, Statistics
julia> df = DataFrame(a=randn(10), b=randn(10) .- 1, c=randn(10) .+ 1, d=randn(10))
10×4 DataFrame
 Row │ a          b            c           d          
     │ Float64    Float64      Float64     Float64    
─────┼────────────────────────────────────────────────
   1 │ -1.05612   -2.01901      1.99614    -2.08048
   2 │ -0.37359    0.00750529   2.11529     1.93699
   3 │ -1.15199   -0.812506    -0.721653   -0.286076
   4 │  0.992366  -2.05898      0.474682   -0.210283
   5 │  0.206846  -0.922274     1.87723    -0.403679
   6 │ -1.01923   -1.4401      -0.0769749   0.0557395
   7 │  1.99409   -0.463743     1.83163    -0.585677
   8 │  2.21445    0.658119     2.33056    -1.01474
   9 │  0.918917  -0.371214     1.76301    -0.234561
  10 │ -0.839345  -1.09017      1.38716    -2.82545
julia> f(x) = mean(x) > 0
f (generic function with 1 method)
julia> df[:, map(f, eachcol(df))]
10×2 DataFrame
 Row │ a          c          
     │ Float64    Float64    
─────┼───────────────────────
   1 │ -1.05612    1.99614
   2 │ -0.37359    2.11529
   3 │ -1.15199   -0.721653
   4 │  0.992366   0.474682
   5 │  0.206846   1.87723
   6 │ -1.01923   -0.0769749
   7 │  1.99409    1.83163
   8 │  2.21445    2.33056
   9 │  0.918917   1.76301
  10 │ -0.839345   1.38716
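If the table may also contain non-numeric columns, calling mean on them would throw an error. A minimal sketch of one way around this, assuming a reasonably recent DataFrames.jl where names(df, Real) is available, is to restrict the check to numeric columns first. The table and the variable names numeric, keep, and result are made up for illustration:

```julia
using DataFrames, Statistics

# Small deterministic table; the column names and values are hypothetical.
df = DataFrame(a = [1.0, 2.0, 3.0],
               b = [-1.0, -2.0, -3.0],
               c = [0.5, -0.25, 0.75],
               label = ["x", "y", "z"])

# names(df, Real) returns only the columns whose element type is a subtype
# of Real, so mean is never called on the String column.
numeric = names(df, Real)

# Keep the names of the numeric columns whose mean is positive.
keep = filter(name -> mean(df[!, name]) > 0, numeric)

# select builds a new DataFrame containing just those columns.
result = select(df, keep)
```

Here keep should contain "a" and "c", since their means are 2.0 and about 0.33, while "b" has a negative mean and "label" is skipped. For an all-numeric table, the broadcast form df[:, mean.(eachcol(df)) .> 0] should be equivalent to the map-based version above.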