I am looking for the opposite of the dropmissing function in DataFrames.jl so that the user knows where to look to fix their bad data. It seems like this should be easy, but the filter function expects a column to be specified and I cannot get it to iterate over all columns.
julia> df=DataFrame(a=[1, missing, 3], b=[4, 5, missing])
3×2 DataFrame
│ Row │ a │ b │
│ │ Int64? │ Int64? │
├─────┼─────────┼─────────┤
│ 1 │ 1 │ 4 │
│ 2 │ missing │ 5 │
│ 3 │ 3 │ missing │
julia> filter(x -> ismissing(eachcol(x)), df)
ERROR: MethodError: no method matching eachcol(::DataFrameRow{DataFrame,DataFrames.Index})
julia> filter(x -> ismissing.(x), df)
ERROR: ArgumentError: broadcasting over `DataFrameRow`s is reserved
I am basically trying to recreate the disallowmissing function, but with a more useful error message.
Here are two ways to do it:
julia> df = DataFrame(a=[1, missing, 3], b=[4, 5, missing])
3×2 DataFrame
│ Row │ a │ b │
│ │ Int64? │ Int64? │
├─────┼─────────┼─────────┤
│ 1 │ 1 │ 4 │
│ 2 │ missing │ 5 │
│ 3 │ 3 │ missing │
julia> df[.!completecases(df), :] # this will be faster
2×2 DataFrame
│ Row │ a │ b │
│ │ Int64? │ Int64? │
├─────┼─────────┼─────────┤
│ 1 │ missing │ 5 │
│ 2 │ 3 │ missing │
julia> @view df[.!completecases(df), :]
2×2 SubDataFrame
│ Row │ a │ b │
│ │ Int64? │ Int64? │
├─────┼─────────┼─────────┤
│ 1 │ missing │ 5 │
│ 2 │ 3 │ missing │
julia> filter(row -> any(ismissing, row), df)
2×2 DataFrame
│ Row │ a │ b │
│ │ Int64? │ Int64? │
├─────┼─────────┼─────────┤
│ 1 │ missing │ 5 │
│ 2 │ 3 │ missing │
julia> filter(row -> any(ismissing, row), df, view=true) # requires DataFrames.jl 0.22
2×2 SubDataFrame
Row │ a b
│ Int64? Int64?
─────┼──────────────────
1 │ missing 5
2 │ 3 missing
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With