 

Return copy of a DataFrame that contains only rows with missing data in Julia

I am looking for the opposite of the dropmissing function in DataFrames.jl so that the user knows where to look to fix their bad data. It seems like this should be easy, but the filter function expects a column to be specified and I cannot get it to iterate over all columns.

julia> df=DataFrame(a=[1, missing, 3], b=[4, 5, missing])
3×2 DataFrame
│ Row │ a       │ b       │
│     │ Int64?  │ Int64?  │
├─────┼─────────┼─────────┤
│ 1   │ 1       │ 4       │
│ 2   │ missing │ 5       │
│ 3   │ 3       │ missing │

julia> filter(x -> ismissing(eachcol(x)), df)
ERROR: MethodError: no method matching eachcol(::DataFrameRow{DataFrame,DataFrames.Index})

julia> filter(x -> ismissing.(x), df)
ERROR: ArgumentError: broadcasting over `DataFrameRow`s is reserved

I am basically trying to recreate the disallowmissing function, but with a more useful error message.

Asked Nov 19 '25 20:11 by Nathan Boyer



1 Answer

Here are two ways to do it. Each can return either a copy or a view:

julia> df = DataFrame(a=[1, missing, 3], b=[4, 5, missing])
3×2 DataFrame
│ Row │ a       │ b       │
│     │ Int64?  │ Int64?  │
├─────┼─────────┼─────────┤
│ 1   │ 1       │ 4       │
│ 2   │ missing │ 5       │
│ 3   │ 3       │ missing │

julia> df[.!completecases(df), :] # faster than the filter-based version below
2×2 DataFrame
│ Row │ a       │ b       │
│     │ Int64?  │ Int64?  │
├─────┼─────────┼─────────┤
│ 1   │ missing │ 5       │
│ 2   │ 3       │ missing │

julia> @view df[.!completecases(df), :]
2×2 SubDataFrame
│ Row │ a       │ b       │
│     │ Int64?  │ Int64?  │
├─────┼─────────┼─────────┤
│ 1   │ missing │ 5       │
│ 2   │ 3       │ missing │
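Note that the `Row` numbers printed above restart at 1, so they do not tell you where the bad rows sit in the original `df`. If the goal is to show the user where to fix their data, a small sketch that reuses the same `completecases` mask and keeps the original positions could look like this (the column name `original_row` is a hypothetical choice, and `insertcols!` with a `Pair` assumes a reasonably recent DataFrames.jl):

```julia
using DataFrames

df = DataFrame(a=[1, missing, 3], b=[4, 5, missing])

# The same mask as above, turned into row positions in the original df
bad_rows = findall(.!completecases(df))   # → [2, 3] for this df

# Copy of the incomplete rows, with their original positions as the first column
report = df[bad_rows, :]
insertcols!(report, 1, :original_row => bad_rows)
```

Printing `report` then shows the `original_row` values 2 and 3 next to the offending `missing` entries.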

julia> filter(row -> any(ismissing, row), df)
2×2 DataFrame
│ Row │ a       │ b       │
│     │ Int64?  │ Int64?  │
├─────┼─────────┼─────────┤
│ 1   │ missing │ 5       │
│ 2   │ 3       │ missing │

julia> filter(row -> any(ismissing, row), df, view=true) # requires DataFrames.jl 0.22 or later
2×2 SubDataFrame
 Row │ a        b
     │ Int64?   Int64?
─────┼──────────────────
   1 │ missing        5
   2 │       3  missing
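Since the question's underlying goal is a `disallowmissing` with a more useful error message, here is a minimal sketch that combines `completecases` with a per-row check. `checked_disallowmissing` is a hypothetical name, not a DataFrames.jl function; it only uses `completecases`, `names` and `disallowmissing`:

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): behaves like
# disallowmissing(df), but when missing values are present it throws an
# ArgumentError listing each offending row and the columns holding `missing`.
function checked_disallowmissing(df::AbstractDataFrame)
    bad_rows = findall(.!completecases(df))   # rows with at least one missing
    if !isempty(bad_rows)
        parts = String[]
        for i in bad_rows
            # names of the columns whose value is missing in row i
            cols = [string(c) for c in names(df) if ismissing(df[i, c])]
            push!(parts, "row $i: " * join(cols, ", "))
        end
        throw(ArgumentError("missing values found (" * join(parts, "; ") * ")"))
    end
    return disallowmissing(df)   # safe: no missing values remain
end
```

With the `df` from the question, `checked_disallowmissing(df)` throws `ArgumentError: missing values found (row 2: a; row 3: b)`, while a data frame without missing values is returned with non-`Missing` column types, just as `disallowmissing` would return it.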
Answered Nov 22 '25 14:11 by Bogumił Kamiński



