I'm building a data frame where, for some of the columns, the obvious way to create them involves a multi-step process. I'd like to idiomatically and concisely create a column with eltype Union{Missing, T}. I can then fill the column using the multi-step process (and call disallowmissing once finished, if appropriate). What's the cleanest way to do this?
I'd like to do something like df[!, :col] :: Vector{Union{Int64, Missing}} .= missing but this gives "ArgumentError: column name :col not found in the data frame; ..."
If I try to do df[!, :col] .= fill(missing, nrow(df)) :: Vector{Union{Int64, Missing}}, I get "TypeError: in typeassert, expected Vector{Union{Missing, Int64}}, got a value of type Vector{Missing}".
For the moment I'm doing something ugly and confusing, like
df[!, :col] .= 0
allowmissing!(df, :col)
df.col .= missing
Any suggestions? My sense is that if I have this question, I don't really understand the nuances of how column typing in DataFrames.jl works, even though I use it all the time and generally don't have problems. I've searched the documentation and don't feel like I've seen anything that would help with this specific issue, but any recommended reading would be appreciated.
Thanks!
Here is one way to do it (there are other ways to add a column to a data frame, but the key function to use is missings):
julia> using DataFrames
julia> df = DataFrame()
0×0 DataFrame
julia> df.col = missings(Int, 5)
5-element Vector{Union{Missing, Int64}}:
missing
missing
missing
missing
missing
julia> df
5×1 DataFrame
 Row │ col
     │ Int64?
─────┼─────────
   1 │ missing
   2 │ missing
   3 │ missing
   4 │ missing
   5 │ missing
julia> df.other_col = missings(Float64, nrow(df))
5-element Vector{Union{Missing, Float64}}:
missing
missing
missing
missing
missing
julia> df
5×2 DataFrame
 Row │ col      other_col
     │ Int64?   Float64?
─────┼────────────────────
   1 │ missing    missing
   2 │ missing    missing
   3 │ missing    missing
   4 │ missing    missing
   5 │ missing    missing
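To connect this back to the multi-step workflow in the question, here is a minimal sketch of the whole cycle: create the column with missings, fill it in stages, then narrow the element type. The data frame, the id column, and the two fill rules are hypothetical and only serve as an illustration.

```julia
using DataFrames

# Hypothetical data frame standing in for the one in the question.
df = DataFrame(id = 1:5)

# Create the column with element type Union{Missing, Int}; all entries are missing.
df.col = missings(Int, nrow(df))

# Step 1 (made-up rule): fill the rows with an odd id.
for i in 1:nrow(df)
    if isodd(df.id[i])
        df.col[i] = 10 * df.id[i]
    end
end

# Step 2 (made-up rule): fill whatever is still missing.
df.col[ismissing.(df.col)] .= 0

# Every entry is filled now, so the column can be narrowed to Int.
disallowmissing!(df, :col)
```

Note that disallowmissing! throws an error if any missing values remain in the column, so it also acts as a check that the fill steps covered every row.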
As a side note: this issue is not specific to DataFrames.jl; it comes from how vectors are created in Julia in general. The missings function is defined in the Missings.jl package (which is re-exported by DataFrames.jl). If you wanted to use only Julia Base functionality, the following would give you the same result as missings:
julia> Vector{Union{Int, Missing}}(missing, 5)
5-element Vector{Union{Missing, Int64}}:
missing
missing
missing
missing
missing
(However, since this is more verbose, I typically use the missings function.)
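As for the TypeError in the question: a type assertion with :: only checks the type of a value and never converts it, and fill(missing, n) produces a Vector{Missing}. A small sketch using only Base, in case it helps to see the difference (the variable names are just for illustration):

```julia
n = 5

# fill picks the element type from the value, so this is a Vector{Missing}.
v = fill(missing, n)

# `v::Vector{Union{Missing, Int}}` would throw a TypeError here, because
# a type assertion only checks the type and never converts the value.

# An explicit conversion widens the element type (and copies the data).
w = convert(Vector{Union{Missing, Int}}, v)

# A typed comprehension is another Base-only option.
u = Union{Missing, Int}[missing for _ in 1:n]

# Storing an Int works now; it would fail on the Vector{Missing} `v`.
w[1] = 42
```

Any of these vectors with the wider element type can be assigned to the data frame as a new column in the same way as the result of missings.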