
Julia Dataframes - concisely create column with eltype Union{Missing, T}

I'm building a data frame in which the obvious way to create some of the columns involves a multi-step process. I'd like to idiomatically and concisely create a column with eltype Union{Missing, T}, so that I can then fill it using the multi-step process (and call disallowmissing once finished, as appropriate). What's the cleanest way to do this?

I'd like to do something like df[!, :col] :: Vector{Union{Int64, Missing}} .= missing, but this gives "ArgumentError: column name :col not found in the data frame; ..."

If I try to do df[!, :col] .= fill(missing, nrow(df)) :: Vector{Union{Int64, Missing}}, I get "TypeError: in typeassert, expected Vector{Union{Missing, Int64}}, got a value of type Vector{Missing}".

For the moment I'm doing something ugly and confusing, like

df[!, :col] .= 0

allowmissing!(df, :col)

df.col .= missing

Any suggestions? My sense is that if I have this question, I don't really understand the nuances of how column typing in DataFrames.jl works, even though I use it all the time and generally don't have problems. I've searched the documentation and don't feel like I've seen anything that would help with this specific issue, but any recommended reading would be appreciated.

Thanks!

asked Feb 01 '26 15:02 by cpgj


1 Answer

Here is one way to do it. There are other ways to add a column to a data frame, but the key function to use is missings:

julia> using DataFrames

julia> df = DataFrame()
0×0 DataFrame

julia> df.col = missings(Int, 5)
5-element Vector{Union{Missing, Int64}}:
 missing
 missing
 missing
 missing
 missing

julia> df
5×1 DataFrame
 Row │ col
     │ Int64?
─────┼─────────
   1 │ missing
   2 │ missing
   3 │ missing
   4 │ missing
   5 │ missing

julia> df.other_col = missings(Float64, nrow(df))
5-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing
 missing
 missing

julia> df
5×2 DataFrame
 Row │ col      other_col
     │ Int64?   Float64?
─────┼────────────────────
   1 │ missing    missing
   2 │ missing    missing
   3 │ missing    missing
   4 │ missing    missing
   5 │ missing    missing
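For the multi-step workflow described in the question, a minimal sketch could look like the following. The data frame contents, the column name, and the two fill steps are made up for illustration; only missings, nrow, and disallowmissing! are actual DataFrames.jl (or re-exported Missings.jl) functions.

```julia
using DataFrames

# Hypothetical starting data frame with five rows.
df = DataFrame(id = 1:5)

# Create the column with eltype Union{Missing, Int}; every entry starts as missing.
df[!, :col] = missings(Int, nrow(df))

# Hypothetical multi-step fill: each step sets only some of the rows.
df.col[1:3] .= 10
df[4:5, :col] = [20, 30]

# Once no missing values remain, narrow the eltype back to Int.
# (disallowmissing! throws an error if any missing value is still present.)
disallowmissing!(df, :col)
```

After the last step, eltype(df.col) is Int rather than Union{Missing, Int}.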

As a side note: this issue is not specific to DataFrames.jl; it comes from how vectors are created in Julia in general. The missings function is defined in the Missings.jl package (and re-exported by DataFrames.jl). If you want to use only Julia Base functionality, the following gives the same result as missings:

julia> Vector{Union{Int, Missing}}(missing, 5)
5-element Vector{Union{Missing, Int64}}:
 missing
 missing
 missing
 missing
 missing

(However, since it is more verbose, I typically use the missings function.)
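To illustrate why the attempts in the question fail, here is a small Base-only sketch (variable names are arbitrary). A type assertion with :: only checks the type of a value and never converts it, and fill(missing, n) always produces a Vector{Missing}, so the wider element type has to be requested when the vector is constructed or converted.

```julia
n = 5

# fill infers the narrowest element type, so this is a Vector{Missing}.
v_missing = fill(missing, n)

# Asserting `v_missing :: Vector{Union{Int, Missing}}` would throw a TypeError,
# because Vector{Missing} is not a subtype of Vector{Union{Int, Missing}}.

# Three ways to get the wider element type up front:
v_ctor  = Vector{Union{Int, Missing}}(missing, n)                # constructor
v_conv  = convert(Vector{Union{Int, Missing}}, fill(missing, n)) # explicit conversion
v_typed = Union{Int, Missing}[missing for _ in 1:n]              # typed comprehension

# These vectors accept Int values; a Vector{Missing} would not.
v_ctor[1] = 42
```

Any of these vectors can then be assigned as a data frame column in the same way as the result of missings.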

answered Feb 04 '26 06:02 by Bogumił Kamiński


