I have a DataFrame in Julia and I want to create a new column that holds the difference between consecutive rows of a specific column. In Python pandas, I would simply use df.series.diff(). Is there a Julia equivalent?
For example:
data
1
2
4
6
7
# in pandas
df['diff_data'] = df.data.diff()
data   diff_data
1        NaN 
2          1
4          2
6          2
7          1
You can use ShiftedArrays.jl like this.
Declarative style:
julia> using DataFrames, ShiftedArrays
julia> df = DataFrame(data=[1, 2, 4, 6, 7])
5×1 DataFrame
 Row │ data
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     4
   4 │     6
   5 │     7
julia> transform(df, :data => (x -> x - lag(x)) => :data_diff)
5×2 DataFrame
 Row │ data   data_diff
     │ Int64  Int64?
─────┼──────────────────
   1 │     1    missing
   2 │     2          1
   3 │     4          2
   4 │     6          2
   5 │     7          1
Imperative style (in place):
julia> df = DataFrame(data=[1, 2, 4, 6, 7])
5×1 DataFrame
 Row │ data
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     4
   4 │     6
   5 │     7
julia> df.data_diff = df.data - lag(df.data)
5-element Vector{Union{Missing, Int64}}:
  missing
 1
 2
 2
 1
julia> df
5×2 DataFrame
 Row │ data   data_diff
     │ Int64  Int64?
─────┼──────────────────
   1 │     1    missing
   2 │     2          1
   3 │     4          2
   4 │     6          2
   5 │     7          1
With diff you do not need any extra packages and can similarly do the following:
julia> df.data_diff = [missing; diff(df.data)]
5-element Vector{Union{Missing, Int64}}:
  missing
 1
 2
 2
 1
(The catch is that diff is a general-purpose function that shortens a vector from length n to n-1, so you have to prepend missing manually.)
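If you need this in several places, the prepend step can be wrapped in a small function. This is only a minimal sketch using Base Julia: the name diff_with_missing is made up for illustration and is not part of Base, DataFrames.jl, or ShiftedArrays.jl, and it assumes a non-empty vector.

```julia
# Hypothetical helper (not a library function): pandas-style diff that keeps
# the original length by putting `missing` in the first position.
# Assumes `v` is a non-empty vector.
function diff_with_missing(v::AbstractVector)
    # diff(v) has length n-1; vcat with `missing` restores length n
    return [missing; diff(v)]
end

data = [1, 2, 4, 6, 7]
result = diff_with_missing(data)

# The same values written out by hand: element i minus element i-1
manual = [missing; data[2:end] .- data[1:end-1]]
```

With a DataFrame this would be used as df.data_diff = diff_with_missing(df.data), which gives the same column as the one-liner above.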