Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

process overnight time data dataframe in Julia

Tags:

julia

My original data contains hour data for around 1 day. I want to add the overnight data 1 day; how to do it in Julia's way?

using CSV, DataFrames, Dates
path = tempname()
write(path, "t\n9\n12\n18\n0\n5\n")
df = CSV.read(path, DataFrame, types=Dict(:t=>DateTime), dateformat="H")

df output

5×1 DataFrame
 Row │ t
     │ DateTime
─────┼─────────────────────
   1 │ 0001-01-01T09:00:00
   2 │ 0001-01-01T12:00:00
   3 │ 0001-01-01T18:00:00
   4 │ 0001-01-01T00:00:00   <- add Day(1) expected be 0001-01-02
   5 │ 0001-01-01T05:00:00   <- add Day(1) expected be 0001-01-02
like image 609
Daniel YC Lin Avatar asked Mar 08 '26 22:03

Daniel YC Lin


2 Answers

Let's say your data is

julia> write(path, "t\n9\n12\n18\n0\n5\n21\n4")
18

julia> df = CSV.read(path, DataFrame, types=Dict(:t=>DateTime), dateformat="H")
7×1 DataFrame
 Row │ t                   
     │ DateTime            
─────┼─────────────────────
   1 │ 0001-01-01T09:00:00
   2 │ 0001-01-01T12:00:00
   3 │ 0001-01-01T18:00:00
   4 │ 0001-01-01T00:00:00
   5 │ 0001-01-01T05:00:00
   6 │ 0001-01-01T21:00:00
   7 │ 0001-01-01T04:00:00

(spans three days i.e. one more day than the data in the question)

Every time the hour is lesser than the previous hour value, we'll assume it's the data for a new day. To detect this, we can use sign.(diff(df.t))

julia> sign.(diff(df.t))
6-element Vector{Int64}:
  1
  1
 -1
  1
  1
 -1

We can check for the places where these values are less than 0 (i.e. -1). Those signify the date transitions in this data. Appying cumsum on that result will give us the date increments to be applied to each row.

julia> sign.(diff(df.t)) .< 0
6-element BitVector:
 0
 0
 1
 0
 0
 1

julia> dateincrements = cumsum(sign.(diff(df.t)) .< 0)
6-element Vector{Int64}:
 0
 0
 1
 1
 1
 2

We can now convert these to the Day datatype and add them to the t column. (The first row is the basis for the comparison and is assumed to have the date we want to start at, so the addition is done to the rest of the rows.)

julia> df.t[2:end] .+= Day.(dateincrements);

julia> df
7×1 DataFrame
 Row │ t                   
     │ DateTime            
─────┼─────────────────────
   1 │ 0001-01-01T09:00:00
   2 │ 0001-01-01T12:00:00
   3 │ 0001-01-01T18:00:00
   4 │ 0001-01-02T00:00:00
   5 │ 0001-01-02T05:00:00
   6 │ 0001-01-02T21:00:00
   7 │ 0001-01-03T04:00:00
like image 182
Sundar R Avatar answered Mar 10 '26 11:03

Sundar R


You could do:

replace!(x -> x+Day(1), @view df.t[4:5]);

Here is the result

5×1 DataFrame
 Row │ t
     │ DateTime
─────┼─────────────────────
   1 │ 0001-01-01T09:00:00
   2 │ 0001-01-01T12:00:00
   3 │ 0001-01-01T18:00:00
   4 │ 0001-01-02T00:00:00
   5 │ 0001-01-02T05:00:00

If you want rather to add a day when the hour is lower than some threshold than replace!(x -> hour(x) <= 6 ? x+Day(1) : x, df.t);

like image 26
Przemyslaw Szufel Avatar answered Mar 10 '26 10:03

Przemyslaw Szufel