My original data contains hour data for around 1 day. I want to add the overnight data 1 day; how to do it in Julia's way?
using CSV, DataFrames, Dates
path = tempname()
write(path, "t\n9\n12\n18\n0\n5\n")
df = CSV.read(path, DataFrame, types=Dict(:t=>DateTime), dateformat="H")
df output
5×1 DataFrame
Row │ t
│ DateTime
─────┼─────────────────────
1 │ 0001-01-01T09:00:00
2 │ 0001-01-01T12:00:00
3 │ 0001-01-01T18:00:00
4 │ 0001-01-01T00:00:00 <- add Day(1) expected be 0001-01-02
5 │ 0001-01-01T05:00:00 <- add Day(1) expected be 0001-01-02
Let's say your data is
julia> write(path, "t\n9\n12\n18\n0\n5\n21\n4")
18
julia> df = CSV.read(path, DataFrame, types=Dict(:t=>DateTime), dateformat="H")
7×1 DataFrame
Row │ t
│ DateTime
─────┼─────────────────────
1 │ 0001-01-01T09:00:00
2 │ 0001-01-01T12:00:00
3 │ 0001-01-01T18:00:00
4 │ 0001-01-01T00:00:00
5 │ 0001-01-01T05:00:00
6 │ 0001-01-01T21:00:00
7 │ 0001-01-01T04:00:00
(spans three days i.e. one more day than the data in the question)
Every time the hour is lesser than the previous hour value, we'll assume it's the data for a new day. To detect this, we can use sign.(diff(df.t))
julia> sign.(diff(df.t))
6-element Vector{Int64}:
1
1
-1
1
1
-1
We can check for the places where these values are less than 0 (i.e. -1). Those signify the date transitions in this data. Appying cumsum on that result will give us the date increments to be applied to each row.
julia> sign.(diff(df.t)) .< 0
6-element BitVector:
0
0
1
0
0
1
julia> dateincrements = cumsum(sign.(diff(df.t)) .< 0)
6-element Vector{Int64}:
0
0
1
1
1
2
We can now convert these to the Day datatype and add them to the t column. (The first row is the basis for the comparison and is assumed to have the date we want to start at, so the addition is done to the rest of the rows.)
julia> df.t[2:end] .+= Day.(dateincrements);
julia> df
7×1 DataFrame
Row │ t
│ DateTime
─────┼─────────────────────
1 │ 0001-01-01T09:00:00
2 │ 0001-01-01T12:00:00
3 │ 0001-01-01T18:00:00
4 │ 0001-01-02T00:00:00
5 │ 0001-01-02T05:00:00
6 │ 0001-01-02T21:00:00
7 │ 0001-01-03T04:00:00
You could do:
replace!(x -> x+Day(1), @view df.t[4:5]);
Here is the result
5×1 DataFrame
Row │ t
│ DateTime
─────┼─────────────────────
1 │ 0001-01-01T09:00:00
2 │ 0001-01-01T12:00:00
3 │ 0001-01-01T18:00:00
4 │ 0001-01-02T00:00:00
5 │ 0001-01-02T05:00:00
If you want rather to add a day when the hour is lower than some threshold than replace!(x -> hour(x) <= 6 ? x+Day(1) : x, df.t);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With