I have a data set that looks like this:
shop_id,item_id,time,value
150,1,2015-07-10,3
150,1,2015-07-11,5
150,1,2015-07-13,2
150,2,2015-07-10,15
150,2,2015-07-12,12
Within each group, defined by "shop_id and "item_id", there are missing dates.
I wish to expand this irregular the time series to a regular, with consecutive dates, within each group:
shop_id,item_id,time,value
150,1,2015-07-10,3
150,1,2015-07-11,5
150,1,2015-07-12,0 # <~~ added
150,1,2015-07-13,2
150,2,2015-07-10,15
150,2,2015-07-11,0 # <~~ added
150,2,2015-07-12,12
For the dates which are added, the corresponding values should by zero. I've read very similar questions though (either using R or SQL coalescing), but most of the solutions I've seen doesn't involve GROUP BYs.
Basically I have access to the SQL database/I can export as CSV for manipulation preferably in C#. Was hoping to find C# libraries that can do such data manipulation but couldn't find any.
Any advice or help is appreciated!
You can use data.table from R. Assuming that 'time' column is of 'Date' class,
library(data.table)#v1.9.5+
DT1 <- setDT(df1)[, list(time=seq(min(time), max(time), by ='day')),
by =.(shop_id, item_id)]
setkeyv(df1, names(df1)[1:3])[DT1][is.na(value), value:=0]
# shop_id item_id time value
#1: 150 1 2015-07-10 3
#2: 150 1 2015-07-11 5
#3: 150 1 2015-07-12 0
#4: 150 1 2015-07-13 2
#5: 150 2 2015-07-10 15
#6: 150 2 2015-07-11 0
#7: 150 2 2015-07-12 12
In the devel version, you can also do this without setting the 'key'. Instructions to install the devel version are here
df1[DT1, on =c('shop_id', 'item_id', 'time')][is.na(value), value:=0]
# shop_id item_id time value
#1: 150 1 2015-07-10 3
#2: 150 1 2015-07-11 5
#3: 150 1 2015-07-12 0
#4: 150 1 2015-07-13 2
#5: 150 2 2015-07-10 15
#6: 150 2 2015-07-11 0
#7: 150 2 2015-07-12 12
Or as @Arun suggested, a more efficient option would be
DT1[, value := 0L][df1, value := i.value, on = c('shop_id', 'item_id', 'time')]
DT1
This is a Sql based solution
First you need a dates table
Date table query. Note this will create a physical table in your database.
;with cte as
(
select cast('2000-01-01' as datetime) as Dates -- Start date
union all
select dateadd(MM,1,Dates)
from cte
where Dates < '2099-12-01' -- End date
)
select *
INTO Date_table
from CTE
Then you need to left outer join your table with Date_table to get the missing dates.
SELECT A.shop_id,
A.item_id,
DT.dates,
Isnull(Y.value, 0)
FROM date_table DT
CROSS JOIN(SELECT DISTINCT shop_id,
item_id
FROM yourtable) A
LEFT OUTER JOIN yourtable Y
ON t.[time] = DT.dates
AND A.shop_id = Y.shop_id
AND A.item_id = Y.item_id
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With