I'd like to know how to get the date difference of two column in data.table using the lapply in data.table?
library(data.table)
dt <- fread(" ID Date ME_Mes DOB
A 2017-02-20 0.0000 2016-08-19
B 2017-02-06 2.3030 2016-03-11
C 2017-03-20 0.4135 2016-08-19
D 2017-03-06 0.0480 2016-10-09
E 2017-04-20 2.4445 2016-05-04")
> dt
ID Date ME_Mes DOB
1: A 2017-02-20 0.0000 2016-08-19
2: B 2017-02-06 2.3030 2016-03-11
3: C 2017-03-20 0.4135 2016-08-19
4: D 2017-03-06 0.0480 2016-10-09
5: E 2017-04-20 2.4445 2016-05-04
###I'd like to calculate the difference in weeks for every ID by comparing the DOB-Date.
I tired the following:
dt[,lapply(.SD, diff.Date), .SDcols = c(4,2), ID] # but did not work!
You can use difftime to get the difference in weeks. Although, you need to convert your columns to POSIXct.
In case you want to keep class of your columns as they are, this works:
dt[, "DOB_Date" := difftime(strptime(dt$Date, format = "%Y-%m-%d"),
strptime(dt$DOB, format = "%Y-%m-%d"), units = "weeks")]
dt
## ID Date ME_Mes DOB DOB_Date
## 1: A 2017-02-20 0.0000 2016-08-19 26.43452 weeks
## 2: B 2017-02-06 2.3030 2016-03-11 47.42857 weeks
## 3: C 2017-03-20 0.4135 2016-08-19 30.42857 weeks
## 4: D 2017-03-06 0.0480 2016-10-09 21.14881 weeks
## 5: E 2017-04-20 2.4445 2016-05-04 50.14286 weeks
However, as @Frank suggested it's better to convert ("overwrite") your date-columns to POSIXct class first.
My hunch (and I will let others correct me) is that the following is faster on large datasets:
dt[,Date:=as.Date(Date)]
dt[,DOB:=as.Date(DOB)]
dt[,datediff:=as.integer(Date)-as.integer(DOB)]
datediff will contain date differences in days.
If you have a truly large data.table, you may consider fastPOSIXct from fasttime for string conversion.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With