I'm new to R and I'm a bit stuck. My data looks like this:
dates temp
01/31/2011 40
01/30/2011 34
01/29/2011 30
01/28/2011 52
01/27/2011 39
01/26/2011 37
...
01/01/2011 31
I want to keep only the days with temp under 40 degrees and get, for each run of such days, the start date, the end date and how many days it lasts. For example:
from to days
01/29/2011 01/30/2011 2
01/26/2011 01/27/2011 2
I tried difftime but it didn't work; maybe a custom function would.
Any help would be appreciated.
I'd do something like this. I'll use data.table here.
df <- read.table(header=TRUE, text="dates temp
01/31/2011 40
01/30/2011 34
01/29/2011 30
01/28/2011 52
01/27/2011 39
01/26/2011 37", stringsAsFactors=FALSE)
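library(data.table)
dt <- data.table(df)
# Parse the dates and number the runs: id goes up by one at every row with temp >= 40
dt <- dt[, `:=`(date.form = as.Date(dates, format="%m/%d/%Y"),
                id = cumsum(temp >= 40))][temp < 40]
# One row per run: first day, last day and number of days
dt[, list(from=min(date.form), to=max(date.form), count=.N), by=id]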
#    id       from         to count
# 1:  1 2011-01-29 2011-01-30     2
# 2:  2 2011-01-26 2011-01-27     2
The idea is to first add a column with the dates column converted to Date format. Then add another column, id, which is a running count of the rows where temp >= 40; that count groups together the values lying between two such rows. That is, if temp is c(40, 34, 30, 52, 39, 37), then id is c(1,1,1,2,2,2), so everything between two values >= 40 belongs to the same group (34, 30 -> 1 and 39, 37 -> 2). After doing this, I remove the temp >= 40 entries.
Then you can group by id and take the min and max of the dates and the number of rows in each group (which data.table provides as .N).
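For comparison, here is a minimal base R sketch of the same grouping idea, in case data.table is not available. It is not part of the answer above; the names cold and runs are made up for illustration. Like the data.table version, it assumes the rows are sorted by date with no missing days.

```r
# Same sample data as in the answer above
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "dates temp
01/31/2011 40
01/30/2011 34
01/29/2011 30
01/28/2011 52
01/27/2011 39
01/26/2011 37")

df$date.form <- as.Date(df$dates, format = "%m/%d/%Y")

# Running count of rows with temp >= 40: it goes up by one at each such row,
# so all rows between two warm days share the same id
df$id <- cumsum(df$temp >= 40)
df$id
# [1] 1 1 1 2 2 2

# Keep only the days under 40 degrees (hypothetical name: cold)
cold <- df[df$temp < 40, ]

# Summarise each run: first day, last day, number of days (hypothetical name: runs)
runs <- do.call(rbind, lapply(split(cold, cold$id), function(g) {
  data.frame(from = min(g$date.form), to = max(g$date.form), days = nrow(g))
}))
runs
```

With the sample data, runs should hold the same two periods as the data.table result, with the count column named days as in the question.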