
Date intervals and data manipulation

Tags:

r

I'm a new user of R and I'm a little bit stuck. My data looks like this:

dates        temp
01/31/2011    40
01/30/2011    34
01/29/2011    30
01/28/2011    52
01/27/2011    39
01/26/2011    37
...
01/01/2011    31

I want to keep only the temperatures under 40 degrees, together with the start and end dates of each run and how many days it lasts, for example:

from         to           days
01/29/2011   01/30/2011     2
01/26/2011   01/27/2011     2

I tried with difftime but it didn't work; maybe a function would do it.

Any help would be appreciated.

Marco asked Jan 29 '26 07:01


1 Answer

I'd do something like this. I'll use data.table here.

df <- read.table(header=TRUE, text="dates        temp
01/31/2011    40
01/30/2011    34
01/29/2011    30
01/28/2011    52
01/27/2011    39
01/26/2011    37", stringsAsFactors=FALSE)

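library(data.table)
dt <- data.table(df)
# Parse the dates, start a new group id at every reading >= 40,
# then keep only the rows below 40
dt <- dt[, `:=`(date.form = as.Date(dates, format="%m/%d/%Y"),
          id = cumsum(as.numeric(temp >= 40)))][temp < 40]
# One row per group: first date, last date, and number of rows (days)
dt[, list(from=min(date.form), to=max(date.form), count=.N), by=id]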

#    id       from         to count
# 1:  1 2011-01-29 2011-01-30     2
# 2:  2 2011-01-26 2011-01-27     2

The idea is to first create a column with the dates column converted to Date format. Then add another column, id, that finds the positions where temp >= 40 and uses them to group the values that lie between two temp >= 40 readings. That is, if you have c(40, 34, 30, 52, 39, 37), then you'd want c(1,1,1,2,2,2): everything between two values >= 40 must belong to the same group (34, 30 -> 1 and 39, 37 -> 2). After doing this, I'd remove the temp >= 40 entries.
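To see the grouping step on its own, here is a small base R sketch using the same six temperatures. The variable names (is_break, id) are only for illustration and are not part of the code above.

```r
temp <- c(40, 34, 30, 52, 39, 37)

# TRUE wherever a reading is at or above the threshold
is_break <- temp >= 40

# Each TRUE bumps the running total, so every row after a break
# shares the same id until the next break
id <- cumsum(is_break)
id
# [1] 1 1 1 2 2 2

# Dropping the break rows leaves only the runs under 40, still labelled by id
runs <- split(temp[!is_break], id[!is_break])
```

Note that this labelling relies on the rows being in date order with no missing days; a gap in the dates would not start a new group by itself.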

Then you can group by this id and take the min and max of the dates and the number of rows in each group (which data.table provides as .N).
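If you'd rather not load data.table, the same steps could be written in base R roughly like this. It is only a sketch under the same assumption of one row per consecutive day; the names cold, res and span are made up for the example. The span column uses difftime, as mentioned in the question, as a cross-check on the row count.

```r
df <- data.frame(
  dates = c("01/31/2011", "01/30/2011", "01/29/2011",
            "01/28/2011", "01/27/2011", "01/26/2011"),
  temp  = c(40, 34, 30, 52, 39, 37),
  stringsAsFactors = FALSE
)

# Same two helper columns as in the data.table version
df$date.form <- as.Date(df$dates, format = "%m/%d/%Y")
df$id <- cumsum(df$temp >= 40)

# Keep only the readings under 40
cold <- df[df$temp < 40, ]

# One summary row per group
res <- do.call(rbind, lapply(split(cold, cold$id), function(g) {
  from <- min(g$date.form)
  to   <- max(g$date.form)
  data.frame(
    from = from,
    to   = to,
    days = nrow(g),
    # calendar span in days, counting both end points
    span = as.numeric(difftime(to, from, units = "days")) + 1
  )
}))
res
```

When days and span agree, the run has no missing dates inside it.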

Arun answered Jan 30 '26 22:01