
Calculating the length of episodes/events using R

Tags:

r

Could anyone advise me on how to approach the following calculation in R?

I have an hourly dataset covering one year, with three columns: "date", "time" and "values".

For example:

'01/01/2000'     '08:00'     '10'     
'01/01/2000'     '09:00'     '30'
'01/01/2000'     '10:00'     '43'
'01/01/2000'     '11:00'     '55'
'01/01/2000'     '12:00'     '59'
'01/01/2000'     '13:00'     '45'
'01/01/2000'     '14:00'     '10'
'01/01/2000'     '15:00'     '15'
'01/01/2000'     '16:00'     '43'
'01/01/2000'     '17:00'     '45'
'01/01/2000'     '18:00'     '60'
'01/01/2000'     '19:00'     '10'

I would like to create a data.frame giving the length of each episode in which the values are > 40, ideally together with the start date and time of each episode. In the table above, the first exceedance starts at 10:00 and lasts 4 hours, and the second starts at 16:00 and lasts 3 hours. Is it possible to create a data frame like the one below?

     'date'      'time'    'Duration'  
'01/01/2000'     '10:00'       '4'
'01/01/2000'     '16:00'       '3'

and so on for the whole year of data.

asked Dec 19 '25 23:12 by Achak


2 Answers

Here is another solution, which relies on plyr: it makes it easy to compute other quantities for each spell of values above 40, such as the average or the maximum.

# Sample data: k days of random hourly values (fixed seed for reproducibility)
set.seed(1)
k <- 3
d <- data.frame( 
  date = rep( seq.Date( Sys.Date(), length=k, by="day" ), each=24 ),
  time = sprintf( "%02d:00", rep( 0:23, k ) ),
  value = round(200*runif(24*k))
)
d$timestamp <- as.POSIXct( paste( d$date, d$time ) )
d <- d[ order( d$timestamp ), ]
# Extract the spells above 40
n <- nrow(d)
d$inside <- d$value > 40
d$start  <- ! c(FALSE, d$inside[-n]) & d$inside
d$end    <- d$inside & ! c(d$inside[-1], FALSE)  # Not used
d$group  <- cumsum(d$start)  # Number the spells
d <- d[ d$inside, ]
library(plyr)
ddply( d, "group", summarize,
  start  = min(timestamp),
  end    = max(timestamp),
  length = length(value),
  mean   = mean(value)
)
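If you would rather avoid the plyr dependency (plyr is retired in favour of dplyr), the same per-spell summary can be sketched in base R with split() and lapply(). This is a swapped-in technique, not part of the answer above; the helper name spell_summary and its threshold argument are hypothetical. It assumes the rows are sorted by time, with no missing hours and no NA values, so that the number of rows in a spell equals its length in hours. It also reports the maximum, mentioned above as another quantity of interest.

```r
# Hypothetical helper (not part of plyr): one row per spell of values above threshold.
# Assumes value/timestamp are sorted by time, with no missing hours and no NA values.
spell_summary <- function(value, timestamp, threshold = 40) {
  inside <- value > threshold
  # A spell starts where inside is TRUE and the previous hour was not
  start  <- inside & !c(FALSE, head(inside, -1))
  group  <- cumsum(start)[inside]        # spell number of each row inside a spell
  rows   <- split(which(inside), group)  # row indices belonging to each spell
  # Returns NULL when there are no spells at all
  do.call(rbind, lapply(rows, function(i) data.frame(
    start  = timestamp[min(i)],
    end    = timestamp[max(i)],
    length = length(i),
    mean   = mean(value[i]),
    max    = max(value[i])
  )))
}

# Usage with the values from the question
values <- c(10, 30, 43, 55, 59, 45, 10, 15, 43, 45, 60, 10)
stamps <- as.POSIXct("2000-01-01 08:00", tz = "UTC") + 3600 * (0:11)
spell_summary(values, stamps)
```

For the question's data this gives two spells, starting at 10:00 (4 hours, maximum 59) and at 16:00 (3 hours, maximum 60).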

Note that a spell of values above 40 is not split at midnight, so it can span several days: this may or may not be what you want.
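If spells should instead be broken at midnight, one option is to start a new spell whenever the date changes, as well as when the value first rises above 40. A minimal sketch; the function name new_spell_start is hypothetical, and it assumes the rows are sorted by timestamp:

```r
# Hypothetical helper: TRUE on the first row of each spell, where a spell is a
# run of consecutive rows above the threshold that never crosses a date change.
new_spell_start <- function(inside, date) {
  prev_inside <- c(FALSE, head(inside, -1))  # flag of the previous row
  prev_date   <- c(date[1], head(date, -1))  # date of the previous row
  inside & (!prev_inside | date != prev_date)
}

# Example: one run of four exceeding hours that straddles midnight
inside <- c(TRUE, TRUE, TRUE, TRUE)
days   <- as.Date(c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-02"))
new_spell_start(inside, days)  # TRUE FALSE TRUE FALSE
```

In the plyr answer, this would replace the line computing d$start, e.g. `d$start <- new_spell_start(d$inside, d$date)`; numbering the spells with cumsum() then works unchanged.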

answered Dec 21 '25 14:12 by Vincent Zoonekynd


Another option, in base R, using rle() (run-length encoding):

# Sample data from the question; values must be numeric, since comparing
# character strings with 40 is lexicographic (e.g. "100" > 40 is FALSE)
dat <- data.frame(
  date   = rep("01/01/2000", 12),
  time   = c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00",
             "15:00", "16:00", "17:00", "18:00", "19:00"),
  values = c(10, 30, 43, 55, 59, 45, 10, 15, 43, 45, 60, 10)
)

# Run-length encoding of the exceedance flag: one run per block of TRUE/FALSE
run <- rle(dat$values > 40)
dat$exceeds  <- rep(run$values, run$lengths)
dat$duration <- rep(run$lengths, run$lengths)
# First row of each run
starts <- dat[head(c(1, cumsum(run$lengths) + 1), length(run$lengths)), ]
# Keep the runs above 40, including single-hour ones
result <- subset(starts, exceeds)

result[, c("date", "time", "duration")]

        date  time duration
3 01/01/2000 10:00        4
9 01/01/2000 16:00        3
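The rle() steps above can also be packaged as a reusable function. A sketch under stated assumptions: the name exceedance_episodes and its threshold argument are hypothetical, and the input is assumed to be sorted by time, with one row per hour and columns named date, time and (numeric) values. Every run above the threshold is reported, including runs lasting a single hour; an NA value is treated as not exceeding the threshold, so it ends the current episode.

```r
# Hypothetical helper built on rle(): one row per episode above `threshold`.
# Assumes `dat` is sorted by time, has one row per hour, and has columns
# date, time and numeric values.
exceedance_episodes <- function(dat, threshold = 40) {
  run    <- rle(dat$values > threshold)
  starts <- cumsum(run$lengths) - run$lengths + 1  # first row of every run
  keep   <- run$values %in% TRUE                   # runs above threshold; NA runs excluded
  data.frame(date     = dat$date[starts[keep]],
             time     = dat$time[starts[keep]],
             duration = run$lengths[keep])         # duration in hours (rows)
}

dat <- data.frame(
  date   = rep("01/01/2000", 12),
  time   = sprintf("%02d:00", 8:19),
  values = c(10, 30, 43, 55, 59, 45, 10, 15, 43, 45, 60, 10)
)
exceedance_episodes(dat)
```

On the question's data this returns the two episodes starting at 10:00 (4 hours) and 16:00 (3 hours); if no value exceeds the threshold, it returns a data frame with zero rows.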

answered Dec 21 '25 15:12 by jbaums


