I would like to add 1 to session column when the difference between the first timestamp of the same session and followings timestamp records is more than 10 units.
in other words:
if the gap in timestamp column is more than 10 in the same session, add 1 to the rest of the sessions for a specific ID. So we shouldn't have the same session with the gap more than 10 in its records.
lets say:
df<-read.table(text="
ID timestamp session
1 10 1
1 12 1
1 15 1
1 21 1
1 25 1
1 27 2
1 29 2
2 11 1
2 22 2
2 27 2
2 32 2
2 42 2
2 43 3",header=T,stringsAsFactors = F)
In the example above, for ID==1 the session gap from the first record (timestamp==10) is more than 10 in row 4 (timestamp==21), so we add 1 to the rest of sessions. Whenever the session number change the difference of the first record of timestamp should be less than 10 in the same sassion, otherwise it should add to session.
result:
ID timestamp session
1 *10 1
1 12 1
1 15 1
1 *21 2 <-- because 21-10 >= 10 it add 1 to the rest of sessions in this ID
1 25 2
1 27 3
1 29 3
2 11 1
2 *22 2
2 27 2
2 *32 3 <-- because 32-22>= 10 it add 1 to the rest of session
2 *42 4 <-- because 42-32>=10
2 43 5
How can I do it in R?
Perhaps a custom function might help that calculates a cumulative sum and resets once threshold reached. In this case, if you provide the function with the session data, it will provide a result that will include a cumulative "offset" for session, but only in rows when the session number did not increase. This addresses the case of ID 2 timestamp 22 where the difference > 10, but session number increased from 1 to 2.
library(tidyverse)
threshold <- 10
cumsum_with_reset <- function(x, session, threshold) {
cumsum <- 0
group <- 0
result <- numeric()
for (i in seq_along(x)) {
cumsum <- cumsum + x[i]
if (cumsum >= threshold) {
if (session[i] == session[i-1]) {
group <- group + 1
}
cumsum <- 0
}
result = c(result, group)
}
return (result)
}
df %>%
group_by(ID) %>%
mutate(diff = c(0, diff(timestamp)),
cumdiff = cumsum_with_reset(diff, session, threshold),
new_session = cumdiff + session)
Function adapted from this solution.
Output
ID timestamp session diff cumdiff new_session
<int> <int> <int> <dbl> <dbl> <dbl>
1 1 10 1 0 0 1
2 1 12 1 2 0 1
3 1 15 1 3 0 1
4 1 21 1 6 1 2
5 1 25 1 4 1 2
6 1 27 2 2 1 3
7 1 29 2 2 1 3
8 2 11 1 0 0 1
9 2 22 2 11 0 2
10 2 27 2 5 0 2
11 2 32 2 5 1 3
12 2 42 2 10 2 4
13 2 43 3 1 2 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With