My dataset is large and contains many observations of a dependent variable (DV) for individuals (Name) across set periods (Period) of a testing session. A small example of my dataset is as follows:
ExampleData <- data.frame(Name   = c("Tom", "Tom", "Tom", "Tom", "Tom", "Tom", "Tom", "Tom", "Tom", "Tom",
                                     "Ben", "Ben", "Ben", "Ben", "Ben", "Ben", "Ben", "Ben", "Ben", "Ben"),
                          Period = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1,
                                     0, 0, 0, 1, 1, 1, 0, 0, 1, 1),
                          DV     = runif(20, 1.5, 2.8))
When ExampleData$Period == 1, an individual is undergoing an exercise test, which varies in duration. Breaks between tests are represented by ExampleData$Period == 0. To avoid manually recording when each person is being tested and numbering the tests by hand, I want to add a column that marks each group of 1's (separated from the next group by 0's) as a new period, with the numbering restarting for each person. How can I do this?
My anticipated output would be:
ExampleData$Descriptor <- c(NA, NA, "Period One", "Period One", "Period One", NA, NA, NA, "Period Two", "Period Two",
                            NA, NA, NA, "Period One", "Period One", "Period One", NA, NA, "Period Two", "Period Two")
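For reference, here is a base-R sketch that reproduces this expected output exactly. It assumes no one has more than five tests; the `number_words` lookup vector and the `test_index()` helper are hypothetical names introduced only for this illustration. The idea: a row starts a new test when `Period` is 1 and the previous row is not, so a running count of those starts, computed separately for each `Name` with `ave()`, gives each row's test number.

```r
# Rebuild the example data from the question
ExampleData <- data.frame(
  Name   = rep(c("Tom", "Ben"), each = 10),
  Period = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1,
             0, 0, 0, 1, 1, 1, 0, 0, 1, 1),
  DV     = runif(20, 1.5, 2.8)
)

# Hypothetical lookup vector: assumes at most five tests per person
number_words <- c("One", "Two", "Three", "Four", "Five")

# For one person's Period vector, return the test number of each row
# (NA for rest rows). A row starts a new test when it is 1 and the
# previous row (or an implicit 0 before the first row) is not 1.
test_index <- function(p) {
  starts <- p == 1 & c(0, head(p, -1)) != 1
  ifelse(p == 1, cumsum(starts), NA)
}

# ave() applies test_index() within each Name, so numbering restarts per person
idx <- ave(ExampleData$Period, ExampleData$Name, FUN = test_index)
ExampleData$Descriptor <- ifelse(is.na(idx), NA,
                                 paste("Period", number_words[idx]))
```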
My question is similar to another of mine, located here, except that I now have multiple entries for each individual. I have tried the following dplyr code:
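library(dplyr)

Test_df <- ExampleData %>%
  mutate(
    # Label each row as part of a test ("Period") or a break ("Rest")
    Descriptor = case_when(
      Period > 0 ~ "Period",
      Period == 0 ~ "Rest"),
    # Run id: increments every time the label differs from the previous row
    rleid = cumsum(Descriptor != lag(Descriptor, 1, default = "NA")),
    # Runs alternate Rest/Period, so rleid %/% 2 numbers the test runs
    # (this assumes the data start with a Rest run)
    Descriptor = case_when(
      Descriptor == "Period" ~ paste0(Descriptor, rleid %/% 2),
      TRUE ~ "Rest"),
    rleid = NULL
  )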
However, how do I apply this separately to each Name (individual) in my dataset, so that the period numbering restarts for each person?
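A minimal sketch of how the attempt above could be adapted, assuming dplyr 1.0 or later is installed: `group_by(Name)` makes `lag()` and `cumsum()` restart for each person, and counting the first row of each run of 1's (the hypothetical helper column `start`) avoids relying on `rleid %/% 2`, which only yields 1, 2, ... when a person's data begin with a rest row. The labels keep the `Period1`/`Rest` style of the original attempt.

```r
library(dplyr)

ExampleData <- data.frame(
  Name   = rep(c("Tom", "Ben"), each = 10),
  Period = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1,
             0, 0, 0, 1, 1, 1, 0, 0, 1, 1),
  DV     = runif(20, 1.5, 2.8)
)

Test_df <- ExampleData %>%
  group_by(Name) %>%                     # lag() and cumsum() now restart per person
  mutate(
    # TRUE on the first row of each run of 1's (row before the first counts as 0)
    start = Period == 1 & lag(Period, default = 0) != 1,
    # Count run starts so far to number the tests; rest rows get "Rest"
    Descriptor = if_else(Period == 1,
                         paste0("Period", cumsum(start)),
                         "Rest")
  ) %>%
  select(-start) %>%                     # drop the helper column
  ungroup()
```

With this example data, both Tom and Ben end up with `Period1` and `Period2`.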
Thank you.
Here's an alternative approach with dplyr, which groups by Name so the numbering restarts for each person:
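library(dplyr)
ExampleData %>%
  group_by(Name) %>%                             # restart the numbering for each person
  mutate(Descriptor = with(rle(Period == 1),     # runs of test (TRUE) / rest (FALSE) rows
    rep(replace(paste("Period", cumsum(values)),  # "Period k" for the k-th test run
                !values, NA),                     # NA for rest runs
        lengths)))                                # expand one label per run to one per row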
# # A tibble: 20 x 4
# # Groups: Name [2]
# Name Period DV Descriptor
# <fctr> <dbl> <dbl> <chr>
# 1 Tom 0 2.641044 <NA>
# 2 Tom 0 2.692745 <NA>
# 3 Tom 1 1.515797 Period 1
# 4 Tom 1 2.601471 Period 1
# 5 Tom 1 1.669399 Period 1
# 6 Tom 0 2.700371 <NA>
# 7 Tom 0 1.993971 <NA>
# 8 Tom 0 2.203379 <NA>
# 9 Tom 1 2.488742 Period 2
# 10 Tom 1 1.596458 Period 2
# 11 Ben 0 2.578924 <NA>
# 12 Ben 0 1.916804 <NA>
# 13 Ben 0 2.676466 <NA>
# 14 Ben 1 2.508759 Period 1
# 15 Ben 1 2.447217 Period 1
# 16 Ben 1 2.728756 Period 1
# 17 Ben 0 2.326854 <NA>
# 18 Ben 0 1.748016 <NA>
# 19 Ben 1 1.703044 Period 2
# 20 Ben 1 1.783434 Period 2
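To see why the `with(rle(...))` expression works, here is the same computation unpacked step by step on Tom's `Period` values alone; inside the grouped `mutate()`, the same thing happens once per `Name`.

```r
# Tom's Period column from the example data
p <- c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1)

r <- rle(p == 1)
r$lengths   # 2 3 3 2                 (length of each run)
r$values    # FALSE TRUE FALSE TRUE   (is the run a test?)

# cumsum() over the logical run values numbers the test runs: 0 1 1 2
labels <- replace(paste("Period", cumsum(r$values)), !r$values, NA)
labels      # NA "Period 1" NA "Period 2"

# rep() expands one label per run back to one label per row
rep(labels, r$lengths)
```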
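Here is an option using data.table:
library(data.table)
setDT(ExampleData)                                   # convert to a data.table by reference
ExampleData[, grp := rleid(Period == 1), .(Name)]    # id every run of test/rest rows within each person
ExampleData[Period == 1,
            Descriptor := paste("Period", match(grp, unique(grp))), Name]  # renumber test runs 1, 2, ... per person
ExampleData[, grp := NULL][]                         # drop the helper column and print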
# Name Period DV Descriptor
# 1: Tom 0 2.764916 NA
# 2: Tom 0 1.537837 NA
# 3: Tom 1 1.848110 Period 1
# 4: Tom 1 2.621724 Period 1
# 5: Tom 1 2.206875 Period 1
# 6: Tom 0 1.715299 NA
# 7: Tom 0 1.882378 NA
# 8: Tom 0 2.244155 NA
# 9: Tom 1 2.094944 Period 2
#10: Tom 1 1.713493 Period 2
#11: Ben 0 1.794261 NA
#12: Ben 0 1.608199 NA
#13: Ben 0 2.053490 NA
#14: Ben 1 1.791563 Period 1
#15: Ben 1 1.652090 Period 1
#16: Ben 1 2.510483 Period 1
#17: Ben 0 2.345984 NA
#18: Ben 0 2.754110 NA
#19: Ben 1 1.675527 Period 2
#20: Ben 1 1.709622 Period 2
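In the data.table answer, `rleid(Period == 1)` gives every run (rest or test) its own id, so test runs get ids such as 2 and 4; `match(grp, unique(grp))` on the test rows then renumbers those ids 1, 2, ... within each `Name`. The sketch below replays this on Tom's values in plain base R. The `cumsum(...)` line is a hand-written stand-in for `rleid()`, not data.table's implementation.

```r
# Tom's Period column from the example data
p <- c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1)

# Stand-in for data.table::rleid(p == 1): a new id whenever the value
# differs from the previous row
is_test <- p == 1
grp <- cumsum(c(TRUE, is_test[-1] != is_test[-length(is_test)]))
grp                                  # 1 1 2 2 2 3 3 3 4 4

# Keep only the test rows; their runs carry the ids 2 and 4
test_grp <- grp[is_test]
test_grp                             # 2 2 2 4 4

# match() against the distinct ids renumbers them 1, 2, ...
match(test_grp, unique(test_grp))    # 1 1 1 2 2
```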