This question is about how to generate frequency transition tables from longitudinal data in long format using base R functions or commonly used packages such as dplyr. Consider the following longitudinal data:
id <- c(1,1,2,2,3,3,4,4)
state <- c("C","A", "A", "A", "B", "A", "C", "A")
period <- rep(c("Start", "End"), 4)
df <- data.frame(id, state, period)
df
  id state period
1  1     C  Start
2  1     A    End
3  2     A  Start
4  2     A    End
5  3     B  Start
6  3     A    End
7  4     C  Start
8  4     A    End
and the expected output:
  transition freq
1     A to A    1
2     A to B    0
3     A to C    0
4     B to B    0
5     B to A    1
6     B to C    0
7     C to C    0
8     C to A    2
9     C to B    0
I can generate the above output using the function statetable.msm in the msm package. However, I would like to know if this could be generated by base functions in R or other packages such as dplyr. Help is much appreciated!
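A solution entirely within base R could be something like the following. It uses the native pipe |>, so it needs R 4.1 or later, and it assumes each id's rows appear in Start, End order:
# Paste each id's states together, e.g. "C to A"
do.call("c",
        split(df$state, df$id) |>
          lapply(paste, collapse = " to ")) |>
  # Use every possible pair as a level so unobserved transitions count as 0
  factor(levels = sort(c(outer(unique(df$state), unique(df$state),
                               FUN = paste, sep = " to ")))) |>
  table() |>
  as.data.frame() |>
  setNames(c("transition", "freq"))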
#>   transition freq
#> 1     A to A    1
#> 2     A to B    0
#> 3     A to C    0
#> 4     B to A    1
#> 5     B to B    0
#> 6     B to C    0
#> 7     C to A    2
#> 8     C to B    0
#> 9     C to C    0
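The pipeline above pairs states by their row order within each id. If the rows might not be sorted Start then End, a variant that uses the period column explicitly may be safer. The following is only a sketch, still in base R; the names start, end, wide, lv, tab and res are illustrative and not part of the original answer, and it assumes every id has exactly one Start row and one End row.

```r
id <- c(1, 1, 2, 2, 3, 3, 4, 4)
state <- c("C", "A", "A", "A", "B", "A", "C", "A")
period <- rep(c("Start", "End"), 4)
df <- data.frame(id, state, period)

# Pick out the Start and End rows explicitly, then align them by id
start <- df[df$period == "Start", c("id", "state")]
end <- df[df$period == "End", c("id", "state")]
wide <- merge(start, end, by = "id", suffixes = c("_start", "_end"))

# Shared factor levels so that unobserved transitions are counted as 0
lv <- sort(unique(df$state))
tab <- table(from = factor(wide$state_start, levels = lv),
             to = factor(wide$state_end, levels = lv))

# Flatten the from/to cross-tabulation into the long "X to Y" layout
res <- as.data.frame(tab, responseName = "freq")
res <- res[order(res$from, res$to), ]
res$transition <- paste(res$from, "to", res$to)
res <- res[, c("transition", "freq")]
rownames(res) <- NULL
res
```

The intermediate object tab is itself a from-by-to matrix of counts, which may be convenient if a square transition matrix is wanted rather than the long layout.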