
Count the number of transitions

Tags: r

I have a data set like below. Each patient has 3 visits and they can transition between the 3 states from visit to visit.

ID <- c(1,1,1,2,2,2,3,3,3)
Visit <- c(1,2,3,1,2,3,1,2,3)
State <- c(2,1,1,3,2,1,2,3,1)

I want to make a data frame that counts the number of state transitions from visit 1 to visit 2. For Visit 1 to Visit 2, the matrix would look like the one below: rows represent the state at visit 1, columns represent the state at visit 2, and entries on the diagonal count participants who did not transition.

[image: expected transition matrix]

Jenn0804 asked Dec 12 '25 00:12


1 Answer

Although there is no harm in using other packages, this can easily be done with just table in base R (plus one minor extra step if the data is incomplete).

Preliminary steps

You probably have your data in a data.frame, so we'll build one from your sample data. I'll also make slight adjustments to the variables (IDs as letters, visits as "V1", "V2", etc.), for readability.

ddff <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  Visit = rep(c("V1", "V2", "V3"), 3),
  State = paste0("S", c(2, 1, 1, 3, 2, 1, 2, 3, 1)))

Scenario 1: complete dataset

If the dataset is complete, or if the missing values are explicit (i.e. there is an entry for each visit of each patient, even if the State is NA), then a simple table is sufficient. We just need to turn State into a factor first, so that states not observed at a given visit still show up as rows and columns with zero counts, and we need to order the data.frame by ID and Visit, because table pairs the two vectors by position:

ddff$State <- factor(ddff$State)
ddff <- ddff[order(ddff$ID, ddff$Visit), ]

table(ddff$State[ddff$Visit == "V1"],
      ddff$State[ddff$Visit == "V2"],
      dnn = c("V1", "V2"))
    V2
V1   S1 S2 S3
  S1  0  0  0
  S2  1  0  1
  S3  0  1  0
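The table above pairs patients purely by position, which is why the data has to be sorted first. As a rough sketch of an alternative that pairs the two visits explicitly by ID, the states can be merged side by side before tabulating. The names v1, v2, wide and tt_by_id are made up for this illustration and are not part of the original answer.

```r
# Same sample data as above, with State already stored as a factor
ddff <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  Visit = rep(c("V1", "V2", "V3"), 3),
  State = factor(paste0("S", c(2, 1, 1, 3, 2, 1, 2, 3, 1))))

# Keep only ID and State for each of the two visits
v1 <- ddff[ddff$Visit == "V1", c("ID", "State")]
v2 <- ddff[ddff$Visit == "V2", c("ID", "State")]

# One row per patient, with the V1 and V2 states in separate columns
wide <- merge(v1, v2, by = "ID", suffixes = c(".V1", ".V2"))

# Cross-tabulate; rows are V1 states, columns are V2 states
tt_by_id <- table(wide$State.V1, wide$State.V2, dnn = c("V1", "V2"))
tt_by_id
```

Because the pairing is done on ID, the result does not depend on the row order of the original data.frame.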

There will be non-zero values on the diagonal if any patients don't change state. E.g. for Visit 2 to Visit 3:

table(ddff$State[ddff$Visit == "V2"],
      ddff$State[ddff$Visit == "V3"],
      dnn = c("V2", "V3"))
    V3
V2   S1 S2 S3
  S1  1  0  0
  S2  1  0  0
  S3  1  0  0

But if you really don't want them, you can easily assign zeros to the diagonal:

tt <- table(ddff$State[ddff$Visit == "V2"],
            ddff$State[ddff$Visit == "V3"],
            dnn = c("V2", "V3"))
diag(tt) <- 0
tt
    V3
V2   S1 S2 S3
  S1  0  0  0
  S2  1  0  0
  S3  1  0  0
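If there are more than two or three visits, repeating the table call for each pair gets tedious. The following is a minimal sketch that wraps the same call in a helper and applies it to every pair of consecutive visits. The helper name transition_table and the objects visits and transitions are hypothetical, and the sketch assumes that sorting the visit labels gives the chronological order (true for "V1", "V2", "V3", but not for labels such as "V10").

```r
# Same sample data as above, sorted by ID and Visit
ddff <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  Visit = rep(c("V1", "V2", "V3"), 3),
  State = factor(paste0("S", c(2, 1, 1, 3, 2, 1, 2, 3, 1))))
ddff <- ddff[order(ddff$ID, ddff$Visit), ]

# Hypothetical helper: transition counts between two named visits
transition_table <- function(d, from, to) {
  table(d$State[d$Visit == from],
        d$State[d$Visit == to],
        dnn = c(from, to))
}

# All visit labels, assumed to sort into chronological order
visits <- sort(unique(ddff$Visit))

# One table per pair of consecutive visits (V1 to V2, V2 to V3, ...)
transitions <- Map(function(from, to) transition_table(ddff, from, to),
                   head(visits, -1), tail(visits, -1))
names(transitions) <- paste(head(visits, -1), tail(visits, -1), sep = "->")
transitions
```

Each element of the list is an ordinary table, so diag(transitions[["V2->V3"]]) <- 0 still works if the diagonal should be blanked out.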

Scenario 2: implicit missing data

If there are missing values in the dataset, i.e. if there is not a row for each visit of each patient, the same approach can be used, but we first need to fill in the missing data points by joining the data.frame with all possible combinations of ID and Visit.

First we'll drop V2 for patient B, to create an incomplete data.frame:

ddff2 <- ddff[-5, ]
ddff2
  ID Visit State
1  A    V1    S2
2  A    V2    S1
3  A    V3    S1
4  B    V1    S3
6  B    V3    S1
7  C    V1    S2
8  C    V2    S3
9  C    V3    S1

Then we use expand.grid to create a data.frame with all possible combinations of ID and Visit, and then use merge to cross it with our data set. This will turn the implicit missing values into explicit missing values:

ddff2 <- merge(
  ddff2,
  expand.grid(ID = unique(ddff2$ID), Visit = unique(ddff2$Visit)),
  all.y = TRUE)  # keep every ID/Visit combination, even without a match
ddff2
  ID Visit State
1  A    V1    S2
2  A    V2    S1
3  A    V3    S1
4  B    V1    S3
5  B    V2  <NA>
6  B    V3    S1
7  C    V1    S2
8  C    V2    S3
9  C    V3    S1

We can now use the same approach as earlier:

table(ddff2$State[ddff2$Visit == "V1"],
      ddff2$State[ddff2$Visit == "V2"],
      dnn = c("V1", "V2"))
    V2
V1   S1 S2 S3
  S1  0  0  0
  S2  1  0  1
  S3  0  0  0
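Note that patient B's transition out of S3 simply vanishes from this table, because table drops NA values by default. If those cases should stay visible, one option is the useNA argument of table. The sketch below rebuilds the merged data by hand (an assumption made only to keep the example self-contained) and stores the result in tt_na, a name chosen for this illustration.

```r
# The merged data from above, written out by hand: patient B has NA at V2
ddff2 <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  Visit = rep(c("V1", "V2", "V3"), 3),
  State = factor(c("S2", "S1", "S1", "S3", NA, "S1", "S2", "S3", "S1")))

# useNA = "ifany" adds an NA category wherever missing values occur
tt_na <- table(ddff2$State[ddff2$Visit == "V1"],
               ddff2$State[ddff2$Visit == "V2"],
               dnn = c("V1", "V2"),
               useNA = "ifany")
tt_na
```

With this setting the V2 dimension gains an NA column, and patient B is counted there in the S3 row instead of being dropped.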
Zé Loff answered Dec 13 '25 14:12


