I have patients with diagnoses, and each patient is identified by an ID (there are three patients here). Some diagnoses recur for the same patient; when that happens, I would like to append the occurrence number to the diagnosis.
For example, for the patient with ID=2, the second occurrence of "SLE" should be renamed to "SLE2" and the third occurrence to "SLE3".
ID<-c(rep(1,10),rep(2,10),rep(3,5))
time<-c(1:10,1:10,1:5)
diag<-c("ANN",NA,NA,NA,"SLE","ANN",NA,NA,NA,NA,
"SLE",NA,NA,NA,"SLE","BPG","SLE",NA,NA,NA,"SLE",NA,NA,"ANN",NA)
mydata<-data.frame(ID,time,diag)
The new variable, diag2, should look like this:
   ID time diag diag2
1   1    1  ANN   ANN
2   1    2 <NA>  <NA>
3   1    3 <NA>  <NA>
4   1    4 <NA>  <NA>
5   1    5  SLE   SLE
6   1    6  ANN  ANN2
7   1    7 <NA>  <NA>
8   1    8 <NA>  <NA>
9   1    9 <NA>  <NA>
10  1   10 <NA>  <NA>
11  2    1  SLE   SLE
12  2    2 <NA>  <NA>
13  2    3 <NA>  <NA>
14  2    4 <NA>  <NA>
15  2    5  SLE  SLE2
16  2    6  BPG   BPG
17  2    7  SLE  SLE3
18  2    8 <NA>  <NA>
19  2    9 <NA>  <NA>
20  2   10 <NA>  <NA>
21  3    1  SLE   SLE
22  3    2 <NA>  <NA>
23  3    3 <NA>  <NA>
24  3    4  ANN   ANN
25  3    5 <NA>  <NA>
One way is to use dense_rank() from the dplyr package, ranking time within each ID and diagnosis group:
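library(dplyr)

mydata %>%
  # Rank visits by time within each patient/diagnosis pair
  group_by(ID, diag) %>%
  mutate(rank = dense_rank(time)) %>%
  # Keep NA and first occurrences unchanged; suffix later ones with their rank
  mutate(diag2 = case_when(is.na(diag) ~ diag,
                           rank == 1 ~ diag,
                           TRUE ~ paste0(diag, rank))) %>%
  # Drop the grouping so later operations act on the whole data frame
  ungroup() %>%
  dplyr::select(-rank)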
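For comparison, the same result can probably be reached in base R without dplyr. The sketch below is only one possible alternative, not the approach given above: it swaps dense_rank() for a running count built with ave() and seq_along(). It assumes the rows are already sorted by time within each ID (as in the example data) and that diag is a character column, which is why stringsAsFactors = FALSE is set explicitly. The helper names has_diag and occurrence are made up for illustration.

```r
# Rebuild the example data from the question
ID   <- c(rep(1, 10), rep(2, 10), rep(3, 5))
time <- c(1:10, 1:10, 1:5)
diag <- c("ANN", NA, NA, NA, "SLE", "ANN", NA, NA, NA, NA,
          "SLE", NA, NA, NA, "SLE", "BPG", "SLE", NA, NA, NA,
          "SLE", NA, NA, "ANN", NA)
mydata <- data.frame(ID, time, diag, stringsAsFactors = FALSE)

# Assumption: rows are already ordered by time within each ID
has_diag <- !is.na(mydata$diag)   # hypothetical helper: rows with a diagnosis

# Running count of each diagnosis within each patient (1, 2, 3, ...)
occurrence <- ave(seq_along(mydata$diag[has_diag]),
                  mydata$ID[has_diag], mydata$diag[has_diag],
                  FUN = seq_along)

# Start from a copy of diag so NA rows stay NA, then suffix repeats
mydata$diag2 <- mydata$diag
mydata$diag2[has_diag] <- ifelse(occurrence == 1,
                                 mydata$diag[has_diag],
                                 paste0(mydata$diag[has_diag], occurrence))
```

If the data are not sorted by time, ordering them first with mydata[order(mydata$ID, mydata$time), ] should make the running count match the chronological order.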