I have a tibble in which each row represents an image of an eye and contains the following relevant variables: patientId
, laterality
(left or right), date
, imageId
.
I would like to manipulate this to create another tibble showing the number of followUpYears
for each eye (patientId, laterality). followUpYears
is defined in a somewhat unusual way:
followUpYears
is always an integer.followUpYears
is 0 regardless of how many images are taken subsequently.followUpYears
.The following dummy data demonstrates these points:
data <- tibble(patientId = c('A','A','A','A','A','A','B','B','B','B','B','B','B'),
laterality = c('L','L','L','L','L','L','R','R','R','R','L','L','L'),
date = as.Date(c('2000-05-05','2000-05-05','2001-05-06','2001-05-07','2002-05-06','2002-05-07','2000-09-08','2001-09-07','2001-09-09','2001-09-10','2000-09-08','2001-09-07','2001-09-10')),
imageId = 1:13)
expected_output <- tibble(patientId = c('A','B','B'),
laterality = c('L','R','L'),
followUpYears = c(0, 2, 1))
Patient A's left eye has 0 followUpYears
because of points 2 and 3. Patient B's right eye has 2 followUpYears
because of point 4 (despite the fact that there is only slightly more than 1 year between the first and last image date). Patient B's left eye only has 1 year of follow up since it doesn't meet the requirement for 2 image dates in year 2.
I am familiar with the basic dplyr verbs but I can't think of how to frame this type of variable. Note that patients might have one or both eyes included and some might have 10+ years of follow up. Finally, a solution that considers 1 year to be 365 days regardless of leap years is fine.
Thank you!
Here's a way with ifelse
. diff_year
is a helper function that computes the difference between two dates in year rounded to the value above.
library(dplyr)
diff_year <- function(date1, date2) ceiling(as.numeric(difftime(date1, date2)) / 365)
data %>%
group_by(patientId) %>%
summarise(followUpYears = ifelse(diff_year(date[date != first(date)][1], first(date)) <= 1,
diff_year(max(date), min(date)), 0))
#A tibble: 2 × 2
# patientId followUpYears
# <chr> <dbl>
#1 A 0
#2 B 2
Update with OP's comment. This should work with all conditions:
diff_year <- function(date1, date2) as.numeric((date1 - date2) / 365)
data %>%
distinct(patientId, laterality, date, .keep_all = TRUE) %>%
group_by(patientId, laterality) %>%
mutate(diffYear = floor(diff_year(date, min(date)))) %>%
add_count(count = diffYear) %>%
filter(!cumany(lag(n == 1, default = 0)) | row_number() == 1) %>%
summarise(followUpYears = ifelse(any(n > 1), ceiling(diff_year(max(date[n != 1]), min(date))), 0))
# patientId laterality followUpYears
#1 A L 0
#2 B L 1
#3 B R 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With