
How to mutate a complex variable involving dates?

Tags: r, dplyr, lubridate

I have a tibble in which each row represents an image of an eye and contains the following relevant variables: patientId, laterality (left or right), date, imageId.

I would like to manipulate this to create another tibble showing the number of followUpYears for each eye (patientId, laterality). followUpYears is defined in a somewhat unusual way:

  1. To meet the requirements for follow-up in a particular year, there must be two different imaging dates during that year, i.e. between days 0-365 for year 1, days 366-730 for year 2, etc. The first image date is always the baseline, and followUpYears is always an integer.
  2. Only one image per date is considered.
  3. Follow-up ceases as soon as the requirement for 2 imaging dates in a year is not met, i.e. if there is only 1 imaging date in the first year, followUpYears is 0 regardless of how many images are taken subsequently.
  4. There is no requirement for there to be at least n years between the first and last image date for an eye to have n followUpYears.

The following dummy data demonstrates these points:

library(dplyr)  # also provides tibble()

data <- tibble(patientId = c('A','A','A','A','A','A','B','B','B','B','B','B','B'),
               laterality = c('L','L','L','L','L','L','R','R','R','R','L','L','L'),
               date = as.Date(c('2000-05-05','2000-05-05','2001-05-06','2001-05-07','2002-05-06','2002-05-07','2000-09-08','2001-09-07','2001-09-09','2001-09-10','2000-09-08','2001-09-07','2001-09-10')),
               imageId = 1:13)

expected_output <- tibble(patientId = c('A','B','B'),
                          laterality = c('L','R','L'),
                          followUpYears = c(0, 2, 1))

Patient A's left eye has 0 followUpYears because of points 2 and 3. Patient B's right eye has 2 followUpYears because of point 4 (despite the fact that there is only slightly more than 1 year between the first and last image date). Patient B's left eye only has 1 year of follow up since it doesn't meet the requirement for 2 image dates in year 2.
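
The expected values can be checked by hand by binning each distinct imaging date into a follow-up year and counting the dates per year. The following is a minimal base-R sketch of that check, assuming the day ranges from point 1 (days 0-365 are year 1, days 366-730 are year 2, and so on). The helper name follow_up_year and the list eyes are illustrative only and not part of the original data.

```r
# Hypothetical helper (illustration only): 1-based follow-up year of each date
follow_up_year <- function(dates) {
  days <- as.numeric(dates - min(dates))   # days since the baseline image
  pmax(1, ceiling(days / 365))             # days 0-365 -> 1, 366-730 -> 2, ...
}

# Distinct imaging dates for each eye in the dummy data
eyes <- list(
  A_L = as.Date(c("2000-05-05", "2001-05-06", "2001-05-07", "2002-05-06", "2002-05-07")),
  B_R = as.Date(c("2000-09-08", "2001-09-07", "2001-09-09", "2001-09-10")),
  B_L = as.Date(c("2000-09-08", "2001-09-07", "2001-09-10"))
)

# Number of distinct imaging dates falling in each follow-up year
dates_per_year <- lapply(eyes, function(d) tabulate(follow_up_year(d)))
dates_per_year
# A_L: 1 2 2   (year 1 holds only the baseline, so followUpYears is 0)
# B_R: 2 2     (both years qualify, so followUpYears is 2)
# B_L: 2 1     (year 2 has a single date, so followUpYears is 1)
```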

I am familiar with the basic dplyr verbs, but I can't work out how to compute this kind of variable. Note that patients might have one or both eyes included, and some might have 10+ years of follow-up. Finally, a solution that treats 1 year as 365 days regardless of leap years is fine.

Thank you!

Mark asked Oct 18 '25 00:10



1 Answer

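Here's a way with ifelse. diff_year is a helper function that computes the difference between two dates in years, rounded up to the next integer. Note that this first version groups by patientId only, so it does not yet distinguish between eyes.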

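library(dplyr)

# Difference between two dates in years (365 days each), rounded up
diff_year <- function(date1, date2) ceiling(as.numeric(difftime(date1, date2, units = "days")) / 365)

data %>% 
  group_by(patientId) %>% 
  # Count follow-up only if the second distinct date falls within the first year
  summarise(followUpYears = ifelse(diff_year(date[date != first(date)][1], first(date)) <= 1,
                                   diff_year(max(date), min(date)), 0))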

# A tibble: 2 × 2
#  patientId followUpYears
#  <chr>             <dbl>
#1 A                     0
#2 B                     2

Update following OP's comment. This version handles each eye separately and reproduces the expected output for the dummy data:

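# Difference between two dates in years (365 days each), not rounded
diff_year <- function(date1, date2) as.numeric((date1 - date2) / 365)

data %>%
  # Keep one image per eye and date
  distinct(patientId, laterality, date, .keep_all = TRUE) %>% 
  group_by(patientId, laterality) %>% 
  # Zero-based follow-up year of each date, relative to the baseline
  mutate(diffYear = floor(diff_year(date, min(date)))) %>%
  # n = number of distinct dates in the same eye and year
  add_count(diffYear) %>% 
  # Keep rows up to and including the first date whose year has only one date
  filter(!cumany(lag(n == 1, default = FALSE)) | row_number() == 1) %>% 
  summarise(followUpYears = ifelse(any(n > 1), ceiling(diff_year(max(date[n != 1]), min(date))), 0))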


#  patientId laterality followUpYears
#1 A         L                      0
#2 B         L                      1
#3 B         R                      2
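
One case the dummy data does not exercise is an eye with an empty year between two well-covered years. The pipeline above appears to keep counting the later years in that situation, because a year with no dates never produces a row with n == 1. Below is a minimal sketch of an alternative that stops at the first year that fails the requirement. It assumes "two different imaging dates" means at least two, and it bins days 0-365 into year 1 as in the question. count_follow_up_years is a hypothetical helper name, not something from dplyr.

```r
library(dplyr)

# Hypothetical helper (illustration only): number of consecutive years,
# starting with year 1, that contain at least `min_dates` distinct dates.
count_follow_up_years <- function(dates, min_dates = 2) {
  days <- as.numeric(unique(dates) - min(dates))  # one entry per imaging date
  year <- pmax(1, ceiling(days / 365))            # days 0-365 -> 1, 366-730 -> 2, ...
  qualifies <- tabulate(year) >= min_dates        # empty years count as 0 dates
  sum(cumprod(qualifies))                         # length of the leading run of TRUE
}

# Dummy data from the question
data <- tibble(patientId = c('A','A','A','A','A','A','B','B','B','B','B','B','B'),
               laterality = c('L','L','L','L','L','L','R','R','R','R','L','L','L'),
               date = as.Date(c('2000-05-05','2000-05-05','2001-05-06','2001-05-07',
                                '2002-05-06','2002-05-07','2000-09-08','2001-09-07',
                                '2001-09-09','2001-09-10','2000-09-08','2001-09-07',
                                '2001-09-10')),
               imageId = 1:13)

result <- data %>%
  group_by(patientId, laterality) %>%
  summarise(followUpYears = count_follow_up_years(date), .groups = "drop")
result
# A/L -> 0, B/L -> 1, B/R -> 2

# An eye with two dates in year 1, none in year 2 and two in year 3
gap_eye <- as.Date("2000-01-01") + c(0, 100, 800, 900)
count_follow_up_years(gap_eye)
# 1
```

Because the helper works on a plain vector of dates, it can be tested on its own before being used inside summarise().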
Maël answered Oct 19 '25 16:10
