I have two sets of data:
First set:
 patient<-c("A","A","B","B","C","C","C","C")
 arrival<-c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00")
 lastRow<-c("","Yes","","Yes","","","","Yes")
 data1<-data.frame(patient,arrival,lastRow)
Another set of data:
 patient<-c("A","A","A","A","B","B","B","C","C","C")
 availableSlot<-c("11:15","11:35","11:45","11:55","12:55","13:55","14:00","14:00","14:10","17:00")
 data2<-data.frame(patient, availableSlot)
I want to create add a column to the first dataset such that for each last row of each patient, it shows the available slot that is closest to the arrival time:
The results would be:
  patient arrival lastRow availableSlot
       A   11:00        
       A   11:00     Yes     11:15
       B   13:00        
       B   13:00     Yes     12:55
       C   14:00        
       C   14:00        
       C   14:00        
       C   14:00     Yes     14:00
Would appreciate if anyone can tell me how I can implement this in R.
I'd use data.table, first cleaning up by converting to ITime and ignoring redundant rows:
library(data.table)
setDT(data1)[, arrival := as.ITime(as.character(arrival))]
setDT(data2)[, availableSlot := as.ITime(as.character(availableSlot))]
DT1 = unique(data1, by="patient", fromLast=TRUE)
Then you can do a "rolling join":
res = data2[DT1, on=.(patient, availableSlot = arrival), roll="nearest", 
  .(patient, availableSlot = x.availableSlot)]
#    patient availableSlot
# 1:       A      11:15:00
# 2:       B      12:55:00
# 3:       C      14:00:00
How it works
The syntax is x[i, on=, roll=, j].
on= are the merge-by columns. i, we are looking for matches in x.roll="nearest", the final column in the on= is "rolled" to its nearest match.on= columns in the original tables can be referenced with x.* and i.* prefixes.j argument should give a list of columns, and .() is an alias for list() here.Check out the package's introductory materials at http://r-datatable.com/Getting-started and type ?data.table for the docs relevant to rolling joins.
I would stop at res, but if you really want it back in your original table...
# a very nonstandard step:
data1[lastRow == "Yes", availableSlot := res$availableSlot ]
#    patient  arrival lastRow availableSlot
# 1:       A 11:00:00                  <NA>
# 2:       A 11:00:00     Yes      11:15:00
# 3:       B 13:00:00                  <NA>
# 4:       B 13:00:00     Yes      12:55:00
# 5:       C 14:00:00                  <NA>
# 6:       C 14:00:00                  <NA>
# 7:       C 14:00:00                  <NA>
# 8:       C 14:00:00     Yes      14:00:00
Now, data1 has availableSlot in a new column, similar to when you do data1$col <- val.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With