I'm trying to extract dates from a Notes column using tidyr's extract function. The data I'm working on looks like this:
dates <- data.frame(
  col1 = c("customer", "customer2", "customer3"),
  Notes = c(
    "DOB: 12/10/62\nSTART: 09/01/2019\nEND: 09/01/2020",
    "\nS/DATE: 28/08/19\nR/DATE: 27/08/20",
    "DOB: 13/01/1980\nStart:04/12/2018"
  ),
  End_date = NA,
  Start_Date = NA
)
I tried extracting the date following the string "S/DATE" like this:
library(tidyr)

extract <- extract(
  dates,
  col = "Notes",
  into = "Start_date",
  regex = "(?<=(S\\/DATE:)).*" # Using regex lookbehind
)
However, this only extracts the string "S/DATE:", not the date after it. When I tried the same regex on regex101.com, it worked as expected.
Thanks. Ibrahim
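A likely explanation, going by tidyr's documentation: extract fills each column named in into from a capture group in regex, not from the overall match. In the pattern above the only capture group is the one inside the lookbehind, so the label is what gets returned. The sketch below shows this with base R's regexec on the second note (the names note, lookbehind, revised and notes_df are made up for illustration), then tries a revised pattern whose group wraps the date:

```r
# The second note from the question, with its line breaks written as \n.
note <- "\nS/DATE: 28/08/19\nR/DATE: 27/08/20"

# Original pattern: its only capture group sits inside the lookbehind.
lookbehind <- regmatches(note, regexec("(?<=(S\\/DATE:)).*", note, perl = TRUE))[[1]]
print(lookbehind[2])  # group 1 is the label "S/DATE:", not the date

# Revised pattern: the capture group wraps the date itself.
revised <- regmatches(note, regexec("S/DATE:\\s*(\\S+)", note, perl = TRUE))[[1]]
print(revised[2])     # group 1 is now "28/08/19"

# The same revised pattern passed to tidyr::extract(), run only if tidyr is installed.
if (requireNamespace("tidyr", quietly = TRUE)) {
  notes_df <- data.frame(col1 = "customer2", Notes = note)
  print(tidyr::extract(notes_df, col = "Notes", into = "Start_date",
                       regex = "S/DATE:\\s*(\\S+)"))
}
```

regex101.com highlights the overall match, which would explain why the lookbehind version looked correct there.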
You could use sub here for a base R option:
s_date <- ifelse(grepl("S/DATE", dates$Notes),
sub("^.*\\bS/DATE: (\\S+).*$", "\\1", dates$Notes), NA)
s_date
[1] NA "28/08/19" NA
Note that the call to grepl above is needed because sub returns the input string unchanged (here, the full Notes value) when S/DATE is not found in the text.
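To see the difference the guard makes, here is a small sketch on two shortened, single-line notes (the vector notes and the names pattern, unguarded and guarded are made up for illustration):

```r
# Two hypothetical notes: the first has no S/DATE entry, the second does.
notes <- c("DOB: 13/01/1980 Start:04/12/2018",
           "S/DATE: 28/08/19 R/DATE: 27/08/20")

pattern <- "^.*\\bS/DATE: (\\S+).*$"

# Without a guard, sub() hands back the first note unchanged because nothing matched.
unguarded <- sub(pattern, "\\1", notes)
print(unguarded)

# With the grepl() guard, notes lacking S/DATE become NA instead.
guarded <- ifelse(grepl("S/DATE", notes), sub(pattern, "\\1", notes), NA)
print(guarded)
```

Only the guarded version is safe to store in a date column, since the unguarded one mixes whole notes in with the extracted dates.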