
Extract date after string in R

Tags: regex, r, tidyr

I'm trying to extract dates from a Notes column using tidyr's extract function. The data I'm working on looks like this:

dates <- data.frame(col1 = c("customer", "customer2", "customer3"),
                    Notes = c("DOB: 12/10/62
START: 09/01/2019
END: 09/01/2020", "
S/DATE: 28/08/19
R/DATE: 27/08/20", "DOB: 13/01/1980
Start:04/12/2018"),
                    End_date = NA,
                    Start_Date = NA )

I tried extracting the date following the string "S/DATE" like this:

extract <- extract(
  dates,
  col = "Notes",
  into = "Start_date",
  regex = "(?<=(S\\/DATE:)).*"  # Using a regex lookbehind
)

However, this only extracts the string "S/DATE:", not the date after it. When I tried the same regex on regex101.com, it worked as expected.

Thanks. Ibrahim

Ibrahim asked Dec 13 '25 12:12


1 Answer

You could use sub here for a base R option:

s_date <- ifelse(grepl("S/DATE", dates$Notes),
                 sub("^.*\\bS/DATE: (\\S+).*$", "\\1", dates$Notes), NA)
s_date

[1] NA         "28/08/19" NA

Note that the call to grepl above is needed because sub returns the input string unchanged (here, the full Notes value) whenever the pattern does not match, that is, whenever S/DATE is not found in the text.
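To make the role of the grepl guard concrete, here is a small sketch that runs the same pattern with and without it. The vector name notes and the variables unguarded and guarded are made up for this illustration; the strings are the Notes values from the question, written with "\n" for the line breaks.

```r
# Illustration only: `notes`, `unguarded` and `guarded` are hypothetical names.
notes <- c("DOB: 12/10/62\nSTART: 09/01/2019\nEND: 09/01/2020",
           "\nS/DATE: 28/08/19\nR/DATE: 27/08/20",
           "DOB: 13/01/1980\nStart:04/12/2018")

# Same pattern as above; with R's default regex engine "." also matches
# newlines, so the pattern can span the whole multi-line string.
pattern <- "^.*\\bS/DATE: (\\S+).*$"

# Without a guard: elements that do not match are returned unchanged.
unguarded <- sub(pattern, "\\1", notes)

# With the guard: elements that lack "S/DATE" become NA instead.
guarded <- ifelse(grepl("S/DATE", notes, fixed = TRUE), unguarded, NA)

print(unguarded)
print(guarded)
```

Here unguarded[1] and unguarded[3] are still the full Notes strings, while guarded has NA in those positions. If you would rather stay with tidyr, the same capture-group idea should carry over: as far as I understand, extract fills each into column from a capture group, which would explain why your lookbehind version returned "S/DATE:". A regex along the lines of "S/DATE:\\s*(\\S+)" is probably closer to what you want there, though I have not tested it against your full data.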

Tim Biegeleisen answered Dec 15 '25 16:12


