 

Regex pattern questions in R

Tags: regex, r

I need to extract the author, month, date, and time from a string in R.

test = "Postedby   BeauHDon Friday November 24, 2017 @10:30PM from the cost-effective dept."

I am currently using gsub() to find the desired output.

Expected output would be:

#author
"BeauHDon"
#Month
"November"
#Date
24
#Time
22:30

I got as far as gsub("Postedby (.*).*", "\\1", test), but the output is

"BeauHDon Friday November 24, 2017 @10:30PM from the cost-effective dept."

I also understand that the time requires more coding after extracting 10:30.

Is it possible to add 12 if the next two characters are PM?

Thank you.

Danny Yoon asked Dec 10 '25 01:12


1 Answer

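We can extract the pieces with capture groups (assuming every string follows the pattern shown in the example). From the start of the string (^), the pattern matches one or more non-whitespace characters (\\S+) followed by whitespace (\\s+), then captures the author as a word ((\\w+)). It skips the next word (the weekday) and the whitespace around it, captures the month ((\\w+)) and the day number ((\\d+)), skips everything up to the @ ([^@]+), and captures the run of non-whitespace characters that follows it ((\\S+)). sub() replaces the whole string with the four groups joined by commas, and scan() splits that result into a character vector.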

v1 <- scan(text=sub("^\\S+\\s+(\\w+)\\s+\\w+\\s+(\\w+)\\s+(\\d+)[^@]+@(\\S+).*",
           "\\1,\\2,\\3,\\4", test), what = "", sep=",", quiet = TRUE)

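The last entry is the time, so we can parse it with strptime, reformat it as 24-hour time with format, and assign the result back to the last element. Here %I is the hour on a 12-hour clock and %p is the AM/PM marker; whitespace in the format string matches zero or more whitespace characters in the input, so "%I:%M %p" also parses "10:30PM".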

v1[4] <- format(strptime(v1[4],  "%I:%M %p"), "%H:%M")
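
Since %p is matched against the current locale's AM/PM strings, the "add 12 if PM" idea from the question can also be sketched with plain arithmetic. This is only a minimal sketch: the helper name to_24h is hypothetical, and it assumes the time always looks like 10:30PM (hour, colon, minutes, then AM or PM).

```r
# Hypothetical helper (not part of base R): convert "10:30PM" to "22:30".
to_24h <- function(x) {
  # Groups: 1 = hour, 2 = minutes, 3 = AM/PM marker.
  parts <- regmatches(x, regexec("^(\\d{1,2}):(\\d{2})\\s*([AP]M)$", x))[[1]]
  stopifnot(length(parts) == 4)               # full match + 3 groups expected
  hour <- as.integer(parts[2]) %% 12L         # maps 12 to 0 for both AM and PM
  if (parts[4] == "PM") hour <- hour + 12L    # add 12 for afternoon times
  sprintf("%02d:%s", hour, parts[3])
}

to_24h("10:30PM")
```

Taking the hour modulo 12 first is what keeps 12:05AM and 12:05PM correct; adding 12 to every PM hour directly would turn 12:05PM into 24:05.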

If needed, name the elements (author, Month, and so on).

names(v1) <- c("#author", "#Month", "#Date", "#Time")
v1
#  #author     #Month      #Date      #Time 
#"BeauHDon" "November"       "24"    "22:30" 
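
As an alternative sketch, the same capture groups can be pulled out with regexec() and regmatches(), which avoids joining the groups with commas and splitting them again. The pattern is the one from the answer with the trailing .* dropped; the variable names pattern, m and fields are just illustrative, and the sketch assumes the string matches.

```r
test <- "Postedby   BeauHDon Friday November 24, 2017 @10:30PM from the cost-effective dept."

# Same groups as above: author, month, day number, time after the "@".
pattern <- "^\\S+\\s+(\\w+)\\s+\\w+\\s+(\\w+)\\s+(\\d+)[^@]+@(\\S+)"

# regmatches() returns the full match first, then one entry per capture group.
m <- regmatches(test, regexec(pattern, test))[[1]]

# Drop the full match and label the four groups.
fields <- setNames(m[-1], c("author", "Month", "Date", "Time"))
fields
```

At this point the time element still holds the raw 10:30PM text, so it needs the same 24-hour conversion step as v1[4].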
akrun answered Dec 11 '25 16:12


