I am new to regular expressions and have read the regex documentation at http://www.gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf. I know similar questions have been posted before, but I am still having a hard time figuring out my case.
I have a vector of filename strings and am trying to extract a substring from each one to use as the new filename. The filenames follow the pattern below:
\w_\w_(substring to extract)_\d_\d_Month_Date_Year_Hour_Min_Sec_(AM or PM)
For example, given ABC_DG_MS-15-0452-268_206_281_12_1_2017_1_53_11_PM and ABC_RE_SP56-01_A_206_281_12_1_2017_1_52_34_AM, the substrings should be MS-15-0452-268 and SP56-01_A.
I used
map(strsplit(filenames, '_'),3)
but it failed, because the substring I want can itself contain _.
I turned to regular expressions for more advanced matching and came up with this
gsub("^[^\n]+_\\d_\\d_\\d_\\d_(AM | PM)$", "", filenames)
but it still did not give me what I needed.
You may use
filenames <- c('ABC_DG_MS-15-0452-268_206_281_12_1_2017_1_53_11_PM', 'ABC_RE_SP56-01_A_206_281_12_1_2017_1_52_34_AM')
gsub('^(?:[^_]+_){2}(.+?)_\\d+.*', '\\1', filenames)
Which yields
[1] "MS-15-0452-268" "SP56-01_A"
^ # start of the string
(?:[^_]+_){2} # one or more characters other than _, followed by _, twice
(.+?) # capture anything afterwards, lazily
_\\d+ # up to the first _ followed by digits
.* # consume the rest of the string
The whole match is replaced by the first captured group, which leaves only the part of the filename you are after.
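One caveat: the lazy match stops at the first _ followed by digits, so it would cut short a substring that itself contains such a part. If that can happen in your data, a stricter variant is to anchor on the fixed tail instead. This is only a sketch, assuming every filename ends in exactly eight numeric fields followed by AM or PM, as in your two examples; the helper name extract_id is made up for illustration.

```r
# Hypothetical helper (not part of the pattern above): anchors on the
# fixed-length tail rather than on the first "_<digits>".
extract_id <- function(x) {
  # two prefix fields, the capture, eight numeric fields, then AM or PM
  pattern <- "^[^_]+_[^_]+_(.+)_(?:\\d+_){8}(?:AM|PM)$"
  # perl = TRUE so that (?:...) is handled by the PCRE engine
  sub(pattern, "\\1", x, perl = TRUE)
}

filenames <- c(
  "ABC_DG_MS-15-0452-268_206_281_12_1_2017_1_53_11_PM",
  "ABC_RE_SP56-01_A_206_281_12_1_2017_1_52_34_AM"
)
extract_id(filenames)
# [1] "MS-15-0452-268" "SP56-01_A"
```

Because the match is tied to the end of the string, a name such as X_12 in the third position would be returned whole. Filenames that do not fit the assumed layout are returned unchanged by sub, so it is worth checking for those.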