
Extracting Dates Using a Regular Expression in R with grepl

Tags:

regex

I apologize in advance if this is a repeat question, but I have spent the last hour searching Stack Overflow and can't find a solution. I am using grepl in R to try to find dates in a vector of strings and am seeing unexpected behavior.

Suppose we have a vector of strings

mystring = c("12-03-99", "A", "B")
date = grepl("[1-9]{2}", mystring)

> date
[1]  TRUE FALSE FALSE

This makes sense to me. However, if I add the "-" to the regular expression, the result is not what I expect. For example:

mystring = c("12-03-99", "A", "B")
date = grepl("[1-9]{2}-[1-9]{2}-[1-9]{2}", mystring)

> date
[1] FALSE FALSE FALSE

Why does the second example yield FALSE for the first element of the vector mystring ("12-03-99")?

Thank you in advance for your help!

Vincent

Vincent asked Dec 14 '25 09:12


1 Answer

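This is a simple mistake: you are using [1-9] but you want [0-9].

Your date 12-03-99 contains a 0, so 0 has to be part of the character class. The first pattern returned TRUE only because "12" is already two consecutive digits between 1 and 9, so the 0 was never tested. Once the hyphens are added, the pattern has to match "03" as well, and [1-9]{2} cannot.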
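To see which piece of the date trips up the original character class, each component can be tested on its own. This is only a small sketch in base R; the variable names parts, original and corrected are made up for illustration:

```r
# Split the date on the literal hyphen and test every component
# against the original class and the corrected one.
parts <- strsplit("12-03-99", "-", fixed = TRUE)[[1]]

original  <- grepl("^[1-9]{2}$", parts)  # class without 0
corrected <- grepl("^[0-9]{2}$", parts)  # class with 0

print(data.frame(parts, original, corrected))
```

Only the middle component, "03", fails the original class, which is why the full three-part pattern cannot match.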

Try:

mystring = c("12-03-99", "A", "B")
date = grepl("[0-9]{2}-[0-9]{2}-[0-9]{2}", mystring)

or, using the \d shorthand (the backslash has to be doubled inside an R string):

date = grepl("\\d{2}-\\d{2}-\\d{2}", mystring)

Regex:

[0-9]{2}-[0-9]{2}-[0-9]{2}
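Note that grepl only reports whether a match exists. If the goal is to pull the date text out of the strings, one possible approach in base R is to combine regexpr with regmatches. The input vector notes below is invented for illustration and is not from the question:

```r
# Hypothetical input: dates embedded in longer strings.
notes <- c("due 12-03-99", "A", "sent 01-10-00 late")
pattern <- "[0-9]{2}-[0-9]{2}-[0-9]{2}"

# regexpr() locates the first match in each element (-1 when there is none);
# regmatches() returns the matched text and skips elements without a match.
found <- regmatches(notes, regexpr(pattern, notes))
print(found)
```

Because elements without a match are dropped, found can be shorter than the input; grepl(pattern, notes) can be used alongside it to keep track of which elements matched.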



Note

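This will also accept 00-00-00 as a valid date.

To avoid that, you can require each part to end in a nonzero digit:

[0-9]?[1-9]-[0-9]?[1-9]-[0-9]?[1-9]

This can be shortened to:

\d?[1-9]-\d?[1-9]-\d?[1-9]

and then condensed further to:

(\d?[1-9]-){2}\d?[1-9]

Be aware that these stricter patterns also reject any part ending in 0, such as 10, 20 or 30, so a date like 12-10-99 no longer matches. When used in R code, each \d has to be written as \\d.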
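A regex can only check the shape of the string, so deciding whether a value is a real calendar date is easier to hand to a date parser. The sketch below is one possible alternative, not part of the original answer: it checks the two-digit shape and then lets as.Date attempt the parse, treating NA as invalid. The format "%d-%m-%y" (day-month-year) is an assumption, since the question does not say which order the fields are in, and is_valid_date is a hypothetical helper name:

```r
# Hypothetical helper: TRUE only for strings that look like dd-mm-yy
# AND parse to a real calendar date under the assumed format.
is_valid_date <- function(x, format = "%d-%m-%y") {
  shaped <- grepl("^\\d{2}-\\d{2}-\\d{2}$", x)  # whole string, two digits per part
  parsed <- as.Date(x, format = format)          # NA when the date does not exist
  shaped & !is.na(parsed)
}

result <- is_valid_date(c("12-03-99", "00-00-00", "31-02-99", "12-10-99", "A"))
print(result)
```

With this approach 00-00-00 and 31-02-99 are rejected while 12-10-99 is still accepted, at the cost of committing to one field order.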
abc123 answered Dec 16 '25 20:12


