I apologize in advance if this is a duplicate question. I have spent the last hour searching Stack Overflow and can't find a solution. I am using grepl in R to find dates in strings and am seeing unexpected behavior.
Suppose we have a vector of strings:
mystring = c("12-03-99", "A", "B")
date = grepl("[1-9]{2}", mystring)
> date
[1] TRUE FALSE FALSE
This makes sense to me. However, if I add the "-" separators to the regular expression, I get unexpected behavior. For example:
mystring = c("12-03-99", "A", "B")
date = grepl("[1-9]{2}-[1-9]{2}-[1-9]{2}", mystring)
> date
[1] FALSE FALSE FALSE
Why does the second example return FALSE for the first element of mystring ("12-03-99")?
Thank you in advance for your help!
Vincent
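This is a simple mistake:
you are using [1-9] but you want [0-9].
Your first pattern, [1-9]{2}, only needs two consecutive digits from 1 to 9 somewhere in the string, and "12" satisfies it. The full pattern must also match the middle field "03", but 0 is not in the character class [1-9], so the match fails. You need to include 0 in the character class.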
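One way to see this in R is to ask which substring actually matched, using base R's regexpr() and regmatches(). A minimal sketch (the variable name is just the one from the question):

```r
mystring <- c("12-03-99", "A", "B")

# [1-9]{2} only needs two consecutive digits from 1-9 somewhere in the string,
# so the first match in "12-03-99" is "12" at the start.
regmatches(mystring, regexpr("[1-9]{2}", mystring))
# [1] "12"

# The middle field "03" contains a 0, which [1-9] excludes but [0-9] includes.
grepl("^[1-9]{2}$", "03")
# [1] FALSE
grepl("^[0-9]{2}$", "03")
# [1] TRUE
```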
Try:
mystring = c("12-03-99", "A", "B")
date = grepl("[0-9]{2}-[0-9]{2}-[0-9]{2}", mystring)
or
date = grepl("\\d{2}-\\d{2}-\\d{2}", mystring)
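A quick check that both corrected patterns agree. Note that in an R string literal the backslash must itself be escaped, so the regex \d is written "\\d". Since grepl() only reports whether each element matches, the sketch also uses regexpr() with regmatches() to pull out the date text itself; the variable names here are illustrative.

```r
mystring <- c("12-03-99", "A", "B")

# Bracket class and \d shorthand (written "\\d" inside an R string).
date_bracket <- grepl("[0-9]{2}-[0-9]{2}-[0-9]{2}", mystring)
date_digit   <- grepl("\\d{2}-\\d{2}-\\d{2}", mystring)
date_bracket
# [1]  TRUE FALSE FALSE
identical(date_bracket, date_digit)
# [1] TRUE

# Extract the matched text rather than just testing for a match.
regmatches(mystring, regexpr("\\d{2}-\\d{2}-\\d{2}", mystring))
# [1] "12-03-99"
```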
Regex:
[0-9]{2}-[0-9]{2}-[0-9]{2}

This will also catch 00-00-00 as a valid date.
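To exclude it, you can require the last digit of each field to be nonzero:
[0-9]?[1-9]-[0-9]?[1-9]-[0-9]?[1-9]
Be aware that this also rejects any field ending in 0 (for example 12-10-99) and accepts single-digit fields such as 1-3-99.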
This can be shortened to:
\d?[1-9]-\d?[1-9]-\d?[1-9]
and further shortened to the following (in an R string literal, write each \d as \\d):
(\d?[1-9]-){2}\d?[1-9]
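To see how these patterns behave on a few edge cases, here is a small R comparison. The test strings are made up for illustration, and the "strict" pattern (each field exactly two digits, but not "00") is a hypothetical alternative of mine, not part of the original answer. None of these patterns checks that the numbers form a real date (99-99-99 would still pass).

```r
tests <- c("12-03-99", "00-00-00", "12-10-99", "1-3-99")

# Plain two-digit fields: accepts 00-00-00.
grepl("[0-9]{2}-[0-9]{2}-[0-9]{2}", tests)
# [1]  TRUE  TRUE  TRUE FALSE

# The shortened pattern, with backslashes doubled for an R string literal:
# rejects 00-00-00, but also 12-10-99, and accepts the single-digit 1-3-99.
short_pat <- "(\\d?[1-9]-){2}\\d?[1-9]"
grepl(short_pat, tests)
# [1]  TRUE FALSE FALSE  TRUE

# Hypothetical alternative: each field is exactly two digits and not "00".
field <- "(0[1-9]|[1-9][0-9])"
strict_pat <- paste0(field, "-", field, "-", field)
grepl(strict_pat, tests)
# [1]  TRUE FALSE  TRUE FALSE
```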