I'm trying to find all the words with three consecutive double letters, e.g., bookkeeper.
Currently it's giving me any word with double letters, rather than three consecutive sets.
This is where I am at:
Collins <- Collins %>%
  filter(nchar(test) >= 5)
dbl_letter <- function(word, position){
  substring(word, position, position) == substring(word, position+1, position+1)
}
for(word in Collins$test){
  for(i in 1:nchar(word)){
    if(dbl_letter(word,i) == TRUE & dbl_letter(word,i+2) == TRUE & dbl_letter(word,i+4) == TRUE){
      print(word)
    }
  }
}
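The loop over-reports because substring returns an empty string "" for positions past the end of the word, and "" == "" is TRUE. Once i+2 and i+4 run off the end, those two checks pass automatically, so any word that merely ends in a double letter gets printed. A minimal sketch of one possible fix, assuming the same dbl_letter helper, is to stop the loop early enough that all three pairs fit inside the word (has_triple_double is a hypothetical name introduced here for illustration):

```r
# Check whether the letters at `position` and `position + 1` are identical.
dbl_letter <- function(word, position) {
  substring(word, position, position) == substring(word, position + 1, position + 1)
}

# Hypothetical helper: TRUE if `word` contains three consecutive double letters.
has_triple_double <- function(word) {
  n <- nchar(word)
  if (n < 6) return(FALSE)        # three pairs need at least six letters
  for (i in seq_len(n - 5)) {     # last valid start is n - 5, so i + 5 <= n
    if (dbl_letter(word, i) && dbl_letter(word, i + 2) && dbl_letter(word, i + 4)) {
      return(TRUE)
    }
  }
  FALSE
}

words <- c("bookkeeper", "grass", "coffee", "parrot")
found <- words[vapply(words, has_triple_double, logical(1))]
print(found)
```

Returning from the function on the first hit also avoids printing the same word more than once.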
A regular expression could help here:
word <- c("bookkeeper", "parrot", "oomm", "wordcloud", "oooooo", "aaaaa")
grepl("([A-Za-z])\\1([A-Za-z])\\2([A-Za-z])\\3", word)
grepl returns TRUE for every element that contains three consecutive double letters (which includes the same letter repeated six times) and FALSE otherwise. For the vector word defined above, this gives
[1] TRUE FALSE FALSE FALSE TRUE FALSE
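grepl only reports which elements match. To get the words themselves, the logical vector can be used as an index. A small sketch reusing the word vector from above (the variable names pattern and matches are introduced here for illustration):

```r
word <- c("bookkeeper", "parrot", "oomm", "wordcloud", "oooooo", "aaaaa")
pattern <- "([A-Za-z])\\1([A-Za-z])\\2([A-Za-z])\\3"

# Keep only the elements for which grepl() returns TRUE.
matches <- word[grepl(pattern, word)]
print(matches)
# [1] "bookkeeper" "oooooo"
```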
The regular expression above fails when the two letters of a pair differ in case, so boOkKeEper does not match. We can solve this by converting the words to lowercase (or uppercase) first:
grepl("([A-Za-z])\\1([A-Za-z])\\2([A-Za-z])\\3", tolower(word))
Since the input is now all lowercase, the regular expression simplifies to
grepl("([a-z])\\1([a-z])\\2([a-z])\\3", tolower(word))
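Applied to the data from the question, this might look like the sketch below. The small Collins data frame is a hypothetical stand-in for the real word list; only the column name test is taken from the question. Matching is done on a lowercased copy, so the words come back as originally written.

```r
# Hypothetical stand-in for the asker's Collins data frame.
Collins <- data.frame(
  test = c("bookkeeper", "BOOKKEEPING", "parrot", "coffee"),
  stringsAsFactors = FALSE
)

pattern <- "([a-z])\\1([a-z])\\2([a-z])\\3"

# Test the lowercased words, but index into the original column.
result <- Collins$test[grepl(pattern, tolower(Collins$test))]
print(result)
# [1] "bookkeeper"  "BOOKKEEPING"
```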