 

Return words with three consecutive double letters (e.g. bookkeeper) in R

Tags:

r

I'm trying to find all the words with three consecutive double letters, e.g., bookkeeper.

Currently it's giving me any word with double letters, rather than only words with three consecutive sets.

This is where I am at:

library(dplyr)

Collins <- Collins %>%
  filter(nchar(test) >= 5)

dbl_letter <- function(word, position){
  substring(word, position, position) == substring(word, position+1, position+1)
}

for(word in Collins$test){
  for(i in 1:nchar(word)){
    if(dbl_letter(word,i) == TRUE & dbl_letter(word,i+2) == TRUE & dbl_letter(word,i+4) == TRUE){
      print(word)
    }
  }
}
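One likely reason the loop above over-matches: for positions past the end of the word, `substring()` returns `""`, and `"" == ""` is `TRUE`. Near the end of a word, the checks at `i + 2` and `i + 4` therefore pass automatically, so any word ending in a double letter (e.g. "spell") is printed, possibly more than once. A minimal sketch of a bounded version, reusing the question's `dbl_letter()`; `has_three_doubles()` is a hypothetical helper name used only for illustration:

```r
# Same helper as in the question: TRUE when the letters at `position`
# and `position + 1` are identical.
dbl_letter <- function(word, position) {
  substring(word, position, position) == substring(word, position + 1, position + 1)
}

# Hypothetical helper: TRUE if `word` contains three consecutive double letters.
has_three_doubles <- function(word) {
  n <- nchar(word)
  # Three double letters need at least six characters.
  if (n < 6) return(FALSE)
  # The last pair checked starts at i + 4 and ends at i + 5, so i stops at
  # n - 5; this keeps every substring() call inside the word.
  for (i in seq_len(n - 5)) {
    if (dbl_letter(word, i) && dbl_letter(word, i + 2) && dbl_letter(word, i + 4)) {
      return(TRUE)  # stop at the first match so each word is reported once
    }
  }
  FALSE
}

words <- c("bookkeeper", "spell", "parrot", "oooooo", "aaaaa")
# vapply() runs the check on each word and guarantees a logical result.
matches <- vapply(words, has_three_doubles, logical(1))
print(words[matches])
```

Applied to the question's data, the same idea would be `Collins$test[vapply(Collins$test, has_three_doubles, logical(1))]`.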
clavat245 asked Jan 23 '26 21:01


1 Answer

Using a regular expression with backreferences could help you out:

word <- c("bookkeeper", "parrot", "oomm", "wordcloud", "oooooo", "aaaaa")
grepl("([A-Za-z])\\1([A-Za-z])\\2([A-Za-z])\\3", word)

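Each ([A-Za-z]) captures one letter, and the backreference that follows it (\\1, \\2, \\3) requires that same letter to appear again immediately, so the pattern matches three doubled letters in a row. grepl returns TRUE if a word contains three consecutive double letters and FALSE otherwise. This includes a run of six identical letters, since the three captured letters need not differ. For the vector word defined above, it gives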

[1]  TRUE FALSE FALSE FALSE  TRUE FALSE
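If a run such as "oooooo" should not count, one possible variant (not part of the answer above) uses negative lookaheads, which in R require Perl-compatible regular expressions via `perl = TRUE`. This sketch assumes that only adjacent pairs must differ, so a word containing "aabbaa" would still match:

```r
word <- c("bookkeeper", "parrot", "oomm", "wordcloud", "oooooo", "aaaaa")

# (?!\\1) asserts that the next letter is not the one just doubled,
# so each pair must use a different letter from the pair before it.
# Negative lookahead is a PCRE feature, hence perl = TRUE.
pattern <- "([a-z])\\1(?!\\1)([a-z])\\2(?!\\2)([a-z])\\3"
print(grepl(pattern, word, perl = TRUE))
```

Compared with the original pattern, "oooooo" no longer matches, while "bookkeeper" still does.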

Ignoring the case

The regular expression given above fails when the two letters of a double differ in case, so boOkKeEper does not match. We can solve this by converting the words to lower case (or upper case) first:

grepl("([A-Za-z])\\1([A-Za-z])\\2([A-Za-z])\\3", tolower(word))

Since the input is now all lower case, the regular expression simplifies to

grepl("([a-z])\\1([a-z])\\2([a-z])\\3", tolower(word))
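To get the matching words themselves rather than a logical vector, the lower-cased copy can be used only for the test while indexing the original column. A minimal sketch, assuming a small hypothetical stand-in for the question's Collins data frame with a character column test:

```r
# Hypothetical stand-in for the question's Collins data frame.
Collins <- data.frame(test = c("Bookkeeper", "boOkKeEper", "parrot", "balloon"),
                      stringsAsFactors = FALSE)

pattern <- "([a-z])\\1([a-z])\\2([a-z])\\3"

# Match on a lower-cased copy, but index the original column so the
# returned words keep their original capitalization.
hits <- Collins$test[grepl(pattern, tolower(Collins$test))]
print(hits)
```

With dplyr loaded, the same condition can go directly inside `filter()` on the test column.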
Martin Gal answered Jan 25 '26 13:01