In the following string:
"I may opt for a yam for Amy, May, and Tommy."
How can I remove non-alphabetic characters, convert all letters to lowercase, and sort the letters within each word in R?
I am also trying to sort the words in the sentence and remove the duplicates.
You could use stringi:
library(stringi)
txt <- "I may opt for a yam for Amy, May, and Tommy."
unique(stri_sort(stri_trans_tolower(stri_extract_all_words(txt, simplify = TRUE))))
Which gives:
## [1] "a"     "amy"   "and"   "for"   "i"     "may"   "opt"   "tommy" "yam" 
Update
As mentioned by @DavidArenburg, I overlooked the "sort the letters within words" part of your question. You didn't provide a desired output and no immediate application comes to mind, but assuming you want to identify which words have a matching counterpart (q-gram string distance of 0):
library(stringdist)
library(magrittr)  # provides %>%
unique(stri_sort(stri_trans_tolower(stri_extract_all_words(txt, simplify = TRUE)))) %>%
  stringdistmatrix(., ., useNames = "strings", method = "qgram") %>%
# Intermediate result (pairwise q-gram distances):
#       a amy and for i may opt tommy yam
# a     0   2   2   4 2   2   4     6   2
# amy   2   0   4   6 4   0   6     4   0
# and   2   4   0   6 4   4   6     8   4
# for   4   6   6   0 4   6   4     6   6
# i     2   4   4   4 0   4   4     6   4
# may   2   0   4   6 4   0   6     4   0
# opt   4   6   6   4 4   6   0     4   6
# tommy 6   4   8   6 6   4   4     0   4
# yam   2   0   4   6 4   0   6     4   0
  apply(., 1, function(x) sum(x == 0, na.rm=TRUE)) 
# a   amy   and   for     i   may   opt tommy   yam 
# 1     3     1     1     1     3     1     1     3 
Every word is at distance 0 from itself, so a count of 1 means no match. Words with more than one 0 per row ("amy", "may", "yam") have a scrambled counterpart.
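The same grouping can be sketched in base R without a distance matrix: sort the letters of each word to get a key, then collect the words that share a key. This is only an illustration, not the approach used above. The helper name `letter_key` is hypothetical, and `method = "radix"` is an assumption made so that the letter order does not depend on the locale.

```r
txt <- "I may opt for a yam for Amy, May, and Tommy."

# Keep letters and spaces, lowercase, split on spaces, drop duplicate words
words <- unique(strsplit(tolower(gsub("[^A-Za-z ]", "", txt)), " +")[[1]])

# Hypothetical helper: sort the letters of one word and glue them back together
letter_key <- function(word) {
  paste(sort(strsplit(word, "")[[1]], method = "radix"), collapse = "")
}

keys <- vapply(words, letter_key, character(1))

# Words that share a key are scrambled versions of each other
groups <- split(words, keys)
anagrams <- Filter(function(g) length(g) > 1, groups)
anagrams
# $amy
# [1] "may" "yam" "amy"
```

Grouping by key scales linearly with the number of words, whereas the distance matrix grows quadratically, which may matter for longer texts.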
str <- "I may opt for a yam for Amy, May, and Tommy."
## Clean the words (just keep letters and convert to lowercase)
words <- strsplit(tolower(gsub("[^A-Za-z ]", "", str)), " ")[[1]]
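## Split the words into characters and sort them
## (simplify = FALSE always returns a list, even if all words have the same length)
sortedWords <- sapply(words, function(word) sort(unlist(strsplit(word, ""))), simplify = FALSE)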
## Join the sorted letters back together
sapply(sortedWords, paste, collapse="")
# i     may     opt     for       a     yam     for     amy     may     and 
# "i"   "amy"   "opt"   "for"     "a"   "amy"   "for"   "amy"   "amy"   "adn" 
# tommy 
# "mmoty" 
## If you want to convert the result back to a single string
paste(sapply(sortedWords, paste, collapse=""), collapse=" ")
# [1] "i amy opt for a amy for amy amy adn mmoty"
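Since the question also asks for the words to be sorted and de-duplicated, the steps above can be combined into one function. This is a minimal sketch under a few assumptions: the name `normalize_words` is hypothetical, only ASCII letters are kept, and `method = "radix"` is used so the ordering does not depend on the locale.

```r
# Hypothetical helper combining the cleaning, letter-sorting, and word-sorting steps
normalize_words <- function(x) {
  # Drop everything except letters and spaces, then lowercase
  cleaned <- tolower(gsub("[^A-Za-z ]", "", x))
  # Split on runs of spaces and discard empty pieces
  words <- strsplit(cleaned, " +")[[1]]
  words <- words[nzchar(words)]
  # Sort the letters inside each word
  sorted <- vapply(
    words,
    function(w) paste(sort(strsplit(w, "")[[1]], method = "radix"), collapse = ""),
    character(1),
    USE.NAMES = FALSE
  )
  # Sort the words and drop duplicates
  sort(unique(sorted), method = "radix")
}

normalize_words("I may opt for a yam for Amy, May, and Tommy.")
# [1] "a"     "adn"   "amy"   "for"   "i"     "mmoty" "opt"
```

Note that "may", "yam", and "amy" all collapse to "amy" once their letters are sorted, so only one of them survives the de-duplication step.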