Is there a built-in way to quantify the results of the agrep function? For example, in
agrep("test", c("tesr", "teqr", "toar"), max.distance = 2, value = TRUE)
[1] "tesr" "teqr"
tesr is only 1 character edit (a substitution) away from test, while teqr is 2 edits away, and toar is 3 and hence not found. So tesr is a closer match (has a higher "probability") than teqr. How can this be retrieved, either as a number of edits or as a percentage?
Thanks!
Edit: Apologies for not putting this in the question in the first place. I am already running a two-step procedure: agrep to get my list of matches, and then adist to get the number of edits for each. adist is slower, and running time is a big factor with my dataset.
Another option using adist():
s <- c("tesr", "teqr", "toar")
s[adist("test", s) < 3]
Or using the stringdist package:
library(stringdist)
s[stringdist("test", s, method = "lv") < 3]
Which gives:
#[1] "tesr" "teqr"
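The snippets above only filter the candidates. Since the question asks for the number of edits or a percentage, here is a minimal base R sketch that keeps the distances and derives a similarity score from them. The names `pattern`, `candidates`, `sim` and `res` are illustrative, and dividing by the length of the longer string is just one possible normalisation, not something agrep itself reports.

```r
# Illustrative names: `pattern`, `candidates`, `res` are not from the original post.
pattern    <- "test"
candidates <- c("tesr", "teqr", "toar")

# adist() returns a 1 x 3 matrix here; flatten it to a plain vector.
d <- as.vector(adist(pattern, candidates))

# One possible normalisation (an assumption): divide by the longer string's length.
sim <- 1 - d / pmax(nchar(pattern), nchar(candidates))

# Keep distance and similarity next to each candidate, then filter as before.
res <- data.frame(candidate = candidates, distance = d, similarity = sim)
res[res$distance < 3, ]
```

If running time matters, the same table can be built from the faster stringdist() call by swapping it in for adist(); the stringdist package also appears to offer a stringsim() function that returns a comparable normalised score directly.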
Benchmark
x <- rep(s, 10e5)
library(microbenchmark)
library(RecordLinkage)  # provides levenshteinDist(), used in the benchmark below
mbm <- microbenchmark(
levenshteinDist = x[which(levenshteinDist("test", x) < 3)],
adist = x[adist("test", x) < 3],
stringdist = x[stringdist("test", x, method = "lv") < 3],
times = 10
)
Which gives:

Unit: milliseconds
            expr       min        lq      mean    median        uq       max neval cld
 levenshteinDist  840.7897 1255.1183 1406.8887 1398.4502 1510.5398 1960.4730    10  b
           adist 2760.7677 2905.5958 2993.9021 2986.1997 3038.7692 3472.7767    10   c
      stringdist  145.8252  155.3228  210.4206  174.5924  294.8686  355.1552    10 a
The Levenshtein distance is the number of single-character edits (insertions, deletions, substitutions) needed to turn one string into another. The RecordLinkage package may be of interest: it provides the edit distance computation below, which should perform on par with agrep, although it will not return the same results as agrep.
library(RecordLinkage)
ld <- levenshteinDist("test", c("tesr", "teqr", "toar"))
c("tesr", "teqr", "toar")[which(ld < 3)]
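One likely reason the results can differ from agrep: by default agrep looks for an approximate match of the pattern inside each string (a substring match), whereas a plain Levenshtein distance compares whole strings. The sketch below uses base R's adist() with its `partial` argument to show the two notions side by side; the string "testing" and the names `full` and `part` are made up for illustration.

```r
# Whole-string distance: "testing" needs three insertions to be built from "test".
full <- adist("test", "testing")[1, 1]

# Substring distance (the notion agrep uses by default):
# "test" occurs verbatim inside "testing", so no edits are needed.
part <- adist("test", "testing", partial = TRUE)[1, 1]

c(full = full, partial = part)
```

So a whole-string cutoff such as `ld < 3` would drop "testing", while a substring-based match would keep it. Which behaviour is wanted depends on the data.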