I am having trouble extracting the best match from a string-distance matrix.
I am using the stringdist package to compute the string-distance matrix.
For example, I generate my matrix with these lines of code:
library(stringdist)
lookup <- c('Dog', 'Cat', 'Bear')
data <- c('Do g', 'Do gg', 'Caat')
d.matrix <- stringdistmatrix(a = lookup, b = data, useNames = "strings", method = "cosine")
The matrix looks something like this

My approach is to take, for each string in data, the lookup entry with the lowest cosine distance as the best match.
For example, "Do g" would match "Dog".
What I want to generate is a data frame of matched pairs with their cosine distance values:
data | matchwith | cosine.s
Do g | Dog | 0.1338746
Do gg | Dog | 0.1271284
Caat | Cat | 0.05719096
I have no clue how to get the data into the table format shown above.
Any help would be much appreciated.
The which.min function is a good fit for this problem: applied to each column of the matrix, it returns the row index of the smallest distance.
Here is a solution using base R:
library(stringdist)
lookup <- c('Dog', 'Cat', 'Bear')
data <- c('Do g', 'Do gg', 'Caat')
d.matrix <- stringdistmatrix(a = lookup, b = data, useNames = "strings", method = "cosine")
# minimum cosine distance in each column (one column per element of data)
cosine.s <- apply(d.matrix, 2, min)
# row index of that minimum in each column
minlist <- apply(d.matrix, 2, which.min)
# lookup strings at those row indices
matchwith <- lookup[minlist]
# final answer: one row per element of data
answer <- data.frame(data, matchwith, cosine.s)
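If you need this more than once, the same steps can be wrapped in a small function. This is only a sketch: best_match is a hypothetical helper name, not part of stringdist, and the matrix d below is a hand-typed stand-in with rounded, illustrative values so that the example runs without the package. In practice you would pass in the matrix returned by stringdistmatrix with useNames = "strings".

```r
# Hypothetical helper (not part of stringdist): for each column of a
# distance matrix, report the row name with the smallest distance.
# Assumes d has both row names and column names.
best_match <- function(d) {
  # row index of the minimum in each column
  idx <- unname(apply(d, 2, which.min))
  data.frame(
    data      = colnames(d),
    matchwith = rownames(d)[idx],
    # pick the (row, column) cell for each column via matrix indexing
    cosine.s  = d[cbind(idx, seq_len(ncol(d)))],
    stringsAsFactors = FALSE
  )
}

# Stand-in distance matrix with illustrative, rounded values
# (rows = lookup strings, columns = data strings; filled column by column)
d <- matrix(
  c(0.13, 1.00, 1.00,
    0.13, 1.00, 1.00,
    1.00, 0.06, 0.59),
  nrow = 3,
  dimnames = list(c("Dog", "Cat", "Bear"), c("Do g", "Do gg", "Caat"))
)

result <- best_match(d)
print(result)
```

Note that which.min returns only the first minimum, so if two lookup strings tie for a column, the one that appears earlier in lookup is reported.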