I have a distance matrix:
> mat
          hydrogen   helium  lithium beryllium    boron
hydrogen  0.000000 2.065564 3.940308  2.647510 2.671674
helium    2.065564 0.000000 2.365661  1.697749 1.319400
lithium   3.940308 2.365661 0.000000  3.188148 2.411567
beryllium 2.647510 1.697749 3.188148  0.000000 2.499369
boron     2.671674 1.319400 2.411567  2.499369 0.000000
And a data frame:
> results
El1      El2    Score
Helium Hydrogen   92
Boron   Helium    61
Boron  Lithium    88
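To make the example reproducible, the two objects can be rebuilt from the printed values. This is a sketch that copies the numbers shown above; the helper vector `el` is a hypothetical name introduced only here, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
# Rebuild the printed distance matrix (values copied from the output above)
el <- c("hydrogen", "helium", "lithium", "beryllium", "boron")  # hypothetical helper
mat <- matrix(c(
  0.000000, 2.065564, 3.940308, 2.647510, 2.671674,
  2.065564, 0.000000, 2.365661, 1.697749, 1.319400,
  3.940308, 2.365661, 0.000000, 3.188148, 2.411567,
  2.647510, 1.697749, 3.188148, 0.000000, 2.499369,
  2.671674, 1.319400, 2.411567, 2.499369, 0.000000),
  nrow = 5, byrow = TRUE, dimnames = list(el, el))

# Rebuild the results data frame; note the element names are capitalised here
# while the matrix dimnames are lowercase
results <- data.frame(
  El1   = c("Helium", "Boron", "Boron"),
  El2   = c("Hydrogen", "Helium", "Lithium"),
  Score = c(92, 61, 88),
  stringsAsFactors = FALSE)
```

The capitalisation mismatch between `results` and the dimnames of `mat` is why any lookup has to go through `tolower()` first.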
For each row of results, I want to look up the distance between the element in results$El1 and the element in results$El2, to get the following:
> results
El1      El2    Score   Dist
Helium Hydrogen   92    2.065564
Boron   Helium    61    1.319400
Boron  Lithium    88    2.411567
I did this with a for loop but it seems really clunky. Is there a more elegant way to search and extract distances with fewer lines of code?
Here is my current code:
names <- rownames(mat)
num.results <- nrow(results)
# Positions of each element in the matrix; the dimnames are lowercase
El1 <- match(tolower(results$El1), names)
El2 <- match(tolower(results$El2), names)
el.dist <- matrix(0, num.results, 1)
for (i1 in seq_len(num.results)) {
  el.dist[i1, 1] <- mat[El1[i1], El2[i1]]   # one lookup per row of results
}
results$Dist <- el.dist[, 1]
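The loop can be replaced by indexing the matrix with a two-column matrix of (row, column) positions. Because the dimnames of mat are lowercase, the element names are lowercased before matching:
cols <- match(tolower(results$El1), colnames(mat))  # column position for each El1
rows <- match(tolower(results$El2), colnames(mat))  # row position for each El2
# One (row, col) pair per row of results; mat is symmetric, so the order does not matter
results$Dist <- mat[cbind(rows, cols)]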
> results
     El1      El2 Score     Dist
1 Helium Hydrogen    92 2.065564
2  Boron   Helium    61 1.319400
3  Boron  Lithium    88 2.411567
You'll recognize most of the code. The part to focus on is mat[cbind(rows, cols)]. A matrix can be subset by another matrix that has as many columns as the first has dimensions (two, for a matrix): each row of the index matrix picks out one element. From the ?`[` help:
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
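A small illustration of the rule quoted above, on a toy matrix `m` (a hypothetical example, not the data from the question). Each row of the index matrix selects one element. The same help page also documents indexing by a character matrix when the array has dimnames, so the match() calls could in principle be skipped, e.g. mat[cbind(tolower(results$El2), tolower(results$El1))].

```r
# Toy named matrix (hypothetical values, for illustration only)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))
#   x y z
# a 1 3 5
# b 2 4 6

# Numeric index matrix: each row is one (row, column) pair
idx <- cbind(c(1, 2), c(3, 1))
m[idx]                                  # m[1, 3] and m[2, 1]: 5 2

# Character index matrix: pairs are matched against rownames and colnames
m[cbind(c("a", "b"), c("z", "x"))]      # m["a", "z"] and m["b", "x"]: 5 2
```

The result is a plain vector with one element per row of the index matrix, which is why it can be assigned straight into a new data frame column.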