
What is the difference between condensed and redundant distance matrices?

I am new to Python and to programming in general.

The documentation to squareform states the following:

Converts a vector-form distance vector to a square-form distance matrix, and vice-versa.

So it converts a 1D array into a square matrix?

Where the parameter X is:

Either a condensed or redundant distance matrix.

and returns:

If a condensed distance matrix is passed, a redundant one is returned, or if a redundant one is passed, a condensed distance matrix is returned.

  1. What is the difference between condensed and redundant matrices?
  2. What is the relationship between condensed/redundant matrices and the vector/square forms they take?

pdist appears to return a condensed distance matrix:

Returns a condensed distance matrix Y. For each i and j (where i < j < n), the metric dist(u=X[i], v=X[j]) is computed and stored in entry ij.

Am I right in thinking that each element of Y stores the distance between one particular point and another point? Would an example with 3 observations mean a condensed matrix with 9 elements?


1 Answer

If you have an n x n distance matrix, then each pairwise combination of the n points appears twice, once in each order, ab and ba. So if you create a distance matrix from a set of n points, you can condense the data by storing each pair only once and dropping the comparisons between each point and itself.

For example, if we have the points a, b, and c, we would have the (redundant) distance matrix

    a    b    c
a   0    ab   ac
b   ba   0    bc
c   ca   cb   0

and the condensed distance matrix,

    a    b    c
         ab   ac
              bc
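
The relationship between the two tables above can be sketched in plain Python. This is only an illustration, not how SciPy implements it: the helper names `to_condensed` and `to_square` and the distance values are made up for the example. The condensed form is the strict upper triangle read row by row (ab, ac, bc).

```python
from itertools import combinations

def to_condensed(square):
    """Flatten the strict upper triangle of a square matrix, row by row."""
    n = len(square)
    return [square[i][j] for i, j in combinations(range(n), 2)]

def to_square(condensed, n):
    """Rebuild the symmetric matrix with a zero diagonal."""
    square = [[0.0] * n for _ in range(n)]
    for value, (i, j) in zip(condensed, combinations(range(n), 2)):
        square[i][j] = value  # entry above the diagonal
        square[j][i] = value  # mirrored entry below the diagonal
    return square

# Hypothetical distances: ab = 1, ac = 2, bc = 3
square = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]

condensed = to_condensed(square)
print(condensed)  # [1.0, 2.0, 3.0]
```

Calling `to_square(condensed, 3)` gives back the original 3 x 3 table.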

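Because distance measures are symmetric (ab equals ba) and every point is at distance 0 from itself, the condensed table retains all the information. With n points it holds n(n-1)/2 entries, so 3 observations give a condensed matrix with 3 elements, not 9.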
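
To check this with the functions from the question, here is a minimal sketch using `pdist` and `squareform` from `scipy.spatial.distance`. The three 2-D points are made up for illustration, and the default Euclidean metric is assumed.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three hypothetical observations (points a, b, c) in 2-D.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

condensed = pdist(X)                # 1-D vector form: [ab, ac, bc]
redundant = squareform(condensed)   # 3 x 3 square form, symmetric, zero diagonal
roundtrip = squareform(redundant)   # back to the 1-D condensed vector

print(condensed.shape)   # (3,)
print(redundant.shape)   # (3, 3)
print(np.allclose(roundtrip, condensed))  # True
```

So "condensed" corresponds to the vector form and "redundant" to the square form, and `squareform` converts in whichever direction matches its input.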

kpie answered Nov 17 '25 18:11
