Mahalanobis distance between two vectors

I tried to use mahal to calculate the Mahalanobis distance between two row vectors of 27 variables, i.e. mahal(X, Y), where X and Y are the two vectors. However, it returns an error:

The number of rows of X must exceed the number of columns.

After a few minutes of research I found that I can't use it like this, but I'm still not sure why. Can someone explain it to me?

Below is also an example of calling mahal:

>> mahal([1.55 5 32],[5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6;])

ans =    
   11.1706

Can someone clarify how MATLAB calculates the answer in this case?

Edit:
I found this code that calculates the Mahalanobis distance:

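S = cov(X);                       % covariance of the reference sample X (rows = observations)
mu = mean(X);                     % mean of each column of X
d = (Y-mu)*inv(S)*(Y-mu)';        % squared distance using an explicit inverse
d = ((Y-mu)/S)*(Y-mu)';           % <-- MathWorks prefers this way (no explicit inverse)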

I tested it on [1.55 5 32] and [5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6;], and it gave me the same result as the mahal function (11.1706). I also tried calculating the distance between the two vectors of 27 variables, and it works. What do you think? Can I rely on this solution, since the mahal function can't do what I need?

Maystro asked Oct 27 '25 05:10


1 Answer

mahal(X,Y)... gave me this error:
"The number of rows of X must exceed the number of columns."

The documentation states that the reference sample X must have more rows than columns (note that the documentation uses the signature mahal(Y,X), so X is the second input parameter, not the first). For you, this means that the second array you pass to mahal must have more rows than columns.

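Why is that so important? The purpose of this restriction is to make sure that mahal has enough data to estimate the covariance matrix used in the computation of the Mahalanobis distance. Once the mean is subtracted, n observations span at most n-1 dimensions, so the sample covariance of p variables is singular (it cannot be inverted) unless n > p. Without enough observations, the output would be garbage.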
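A quick way to see the rank argument in action, using small made-up data (the matrices here are illustrative, not from the question). The explicit tolerance passed to rank is an assumption chosen so that round-off noise is not counted as a nonzero singular value.

```matlab
% Illustrative data (not from the question): 3 observations (rows) of 3 variables.
X3 = [1 0 2; 3 2 0; 2 4 4];
S3 = cov(X3);                  % 3-by-3 sample covariance
r3 = rank(S3, 1e-10);          % centered data has rank <= rows-1 = 2, so S3 is singular

% One more observation (4 rows > 3 columns) makes the covariance invertible.
X4 = [X3; 0 1 3];
S4 = cov(X4);
r4 = rank(S4, 1e-10);          % full rank: 3

fprintf('rank with 3 rows: %d, rank with 4 rows: %d\n', r3, r4);
```

With three rows, inv(S3) would be meaningless; with four rows (more rows than columns), the covariance can be inverted.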

In your case, your inputs are two vectors, each with 27 elements. Do the 27 elements correspond to different observations, or are they one observation of 27 variables? If it's the former, just make sure both vectors are column vectors:

mahal(X(:), Y(:))

and you're good to go. If instead each vector is a single observation of 27 variables, there is no way to estimate a 27-by-27 covariance matrix from it. Again, the rows of the inputs should be the observations!
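In the column-vector case there is only one variable, so the squared Mahalanobis distance reduces to a squared z-score. The short vectors below are hypothetical stand-ins for the 27-element ones; with a single column, this formula should agree with what mahal(x, yref) returns.

```matlab
% Hypothetical data standing in for two 27-element vectors (shortened here).
x    = [5; 1];                   % query observations, one variable
yref = [1; 2; 3; 4];             % reference sample: 4 observations of one variable

% With a single variable, the squared Mahalanobis distance is
% (x - mean)^2 / variance, i.e. a squared z-score.
d = (x - mean(yref)).^2 / var(yref);
disp(d)
```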

Can someone clarify how MATLAB calculated the answer in this case?

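The Mahalanobis distance between two vectors $\mathbf{x}$ and $\mathbf{y}$ is $d_M(\mathbf{x},\mathbf{y}) = \sqrt{(\mathbf{x}-\mathbf{y})^T S^{-1} (\mathbf{x}-\mathbf{y})}$, where $S$ is the covariance matrix of the distribution they are drawn from.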
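For mahal(Y,X) specifically, one point is each row $\mathbf{y}$ of Y, the other is the sample mean of the reference rows $\mathbf{x}_1,\dots,\mathbf{x}_n$ of X (written here as column vectors), and $S$ is their sample covariance. mahal returns the square, without the square root:

$$
\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i,
\qquad
S = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^T,
\qquad
d_M^2(\mathbf{y},\bar{\mathbf{x}}) = (\mathbf{y}-\bar{\mathbf{x}})^T S^{-1}(\mathbf{y}-\bar{\mathbf{x}})
$$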

In MATLAB¹, mahal(Y,X) is implemented efficiently as follows:

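[rx,cx] = size(X);              % rx observations (rows) of cx variables in the reference sample
[ry,cy] = size(Y);              % ry query points; cy must equal cx
m = mean(X,1);                  % 1-by-cx row of column means
M = m(ones(ry,1),:);            % replicate the mean for every row of Y
C = X - m(ones(rx,1),:);        % center the reference sample
[Q,R] = qr(C,0);                % economy QR: C = Q*R, so cov(X) = R'*R/(rx-1)

ri = R'\(Y-M)';                 % solve R'*ri = (Y-M)' instead of inverting cov(X)
d = sum(ri.*ri,1)'*(rx-1);      % squared Mahalanobis distance, one per row of Y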
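Why the QR route gives the same number: with $C$ the $n \times p$ centered reference sample and its economy-size factorization $C = QR$ (so $Q^TQ = I$), the covariance never has to be inverted explicitly.

$$
S = \frac{C^TC}{n-1} = \frac{R^TQ^TQR}{n-1} = \frac{R^TR}{n-1}
\;\Rightarrow\;
d_M^2 = (n-1)\,(\mathbf{y}-\bar{\mathbf{x}})^T R^{-1}R^{-T}(\mathbf{y}-\bar{\mathbf{x}})
= (n-1)\,\bigl\lVert R^{-T}(\mathbf{y}-\bar{\mathbf{x}})\bigr\rVert^2
$$

In the code, ri = R'\(Y-M)' computes $R^{-T}(\mathbf{y}-\bar{\mathbf{x}})$ for every row of Y, sum(ri.*ri,1) takes the squared norms, and the factor rx-1 is $n-1$. The triangular $R$ is invertible only if $C$ has full column rank, which again requires $n > p$.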

You can verify that with:

type mahal

Note that mahal returns the squared Mahalanobis distance, so in your example the Mahalanobis distance is actually the square root of 11.1706, i.e. 3.3422.
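As a cross-check that does not need the Statistics Toolbox, the following sketch computes the squared distance for the example from the question in three ways: the explicit inverse, mrdivide, and the QR formulation used inside mahal. Variable names are chosen for illustration only.

```matlab
% Example from the question: one query point Y and a 4-by-3 reference sample X.
Y = [1.55 5 32];
X = [5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6];

S  = cov(X);
mu = mean(X);

d_inv = (Y - mu) * inv(S) * (Y - mu)';        % explicit inverse
d_div = ((Y - mu) / S) * (Y - mu)';           % mrdivide, avoids forming inv(S)

% QR route, mirroring the mahal excerpt (rx = number of rows of X)
rx = size(X, 1);
C = X - repmat(mu, rx, 1);                    % centered reference sample
[~, R] = qr(C, 0);                            % economy-size QR
ri = R' \ (Y - mu)';
d_qr = sum(ri .^ 2) * (rx - 1);

fprintf('%.4f %.4f %.4f  ->  distance %.4f\n', d_inv, d_div, d_qr, sqrt(d_qr));
```

All three values should agree with the 11.1706 from the question. The mrdivide and QR forms avoid forming inv(S), which is generally the numerically safer choice.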

Can I count on this [my] solution since the mahal function can't do what I need?

You're doing everything correctly, so it's safe to use. Having said that, note that MATLAB did restrict the dimensions of the second input array for a good reason (stated above).
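If you keep your own formula, one option is to add the same size check that mahal performs and to vectorize over several query rows. This is a minimal sketch under those assumptions; the error messages and the extra query row are illustrative, not taken from mahal itself.

```matlab
% Reference sample X (rows = observations) and query points Y (rows = points).
% Illustrative data: the example from the question plus the sample mean as a second query.
X = [5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6];
Y = [1.55 5 32; mean(X, 1)];

[n, p] = size(X);
if n <= p
    error('Need more observations (rows of X) than variables (columns of X).');
end
if size(Y, 2) ~= p
    error('Y and X must have the same number of columns.');
end

mu = mean(X, 1);
S  = cov(X);
D  = bsxfun(@minus, Y, mu);     % subtract the mean from every row of Y
d  = sum((D / S) .* D, 2);      % squared Mahalanobis distance for each row of Y
```

The first entry of d reproduces the 11.1706 from the question, and the second is zero because that query point is the sample mean itself.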

If X contains only one row, cov treats it like a column vector, which means that each value is treated as a separate observation of a single variable. The resulting S is then a scalar variance rather than a 27-by-27 covariance matrix, so the number you get is not a meaningful Mahalanobis distance.
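A small demonstration of that collapse, using hypothetical 5-variable observations in place of the 27-variable ones: the formula runs without error, but S and mu are scalars, and d is just the sum of squared deviations of Y from the mean of X's entries divided by their variance.

```matlab
% Hypothetical single observations of 5 variables (the question had 27).
X = [1 2 3 4 5];
Y = [2 2 2 2 2];

S  = cov(X);        % a row vector is treated as 5 observations of ONE variable
mu = mean(X);       % ... so S and mu are scalars, not 5-by-5 and 1-by-5

d = ((Y - mu) / S) * (Y - mu)';   % the formula still runs without error

fprintf('size(S) = %dx%d, d = %g\n', size(S, 1), size(S, 2), d);
```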


¹ Checked for MATLAB release R2007b.

Eitan T answered Oct 29 '25 23:10


