
Standardization of a NumPy array

Tags:

python

numpy

I am trying to standardize a NumPy array of shape (M, N) so that each column has mean 0. I believe I am using the standardization formula correctly, where x is the random variable and z is its standardized version.

z = (x - mean(x)) / std(x)

But the column means of the resulting array are not 0. They are very small numbers, but not exactly zero. Any insight into my misunderstanding or mistake is welcome. Here is my code:

import numpy as np

X = np.load('data/filename.npy').astype('float')
XNormed = (X - np.mean(X, axis=0))/np.std(X, axis=0)
column_mean = np.mean(XNormed, axis=0)
print(column_mean)
Kajaree Das asked Dec 10 '25 13:12



1 Answer

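Your code is correct: it already divides by the standard deviation, exactly as in the formula in your question. The column means come out as very small numbers rather than exactly 0 because of floating-point round-off, which is expected; compare them to 0 with a tolerance (for example np.allclose) instead of testing for exact equality. Keep axis=0 in both calls, because without it mean() and std() are computed over the whole array and the individual columns are no longer centered. The same standardization written with array methods:

XNormed = (X - X.mean(axis=0)) / X.std(axis=0)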
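As a rough check of per-column standardization, here is a minimal sketch. It uses synthetic random data as a stand-in for the file in the question (the shape, seed, and distribution are assumptions, not the asker's data) and compares the column means to 0 with a tolerance rather than exact equality.

```python
import numpy as np

# Synthetic stand-in for the array loaded in the question (assumed data).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))

# Standardize each column: subtract its mean, divide by its standard deviation.
XNormed = (X - X.mean(axis=0)) / X.std(axis=0)

column_mean = XNormed.mean(axis=0)
column_std = XNormed.std(axis=0)

# Floating-point round-off leaves tiny residuals, so compare with a tolerance.
means_are_zero = np.allclose(column_mean, 0.0)
stds_are_one = np.allclose(column_std, 1.0)

print(column_mean)  # tiny values close to, but usually not exactly, 0
print(means_are_zero, stds_are_one)
```

If a column is constant, its standard deviation is 0 and the division produces nan or inf, so such columns would need separate handling.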
Jose Avila answered Dec 12 '25 08:12



