
Is there a need to normalise input vector for prediction in SVM?

For input data with features on different scales, I understand that the values used to train the classifier have to be normalized for correct classification (SVM).

So does the input vector for prediction also need to be normalized?

My scenario is this: the training data is normalized, serialized, and saved in the database. When a prediction has to be made, the serialized data is deserialized to get the normalized numpy array, the classifier is fit on that array, and the input vector is then passed to the classifier for prediction. So does this input vector also need to be normalized? If so, how do I do it, since at prediction time I don't have the original training data to normalize against?

Also, I am normalizing along axis=0, i.e. along the columns.

My code for normalizing is:

preprocessing.normalize(data, norm='l2',axis=0)

Is there a way to serialize preprocessing.normalize?

asked Oct 25 '25 05:10 by Jibin Mathew


1 Answer

For SVMs, scaling the features is recommended for several reasons.

  • Many optimization methods work better when all features are on the same scale.
  • Many kernel functions internally use a Euclidean distance to compare two samples (in the Gaussian kernel, the Euclidean distance appears in the exponential term). If the features have different scales, the Euclidean distance is dominated by the features with the largest scale.
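
To make the second point concrete, here is a minimal sketch with made-up numbers (the two samples and their feature ranges are assumptions for illustration, not data from the question). It shows how little a small-scale feature contributes to the Euclidean distance when another feature is measured in the thousands:

```python
import math

# Two hypothetical samples with two features each:
# feature 0 is in the thousands, feature 1 lies between 0 and 1.
a = [1000.0, 0.1]
b = [1010.0, 0.9]  # 1% change in feature 0, but almost the full range in feature 1

# Plain Euclidean distance, as used inside a Gaussian (RBF) kernel.
dist_ab = math.dist(a, b)

# Fraction of the squared distance that comes from feature 1.
share_feature1 = (a[1] - b[1]) ** 2 / dist_ab ** 2

print(f"distance: {dist_ab:.3f}")
print(f"share of feature 1 in the squared distance: {share_feature1:.2%}")
```

Even though feature 1 changes across nearly its whole range, it accounts for well under 1% of the squared distance, so a distance-based kernel would effectively ignore it.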

A common way to put the features on the same scale is to subtract the mean of each feature and divide by its standard deviation.

x_i -> (x_i - m_i) / sigma_i

You must store the mean and standard deviation of every feature in the training set so that the same operations can be applied to future data.

In Python there are functions that do this for you:
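
As a rough sketch of what storing these statistics could look like without any library support, the example below computes the per-feature mean and standard deviation with numpy, serializes them as JSON (an assumed storage format, chosen only because it is easy to put in a database column), and reuses them on a new vector. The array values and variable names are hypothetical.

```python
import json

import numpy as np

# Hypothetical training set: rows are samples, columns are features.
X_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [3.0, 600.0]])

# Per-feature statistics, computed on the training set only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# Standardize the training data: x_i -> (x_i - m_i) / sigma_i
X_train_scaled = (X_train - mean) / std

# Serialize the statistics so they can be stored next to the model.
stats_json = json.dumps({"mean": mean.tolist(), "std": std.tolist()})

# Later, at prediction time: load the statistics and apply the same transformation.
stats = json.loads(stats_json)
new_vector = np.array([[4.0, 800.0]])
new_scaled = (new_vector - np.array(stats["mean"])) / np.array(stats["std"])

print(new_scaled)
```

The training data itself is not needed at prediction time, only the two small arrays of statistics.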

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

To obtain the means and standard deviations:

from sklearn import preprocessing

scaler = preprocessing.StandardScaler().fit(X)

To then normalize the training set (X is a matrix where every row is a sample and every column a feature):

X = scaler.transform(X)

After training, you must normalize future data in the same way before classification:

newData = scaler.transform(newData)
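
Since the question mentions saving things to a database and not having the training data at prediction time, one possible approach (a sketch, not the only way) is to bundle the scaler and the classifier in a scikit-learn pipeline and serialize the fitted pipeline with pickle. The toy data, the RBF kernel choice, and the variable names below are assumptions for illustration.

```python
import pickle

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: rows are samples, columns are features.
X = np.array([[1.0, 200.0], [2.0, 180.0], [1.5, 220.0],
              [8.0, 900.0], [9.0, 950.0], [8.5, 880.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# The pipeline fits the scaler on X, then trains the SVM on the scaled data.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X, y)

# Serialize the fitted pipeline (scaler statistics and SVM together) to bytes,
# which could be stored in a database instead of the training data.
blob = pickle.dumps(model)

# At prediction time: restore the pipeline and predict on raw, unscaled input.
# The stored scaler is applied to the new data automatically.
restored = pickle.loads(blob)
new_data = np.array([[1.2, 210.0], [8.8, 920.0]])
predictions = restored.predict(new_data)

print(predictions)
```

With this layout the classifier does not have to be refit for every prediction, and the new input goes through exactly the same scaling as the training set. Note that pickled objects should only be loaded from trusted sources and with a compatible scikit-learn version.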
answered Oct 27 '25 00:10 by Rob


