
How to avoid float values in regression models

I am trying to predict wine quality (which ranges from 1 to 10) using regression models such as linear regression, SGDRegressor, ridge, and lasso.

Dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv

Independent variables: volatile acidity, residual sugar, free sulfur dioxide, total sulfur dioxide, alcohol. Dependent variable: quality.

Linear model

from sklearn import linear_model
regr = linear_model.LinearRegression(n_jobs=3)
regr.fit(x_train, y_train)
predicted = regr.predict(x_test)

Predicted values for LinearRegression: array([ 5.33560542, 5.47347404, 6.09337194, ..., 5.67566813, 5.43609198, 6.08189 ])

The predicted values are floats instead of integers (1, 2, 3, ..., 10). I tried to round the predicted values using NumPy:

predicted = np.round(regr.predict(x_test))

but my accuracy went down with this attempt.

SGDRegressor model

import numpy as np
from sklearn import linear_model
np.random.seed(0)
clf = linear_model.SGDRegressor()
clf.fit(x_train, y_train)
predicted = np.floor(clf.predict(x_test))

Predicted output values for SGDRegressor:

array([ -2.77685458e+12,   3.26826414e+12,   4.18655713e+11, ...,
     4.72375220e+12,  -7.08866307e+11,   3.95571514e+12])

Here I am unable to convert the output values into integers.

Could someone please let me know the best way to predict wine quality using these regression models?

Praneeth asked Sep 03 '25 07:09



1 Answer

You are fitting a regression model, so the output is continuous by nature.
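If integer scores are still needed for reporting, one option is to round the continuous predictions and clip them to the valid range. The snippet below is a minimal sketch of that idea using only NumPy; the helper name to_quality_scores and its low/high parameters are hypothetical, and the input values are copied from the LinearRegression output in the question.

```python
import numpy as np

def to_quality_scores(predicted, low=1, high=10):
    """Round continuous predictions to the nearest integer and clip to [low, high]."""
    rounded = np.rint(predicted)
    return np.clip(rounded, low, high).astype(int)

# Values copied from the question's LinearRegression output
predicted = np.array([5.33560542, 5.47347404, 6.09337194,
                      5.67566813, 5.43609198, 6.08189])
scores = to_quality_scores(predicted)
print(scores)
```

Rounding discards information, so it is usually safer to compute error metrics on the unrounded predictions and convert to integers only at the very end.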

Note that predicting wine quality here is not a classification problem. The response variable y, the wine quality, has an intrinsic order: a score of 6 is strictly better than a score of 5. It is NOT a categorical variable, where different numbers merely label different groups that cannot be compared.
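On the SGDRegressor output in the question: predictions on the order of 1e12 are typical of stochastic gradient descent diverging when the features sit on very different scales. That is a likely cause here, not something the question confirms. A minimal sketch of the usual remedy is to standardize the features before fitting. The data below is a synthetic stand-in (an assumption, not the real wine dataset), and the coefficients used to generate y are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (an assumption, not the real dataset):
# five features on very different scales.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(0.28, 0.10, n),   # volatile acidity
    rng.normal(6.4, 5.0, n),     # residual sugar
    rng.normal(35.0, 17.0, n),   # free sulfur dioxide
    rng.normal(138.0, 42.0, n),  # total sulfur dioxide
    rng.normal(10.5, 1.2, n),    # alcohol
])
# Made-up relationship between features and quality, plus noise
y = 3.0 + 0.3 * X[:, 4] - 2.0 * X[:, 0] + rng.normal(0.0, 0.3, n)

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize each feature to zero mean and unit variance before SGD
model = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
model.fit(x_train, y_train)
predicted = model.predict(x_test)

# Mean absolute error respects the ordering of the scores
mae = np.mean(np.abs(predicted - y_test))
print(predicted[:5], mae)
```

Putting the scaler inside the pipeline means the test data is scaled with statistics learned from the training data only.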

Jianxun Li answered Sep 04 '25 22:09
