
How to find regression curve equation for a fitted PolynomialFeatures model

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

data = pd.DataFrame({
    "input": [0.001, 0.015, 0.066, 0.151, 0.266, 0.402, 0.45, 0.499,
              0.598, 0.646, 0.738, 0.782, 0.86, 0.894, 0.924, 0.95],
    "output": [0.5263157894736842, 0.5789473684210524, 0.6315789473684206,
               0.6842105263157897, 0.6315789473684206, 0.7894736842105263,
               0.8421052631578945, 0.7894736842105263, 0.736842105263158,
               0.6842105263157897, 0.736842105263158, 0.736842105263158,
               0.6842105263157897, 0.6842105263157897, 0.6315789473684206,
               0.5789473684210524]})

I have the above data with input and output values, and I want to fit a curve to this data. A plot of the input vs. output values is shown here: [scatter plot of the data]

I have made this code:

X=data.iloc[:,0].to_numpy()
X=X.reshape(-1,1)
y=data.iloc[:,1].to_numpy()
y=y.reshape(-1,1)

poly=PolynomialFeatures(degree=2)
poly.fit(X,y)
X_poly=poly.transform(X)

reg=LinearRegression().fit(X_poly,y)
plt.scatter(X,y,color="blue")
plt.plot(X,reg.predict(X_poly),color="orange",label="Polynomial Linear Regression")
plt.xlabel("Temperature")
plt.ylabel("Pressure")
plt.legend(loc="upper left")
plt.show()

The plot is:

[scatter of the data with the orange fitted curve]

But I can't find the equation of the curve above (the orange curve). How can I find it?

asked Sep 20 '25 by trkmenha

1 Answer

Your plot actually corresponds to your code run with

poly=PolynomialFeatures(degree=7)

and not to degree=2. Indeed, running your code with the above change, we get:

[plot matching the one shown in the question]

Now, your polynomial features are:

poly.get_feature_names()  # in scikit-learn >= 1.0, use poly.get_feature_names_out()
# ['1', 'x0', 'x0^2', 'x0^3', 'x0^4', 'x0^5', 'x0^6', 'x0^7']

and the respective coefficients of your linear regression are:

reg.coef_
# array([[   0.        ,    5.43894411,  -68.14277256,  364.28508827,
#         -941.70924401, 1254.89358662, -831.27091422,  216.43304954]])

plus the intercept:

reg.intercept_
# array([0.51228593])

Given the above, and setting

coef = reg.coef_[0]

since here we have a single feature in the initial data, your regression equation is (note that coef[0] multiplies the bias column added by PolynomialFeatures and is 0 here, since LinearRegression fits its own intercept):

y = reg.intercept_ + coef[0] + coef[1]*x + coef[2]*x**2 + coef[3]*x**3 + coef[4]*x**4 + coef[5]*x**5 + coef[6]*x**6 + coef[7]*x**7
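As the degree grows, writing the terms out by hand gets error-prone; the same equation can be evaluated generically as a sum over the powers of x, or with np.polyval once the coefficients are reversed to highest-power-first and the intercept folded into the constant term. A minimal self-contained sketch (refitting a degree-7 model to the question's data, with y kept 1-D so that coef_ and intercept_ come out as a 1-D array and a scalar):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Data from the question
X = np.array([0.001, 0.015, 0.066, 0.151, 0.266, 0.402, 0.45, 0.499,
              0.598, 0.646, 0.738, 0.782, 0.86, 0.894, 0.924, 0.95]).reshape(-1, 1)
y = np.array([0.5263157894736842, 0.5789473684210524, 0.6315789473684206,
              0.6842105263157897, 0.6315789473684206, 0.7894736842105263,
              0.8421052631578945, 0.7894736842105263, 0.736842105263158,
              0.6842105263157897, 0.736842105263158, 0.736842105263158,
              0.6842105263157897, 0.6842105263157897, 0.6315789473684206,
              0.5789473684210524])

poly = PolynomialFeatures(degree=7)
X_poly = poly.fit_transform(X)
reg = LinearRegression().fit(X_poly, y)

coef = reg.coef_          # y is 1-D, so coef_ is 1-D; coef[0] belongs to the bias column
intercept = reg.intercept_

def equation(x):
    """Evaluate intercept + sum_i coef[i] * x**i, i.e. the fitted polynomial."""
    return intercept + sum(c * x**i for i, c in enumerate(coef))

# np.polyval expects highest-power-first; fold the intercept into the constant term
full_coef = coef.copy()
full_coef[0] += intercept

x_new = np.linspace(0, 1, 50)
y_eq = equation(x_new)
y_polyval = np.polyval(full_coef[::-1], x_new)
y_model = reg.predict(poly.transform(x_new.reshape(-1, 1)))
```

All three evaluations agree to numerical precision, so any of them can serve as "the equation" of the orange curve.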

For visual verification, we can plot the above function with some x data in [0, 1]

x = np.linspace(0, 1, 15) 

Running the above expression for y and

plt.plot(x, y)

gives:

[plot of the equation's curve over [0, 1]]

Using some randomly generated data x, we can verify that the results of the equation y_eq are indeed equal to the results produced by the regression model y_reg within the limits of numerical precision:

x = np.random.rand(1,10)
y_eq = reg.intercept_ + coef[0] + coef[1]*x + coef[2]*x**2 + coef[3]*x**3 + coef[4]*x**4 + coef[5]*x**5 + coef[6]*x**6 + coef[7]*x**7
y_reg = np.concatenate(reg.predict(poly.transform(x.reshape(-1,1)))) 

y_eq
# array([[0.72452703, 0.64106819, 0.67394222, 0.71756648, 0.71102853,
#         0.63582055, 0.54243177, 0.71104983, 0.71287962, 0.6311952 ]])

y_reg
# array([0.72452703, 0.64106819, 0.67394222, 0.71756648, 0.71102853,
#        0.63582055, 0.54243177, 0.71104983, 0.71287962, 0.6311952 ])

np.allclose(y_reg, y_eq)
# True

Unrelated to the question itself: I guess you already know that fitting such high-order polynomials to so few data points is not a good idea, and you should probably stick to a low degree of 2 or 3...
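To make that concrete, here is a short sketch comparing training R² across degrees on the question's data. Because the lower-degree feature sets are nested inside the higher-degree ones, the training score can only go up with degree, which is exactly why a high-degree fit can look deceptively good on 16 points:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([0.001, 0.015, 0.066, 0.151, 0.266, 0.402, 0.45, 0.499,
              0.598, 0.646, 0.738, 0.782, 0.86, 0.894, 0.924, 0.95]).reshape(-1, 1)
y = np.array([0.5263157894736842, 0.5789473684210524, 0.6315789473684206,
              0.6842105263157897, 0.6315789473684206, 0.7894736842105263,
              0.8421052631578945, 0.7894736842105263, 0.736842105263158,
              0.6842105263157897, 0.736842105263158, 0.736842105263158,
              0.6842105263157897, 0.6842105263157897, 0.6315789473684206,
              0.5789473684210524])

scores = {}
for degree in (2, 3, 7):
    poly = PolynomialFeatures(degree=degree)
    reg = LinearRegression().fit(poly.fit_transform(X), y)
    scores[degree] = reg.score(poly.transform(X), y)  # training R^2
    print(f"degree {degree}: training R^2 = {scores[degree]:.3f}")
```

A held-out split (or cross-validation) rather than the training score is what would actually tell you which degree generalizes best.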

answered Sep 21 '25 by desertnaut