
np.poly1d: how to calculate R^2

I am fitting a linear regression to my data, and I want to know how to calculate the R^2 value. This is the code I have so far:

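import numpy as np
import pandas as pd
from sympy import Symbol, expand

total_csv = pd.read_csv('IgG1_sigma_biospin_neg.csv', header=0)
x_values = (19, 20, 21, 22)
# take the first row of the columns '19-' ... '22-' as the y values
y_values = total_csv.loc[0, ['19-', '20-', '21-', '22-']].tolist()

# degree-1 (linear) least-squares fit, wrapped in a poly1d
my_fitting = np.polyfit(x_values, y_values, 1)
my_lin_fitting = np.poly1d(my_fitting)
my_x = Symbol('x')
print('my_equation:', expand(my_lin_fitting(my_x)))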

This gives me the equation of the linear fit to my data: 35.6499591999999*x + 6018.6395529.

In [95]:y_values
Out[95]: [6698.0902240000005, 6733.253559000001, 6757.754712999999, 6808.75637]

How can I calculate the R^2 value?

asked Sep 05 '25 03:09 by user7852656



1 Answer

To the best of my knowledge, np.polyfit does not provide the coefficient of determination (R^2).

The residual that Richard mentioned in his answer is something different: it is the Sum of Squared Errors (SSE), also called the residual sum of squares. More info about it here: https://365datascience.com/tutorials/statistics-tutorials/sum-squares/

The good news is that you can easily calculate R^2 from the SSE. First calculate the Total Sum of Squares (SST), i.e. the sum of squared differences between each observed y value and the mean of y; then R^2 = 1 - SSE / SST. (See the link above for further explanation.)
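Written out, with $\hat{y}_i = p(x_i)$ the value of the fitted polynomial at $x_i$ and $\bar{y}$ the mean of the $n$ observed values $y_i$, the three quantities are:

$$
\mathrm{SSE} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \qquad
\mathrm{SST} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2, \qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
$$

SSE is exactly what np.polyfit(..., full=True) reports as its residuals value.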

import numpy as np

# generate pseudo-data so the code can be run standalone (nicer for an MWE)
x_values = np.arange(100)
y_values = 3 * x_values + 2 + np.random.random(100) - 0.5

# full=True makes polyfit also return the residual sum of squares
my_fitting = np.polyfit(x_values, y_values, 1, full=True)
coeff = my_fitting[0]

### Residual or Sum of Squared Errors (SSE)
SSE = my_fitting[1][0]

### Determining the Total Sum of Squares (SST):
## the squared differences between the observed dependent variable and its mean
y = np.asarray(y_values)  # np.asarray so this also works when y_values is a list
diff = y - y.mean()
square_diff = diff ** 2
SST = square_diff.sum()

### Now getting the coefficient of determination (R2)
R2 = 1 - SSE / SST
print(R2)
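Applied to the four data points from the question (values copied from the question's y_values output), here is a minimal sketch of the same SSE/SST computation. This time SSE is computed directly from the np.poly1d fit and cross-checked against the residual that polyfit returns with full=True:

```python
import numpy as np

# Data from the question: x positions and the four measured y values
x_values = np.array([19, 20, 21, 22], dtype=float)
y_values = np.array([6698.0902240000005, 6733.253559000001,
                     6757.754712999999, 6808.75637])

# Degree-1 fit; full=True also returns the residual sum of squares
coeff, residuals, rank, singular_values, rcond = np.polyfit(
    x_values, y_values, 1, full=True)
p = np.poly1d(coeff)

# SSE computed by hand from the fitted poly1d (should match residuals[0])
SSE = np.sum((y_values - p(x_values)) ** 2)
# SST: squared deviations of the observations from their mean
SST = np.sum((y_values - y_values.mean()) ** 2)
R2 = 1 - SSE / SST

print('fit:', p)
print('R2 =', R2)
```

For a straight-line fit with an intercept, this R^2 also equals the square of the Pearson correlation between x and y, which is a handy sanity check.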

Another approach is to use the ready-made r2_score function from scikit-learn:

##  Alternative using sklearn
from sklearn.metrics import r2_score

predict = np.poly1d(coeff)
R2 = r2_score(y_values, predict(x_values))
print(R2)

Both methods give me the very same answer.
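The recipe is not limited to straight lines. Below is a minimal sketch of a helper, poly_r2 (a hypothetical name, not part of NumPy), that fits a polynomial of any degree with np.polyfit, wraps it in np.poly1d as in the question, and returns R^2 = 1 - SSE/SST. It assumes y is not constant; otherwise SST is zero and the ratio is undefined.

```python
import numpy as np

def poly_r2(x, y, deg):
    """Fit a degree-`deg` polynomial and return (poly1d, R^2).

    Hypothetical helper for illustration only; uses R^2 = 1 - SSE/SST.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.poly1d(np.polyfit(x, y, deg))
    sse = np.sum((y - p(x)) ** 2)      # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return p, 1 - sse / sst

# Usage: compare a linear and a quadratic fit on noisy quadratic data
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 0.5 * x**2 - 2 * x + 1 + rng.normal(scale=0.5, size=x.size)
p1, r2_lin = poly_r2(x, y, 1)
p2, r2_quad = poly_r2(x, y, 2)
print(r2_lin, r2_quad)
```

Because a higher-degree least-squares fit can never have a larger SSE on the same data, R^2 never decreases as the degree grows, so a higher R^2 alone is not evidence that a higher degree is the better model.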

answered Sep 07 '25 20:09 by snake_charmer
