 

SciKit Learn R-squared is very different from square of Pearson's Correlation R

I have 2 numpy arrays, like so:

import numpy as np

a = np.array([32.0, 25.97, 26.78, 35.85, 30.17, 29.87, 30.45, 31.93, 30.65, 35.49,
              28.3, 35.24, 35.98, 38.84, 27.97, 26.98, 25.98, 34.53, 40.39, 36.3])

b = np.array([28.778585, 31.164268, 24.690865, 33.523693, 29.272448, 28.39742,
              28.950092, 29.701189, 29.179174, 30.94298 , 26.05434 , 31.793175,
              30.382706, 32.135723, 28.018875, 25.659306, 27.232124, 28.295502,
              33.081223, 30.312504])

When I calculate the R-squared using SciKit Learn I get a completely different value than when I calculate Pearson's Correlation and then square the result:

import sklearn.metrics
import scipy.stats

sk_r2 = sklearn.metrics.r2_score(a, b)
print('SciKit R2: {:0.5f}\n'.format(sk_r2))

pearson_r = scipy.stats.pearsonr(a, b)
print('Pearson R: ', pearson_r)
print('Pearson R squared: ', pearson_r[0]**2)

Results in:
SciKit R2: 0.15913

Pearson R: (0.7617075766854164, 9.534162339384296e-05)
Pearson R squared: 0.5801984323799696

I realize that an R-squared value can sometimes be negative for a poorly fitting model (https://stats.stackexchange.com/questions/12900/when-is-r-squared-negative), and therefore the square of Pearson's correlation is not always equal to R-squared. However, I thought that a positive R-squared value was always equal to Pearson's correlation squared. How can these R-squared values be so different?

asked by daragh

2 Answers

The Pearson correlation coefficient r and the R2 coefficient of determination are two completely different statistics.

You can take a look at https://en.wikipedia.org/wiki/Pearson_correlation_coefficient and https://en.wikipedia.org/wiki/Coefficient_of_determination


update

Pearson's r coefficient is a measure of the linear correlation between two variables:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the sample means.
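
For instance, here is a minimal sketch that computes r straight from this formula with numpy and checks it against scipy (using the a and b arrays from the question):

import numpy as np
import scipy.stats

a = np.array([32.0, 25.97, 26.78, 35.85, 30.17, 29.87, 30.45, 31.93, 30.65, 35.49,
              28.3, 35.24, 35.98, 38.84, 27.97, 26.98, 25.98, 34.53, 40.39, 36.3])

b = np.array([28.778585, 31.164268, 24.690865, 33.523693, 29.272448, 28.39742,
              28.950092, 29.701189, 29.179174, 30.94298, 26.05434, 31.793175,
              30.382706, 32.135723, 28.018875, 25.659306, 27.232124, 28.295502,
              33.081223, 30.312504])

# deviations from the sample means
dx = a - a.mean()
dy = b - b.mean()

# Pearson's r exactly as in the formula above
r_manual = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

print(r_manual)                       # 0.7617075766854164
print(scipy.stats.pearsonr(a, b)[0])  # same value from scipy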

The R2 coefficient of determination is a measure of goodness of fit:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

where $\hat{y}_i$ is the predicted value of $y_i$ and $\bar{y}$ is the sample mean.
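
Again as a sketch, we can compute R2 straight from this formula, treating b as the prediction $\hat{y}$ of a, which is exactly what r2_score(a, b) assumes (a and b as in the question):

import numpy as np
from sklearn.metrics import r2_score

a = np.array([32.0, 25.97, 26.78, 35.85, 30.17, 29.87, 30.45, 31.93, 30.65, 35.49,
              28.3, 35.24, 35.98, 38.84, 27.97, 26.98, 25.98, 34.53, 40.39, 36.3])

b = np.array([28.778585, 31.164268, 24.690865, 33.523693, 29.272448, 28.39742,
              28.950092, 29.701189, 29.179174, 30.94298, 26.05434, 31.793175,
              30.382706, 32.135723, 28.018875, 25.659306, 27.232124, 28.295502,
              33.081223, 30.312504])

# residual sum of squares with y_hat = b, and total sum of squares around mean(a)
ss_res = ((a - b)**2).sum()
ss_tot = ((a - a.mean())**2).sum()

print(1 - ss_res / ss_tot)  # 0.15913...
print(r2_score(a, b))       # same value from sklearn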

Thus

  1. they measure different things
  2. r**2 is not equal to R2 because their formulas are totally different

update 2

r**2 is equal to R2 only in the case where you calculate r between a variable (say y) and its prediction $\hat{y}$ from a linear model fitted by least squares with an intercept.

Let's look at an example using the two arrays you provided:

import numpy as np
import pandas as pd
import scipy.stats as sps
import statsmodels.api as sm
from sklearn.metrics import r2_score as R2
import matplotlib.pyplot as plt

a = np.array([32.0, 25.97, 26.78, 35.85, 30.17, 29.87, 30.45, 31.93, 30.65, 35.49, 
              28.3, 35.24, 35.98, 38.84, 27.97, 26.98, 25.98, 34.53, 40.39, 36.3])

b = np.array([28.778585, 31.164268, 24.690865, 33.523693, 29.272448, 28.39742,
              28.950092, 29.701189, 29.179174, 30.94298 , 26.05434 , 31.793175,
              30.382706, 32.135723, 28.018875, 25.659306, 27.232124, 28.295502,
              33.081223, 30.312504])

df = pd.DataFrame({
    'x': a,
    'y': b,
})

df.plot(x='x', y='y', marker='.', ls='none', legend=False);

[Figure: scatter plot of y against x]

now we fit a linear regression model

mod = sm.OLS.from_formula('y ~ x', data=df)
mod_fit = mod.fit()
print(mod_fit.summary())

output

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.580
Model:                            OLS   Adj. R-squared:                  0.557
Method:                 Least Squares   F-statistic:                     24.88
Date:                Mon, 29 Mar 2021   Prob (F-statistic):           9.53e-05
Time:                        14:12:15   Log-Likelihood:                -36.562
No. Observations:                  20   AIC:                             77.12
Df Residuals:                      18   BIC:                             79.12
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     16.0814      2.689      5.979      0.000      10.431      21.732
x              0.4157      0.083      4.988      0.000       0.241       0.591
==============================================================================
Omnibus:                        6.882   Durbin-Watson:                   3.001
Prob(Omnibus):                  0.032   Jarque-Bera (JB):                4.363
Skew:                           0.872   Prob(JB):                        0.113
Kurtosis:                       4.481   Cond. No.                         245.
==============================================================================

and compute both r**2 and R2; we can see that in this case they're equal:

predicted_y = mod_fit.predict(df.x)
print("R2 :", R2(df.y, predicted_y))
print("r^2:", sps.pearsonr(df.y, predicted_y)[0]**2)

output

R2 : 0.5801984323799696
r^2: 0.5801984323799696

You computed R2(df.x, df.y), which can't be equal to the values above, because R2 is a measure of goodness of fit and you fed it the independent variable x and the dependent variable y. We instead computed both r and R2 between y and the predicted value of y.
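
One more quick sketch makes the distinction tangible: r2_score is not symmetric in its arguments, because the first argument is treated as the ground truth and the second as the prediction, whereas Pearson's r is symmetric (a and b as in the question):

import numpy as np
import scipy.stats as sps
from sklearn.metrics import r2_score

a = np.array([32.0, 25.97, 26.78, 35.85, 30.17, 29.87, 30.45, 31.93, 30.65, 35.49,
              28.3, 35.24, 35.98, 38.84, 27.97, 26.98, 25.98, 34.53, 40.39, 36.3])

b = np.array([28.778585, 31.164268, 24.690865, 33.523693, 29.272448, 28.39742,
              28.950092, 29.701189, 29.179174, 30.94298, 26.05434, 31.793175,
              30.382706, 32.135723, 28.018875, 25.659306, 27.232124, 28.295502,
              33.081223, 30.312504])

# Pearson's r is symmetric: swapping the arguments changes nothing
print(sps.pearsonr(a, b)[0])  # 0.7617...
print(sps.pearsonr(b, a)[0])  # identical

# r2_score is not: true values first, predictions second
print(r2_score(a, b))  # 0.15913...
print(r2_score(b, a))  # a completely different value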

answered by Max Pierini


I was in the same situation. For me it happened when I compared the R-squared from scikit-learn with the R-squared calculated by the R caret package.

The R-squared in the R caret package, or in your case from scipy.stats.pearsonr, is by definition the square of Pearson's R: a measure of correlation, which by definition lies between zero and one.

The R-squared in scikit-learn, however, is a measure of accuracy; you can look at its definition in its user guide. By definition it can lie anywhere between -Inf and 1.

Bottom line, don't compare them. They are different measures.

answered by Alireza Amani


