Reviewing linear regressions via statsmodels' OLS fit, I see you have to use add_constant to add a constant '1' to all your points in the independent variable(s) before fitting. However, my only understanding of intercepts in this context is the value of y for our line when x equals 0, so I'm not clear what purpose always injecting a '1' serves. What is this constant actually telling the OLS fit?
First, you need to add the constant whenever you want the model to include an intercept. It takes care of the bias in the data (a constant offset that is present in every observation). If I understand your idea, you would add a column of ones to X yourself so that you can avoid add_constant(), right? That is exactly what add_constant() does for you.
The OLS() function of the statsmodels.api module is used to perform OLS regression. It returns an OLS model object. The fit() method is then called on this object to fit the regression line to the data; it returns a results object whose summary() method produces a table with an extensive description of the regression results.
add_constant(x1) returns a copy of the x1 array (data['SAT']) with a column of ones added.
Why do you need the add_constant() command when you're fitting a line using statsmodels? Without this command, statsmodels cannot fit an intercept, so the fitted line is forced through the origin.
sm.add_constant in statsmodels plays the same role as sklearn's fit_intercept parameter in LinearRegression(). If you skip sm.add_constant, or use LinearRegression(fit_intercept=False), both statsmodels and sklearn assume b = 0 in y = mx + b and fit the model with b fixed at 0 instead of estimating b from your data. Note that the defaults differ: sklearn fits an intercept unless told otherwise (fit_intercept=True is the default), while statsmodels' OLS does not unless you add the constant yourself.
It doesn't add a constant to your values; it adds a constant term to the linear equation being fitted. In the single-predictor case, it's the difference between fitting a line y = mx to your data and fitting y = mx + b.
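To see why a column of ones does this, it helps to write the fit as plain least squares. The sketch below uses numpy only (not statsmodels) and points chosen to lie exactly on y = 3x + 4; b is simply the coefficient that multiplies the ones column.

```python
import numpy as np

# Points lying exactly on y = 3x + 4 (chosen for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 4.0

# Fit y = m*x: the design matrix has only the x column
X_no_const = x.reshape(-1, 1)
coef_no_const = np.linalg.lstsq(X_no_const, y, rcond=None)[0]
m_only = coef_no_const[0]

# Fit y = b*1 + m*x: the extra column of ones carries the coefficient b
X_const = np.column_stack([np.ones_like(x), x])
coef_const = np.linalg.lstsq(X_const, y, rcond=None)[0]
b, m = coef_const

print("no constant:  ", m_only)
print("with constant:", b, m)
```

With the ones column the fit recovers b = 4 and m = 3. Without it, the best line through the origin has slope 130/30, about 4.33, which matches none of the points exactly. So the '1' is not a value of x; it is the thing the intercept gets multiplied by in every row.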