Python: 'for' loops and iteration in Linear Regression

Question

I'm building a basic linear regression model using the statsmodels package, and here's what I'm trying to do:
Build a 'for' loop that checks the p-value of each feature and, if it is greater than 0.05, drops that feature from the training (and test) data, fits the model again, and repeats until all p-values are < 0.05.
Here's what I've done so far:

for x,y in zip(lrmodel.pvalues,xtrain.columns):
    if x>0.05:
        xtrain = xtrain.drop(y,axis=1)
        xtest = xtest.drop(y,axis=1)
        lrmodel = sm.OLS(ytrain,xtrain).fit()
        finalmodel = lrmodel
    else:
        finalmodel = lrmodel

The problem with this loop is that it doesn't re-check the p-values after each refit; instead, it removes all features with p-values > 0.05 in a single shot.
If anyone could help me, I would be grateful. Thanks!

Kevin Fang · Accepted Answer

I think you need a while loop on top of this:

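# keep going while at least one p-value is above the threshold
while max(lrmodel.pvalues)>0.05:
    for x,y in zip(lrmodel.pvalues,xtrain.columns):
        if x>0.05:
            # drop the first feature found above the threshold, then refit
            xtrain = xtrain.drop(y,axis=1)
            xtest = xtest.drop(y,axis=1)
            lrmodel = sm.OLS(ytrain,xtrain).fit()
            # leave the for loop so the next pass uses the refitted p-values
            break
# after all the values are less than 0.05, assign the model to final model
finalmodel = lrmodel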

One potential problem with this: you have to make sure that all the p-values will eventually fall below 0.05; otherwise, you need extra logic to terminate the loop. For example:

while len(lrmodel.pvalues)>0 and max(lrmodel.pvalues)>0.05:

Tags: python, for-loop, machine-learning, linear-regression, statsmodels

Asked by PoojaV
