Getting different results each time I run a linear regression using scikit-learn

Hi, I have a linear regression model that I am trying to optimise. I am optimising the span of an exponential moving average and the number of lagged variables that I use in the regression.

However, the results and the calculated MSE come out different each time I run the script, and I have no idea why. Can anyone help?

The process after starting the loop is:

1. Create a new dataframe with the three variables
2. Remove nil values
3. Create EWMAs for each variable
4. Create lags for each variable
5. Drop NAs
6. Create X, y
7. Regress, and save the EMA span and lag number if the MSE improves
8. Start the loop again with the next values

I know this could be a question for Cross Validated, but since it could be a programming issue I have posted it here:

import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer versions

bestema = 0
bestlag = 0
mse = 1000000

for e in range(2, 30):
    for lags in range(1, 20):
        # 1-2. fresh copy of the three variables, with rows containing zeros removed
        df2 = df[['diffbn', 'diffbl', 'diffbz']]
        df2 = df2[(df2 != 0).all(1)]
        # 3. exponential moving average for each variable
        # (pd.ewma is the old API; newer pandas uses e.g. df2.diffbn.ewm(span=e).mean())
        df2['emabn'] = pd.ewma(df2.diffbn, span=e)
        df2['emabl'] = pd.ewma(df2.diffbl, span=e)
        df2['emabz'] = pd.ewma(df2.diffbz, span=e)
        # 4. lagged versions of each EMA
        for i in range(0, lags):
            df2["lagbn%s" % str(i + 1)] = df2["emabn"].shift(i + 1)
            df2["lagbz%s" % str(i + 1)] = df2["emabz"].shift(i + 1)
            df2["lagbl%s" % str(i + 1)] = df2["emabl"].shift(i + 1)
        # 5. drop rows with NAs introduced by the shifts
        df2 = df2.dropna()
        # 6. keep only the lag columns as features
        b = list(df2)
        b.remove('diffbl')
        b.remove('emabn')
        b.remove('emabz')
        b.remove('emabl')
        b.remove('diffbn')
        b.remove('diffbz')
        X = df2[b]
        y = df2["diffbl"]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
        # 7. fit and keep the span/lag combination with the lowest test MSE
        regr = linear_model.LinearRegression()
        regr.fit(X_train, y_train)
        current_mse = mean_squared_error(y_test, regr.predict(X_test))
        if current_mse < mse:
            mse = current_mse
            bestema = e
            bestlag = lags
            print(regr.coef_)
            print(bestema)
            print(bestlag)
            print(mse)
1 Answer

The train_test_split function from scikit-learn (see the docs: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html) splits the data randomly, so it is expected that you get different results each time.
You can pass a fixed value to the random_state keyword argument to get the same split, and therefore the same results, on every run.
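
For example, a minimal sketch of the one line in the loop that needs to change, assuming an arbitrary fixed seed of 0 (any integer works):

from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer versions

# Fixing random_state makes the train/test split, and hence the reported MSE, reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

The regression fit itself is deterministic, so once the split is seeded the whole search loop should produce the same bestema, bestlag and mse on every run.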


