Getting different results each time I run a linear regression using scikit-learn

Hi, I have a linear regression model that I am trying to optimise. I am optimising the span of an exponential moving average and the number of lagged variables that I use in the regression.

However, the results and the calculated MSE come out different each time I run the script, and I have no idea why. Can anyone help?

The process after starting the loop is:

1. Create a new dataframe with the three variables
2. Remove nil values
3. Create EWMAs for each variable
4. Create lags for each variable
5. Drop NAs
6. Create X, y
7. Regress, and save the EMA span and lag number if the MSE improves
8. Start the loop again with the next values

I know this could be a question for Cross Validated, but since it could be a programming issue I have posted it here:

import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer versions

bestema = 0
bestlag = 0
mse = 1000000

for e in range(2, 30):
    for lags in range(1, 20):
        # 1-2. fresh copy of the three variables, with rows containing zeros removed
        df2 = df[['diffbn', 'diffbl', 'diffbz']]
        df2 = df2[(df2 != 0).all(1)]
        # 3. exponential moving average for each variable
        # (pd.ewma is the old API; newer pandas uses e.g. df2.diffbn.ewm(span=e).mean())
        df2['emabn'] = pd.ewma(df2.diffbn, span=e)
        df2['emabl'] = pd.ewma(df2.diffbl, span=e)
        df2['emabz'] = pd.ewma(df2.diffbz, span=e)
        # 4. lagged versions of each EMA
        for i in range(0, lags):
            df2["lagbn%s" % str(i + 1)] = df2["emabn"].shift(i + 1)
            df2["lagbz%s" % str(i + 1)] = df2["emabz"].shift(i + 1)
            df2["lagbl%s" % str(i + 1)] = df2["emabl"].shift(i + 1)
        # 5. drop rows with NAs introduced by the shifts
        df2 = df2.dropna()
        # 6. keep only the lag columns as features
        b = list(df2)
        b.remove('diffbl')
        b.remove('emabn')
        b.remove('emabz')
        b.remove('emabl')
        b.remove('diffbn')
        b.remove('diffbz')
        X = df2[b]
        y = df2["diffbl"]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
        # 7. fit and keep the span/lag combination with the lowest test MSE
        regr = linear_model.LinearRegression()
        regr.fit(X_train, y_train)
        current_mse = mean_squared_error(y_test, regr.predict(X_test))
        if current_mse < mse:
            mse = current_mse
            bestema = e
            bestlag = lags
            print(regr.coef_)
            print(bestema)
            print(bestlag)
            print(mse)
1 Answer

The train_test_split function from scikit-learn (see the docs: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html) splits the data randomly, so it is expected that you get different results each time.
You can pass a fixed value to the random_state keyword argument to get the same split, and therefore the same results, on every run.
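
For example, a minimal sketch of the one line in the loop that needs to change, assuming an arbitrary fixed seed of 0 (any integer works):

from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer versions

# Fixing random_state makes the train/test split, and hence the reported MSE, reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

The regression fit itself is deterministic, so once the split is seeded the whole search loop should produce the same bestema, bestlag and mse on every run.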


