UPDATE: This question was for Tensorflow 1.x. I upgraded to 2.0 and (at least on the simple code below) the reproducibility issue seems fixed on 2.0. So that solves my problem; but I'm still curious about what "best practices" were used for this issue on 1.x.
Training the exact same model/parameters/data on keras/tensorflow does not give reproducible results and the loss is significantly different each time you train the model. There are many stackoverflow questions about that (eg, How to get reproducible results in keras ) but the recommend workarounds don't seem to work for me or many other people on StackOverflow. OK, it is what it is.
But given that limitation of non-reproducibility with keras on tensorflow -- what's the best practice for comparing models and choosing hyper parameters? I'm testing different architectures and activations, but since the loss estimate is different each time, I'm never sure if one model is better than the other. Is there any best practice for dealing with this?
I don't think the issue has anything to do with my code, but just in case it helps; here's a sample program:
import os
#stackoverflow says turning off the GPU helps reproducibility, but it doesn't help for me
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ['PYTHONHASHSEED']=str(1)
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers 
import random
import pandas as pd
import numpy as np
#StackOverflow says this is needed for reproducibility but it doesn't help for me
from tensorflow.keras import backend as K
config = tf.ConfigProto(intra_op_parallelism_threads=1,inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
#make some random data
NUM_ROWS = 1000
NUM_FEATURES = 10
random_data = np.random.normal(size=(NUM_ROWS, NUM_FEATURES))
df = pd.DataFrame(data=random_data, columns=['x_' + str(ii) for ii in range(NUM_FEATURES)])
y = df.sum(axis=1) + np.random.normal(size=(NUM_ROWS))
def run(x, y):
    #StackOverflow says you have to set the seeds but it doesn't help for me
    tf.set_random_seed(1)
    np.random.seed(1)
    random.seed(1)
    os.environ['PYTHONHASHSEED']=str(1)
    model = keras.Sequential([
            keras.layers.Dense(40, input_dim=df.shape[1], activation='relu'),
            keras.layers.Dense(20, activation='relu'),
            keras.layers.Dense(10, activation='relu'),
            keras.layers.Dense(1, activation='linear')
        ])
    NUM_EPOCHS = 500
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x, y, epochs=NUM_EPOCHS, verbose=0)
    predictions = model.predict(x).flatten()
    loss = model.evaluate(x,  y) #This prints out the loss by side-effect
#Each time we run it gives a wildly different loss. :-(
run(df, y)
run(df, y)
run(df, y)
Given the non-reproducibility, how can I evaluate whether changes in my hyper-parameters and architecture are helping or not?
It's sneaky, but your code does, in fact, lack a step for better reproducibility: resetting the Keras & TensorFlow graphs before each run. Without this, tf.set_random_seed() won't work properly - see correct approach below.
I'd exhaust all the options before tossing the towel on non-reproducibility; currently I'm aware of only one such instance, and it's likely a bug. Nonetheless, it's possible you'll get notably differing results even if you follow through all the steps - in that case, see "If nothing works", but each is clearly not very productive, thus it's best on focusing attaining reproducibility:
Definitive improvements:
reset_seeds(K) belowK.set_floatx('float64')
PYTHONHASHSEED before the Python kernel starts - e.g. from terminaltf.python.keras - see here
from keras.layers import ... and from tensorflow.keras.optimizers import ...)Also see related SO on reproducibility
If nothing works:
Correct reset method:
def reset_seeds(reset_graph_with_backend=None):
    if reset_graph_with_backend is not None:
        K = reset_graph_with_backend
        K.clear_session()
        tf.compat.v1.reset_default_graph()
        print("KERAS AND TENSORFLOW GRAPHS RESET")  # optional
    np.random.seed(1)
    random.seed(2)
    tf.compat.v1.set_random_seed(3)
    print("RANDOM SEEDS RESET")  # optional
Running TF on single CPU thread: (code for TF1-only)
session_conf = tf.ConfigProto(
      intra_op_parallelism_threads=1,
      inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With