 

Error on prediction running keras multi_gpu_model

I have an issue running a Keras model on a Google Cloud Platform instance.
The model is the following:

import tensorflow as tf
from keras.models import Sequential
from keras.layers import CuDNNLSTM, LeakyReLU, RepeatVector, TimeDistributed, Dense
from keras.utils import multi_gpu_model

n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]

train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))

verbose, epochs, batch_size = 1, 1, 64  # low number of epochs just for testing purposes

# Build the template model on the CPU, as recommended for multi_gpu_model
with tf.device('/cpu:0'):
    m = Sequential()
    m.add(CuDNNLSTM(20, input_shape=(n_timesteps, n_features)))
    m.add(LeakyReLU(alpha=0.1))
    m.add(RepeatVector(n_outputs))
    m.add(CuDNNLSTM(20, return_sequences=True))
    m.add(LeakyReLU(alpha=0.1))
    m.add(TimeDistributed(Dense(20)))
    m.add(LeakyReLU(alpha=0.1))
    m.add(TimeDistributed(Dense(1)))

# Replicate the template model on 8 GPUs
# (this snippet comes from a class method, hence self)
self.model = multi_gpu_model(m, gpus=8)
self.model.compile(loss='mse', optimizer='adam')

self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)

As you can see from the code above, I run the model on a machine with 8 GPUs (NVIDIA Tesla K80).
Training works well, without any errors. However, prediction fails and returns the following error:

W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at cudnn_rnn_ops.cc:1336 : Unknown: CUDNN_STATUS_BAD_PARAM in tensorflow/stream_executor/cuda/cuda_dnn.cc(1285): 'cudnnSetTensorNdDescriptor( tensor_desc.get(), data_type, sizeof(dims) / sizeof(dims[0]), dims, strides)'

Here is the code that runs the prediction:

self.model.predict(input_x)

What I've noticed is that if I remove the multi-GPU data parallelism, the code runs fine on a single GPU.
To be more precise, if I comment out this line, the code works without errors:

self.model = multi_gpu_model(m, gpus=8)

What am I missing?

virtualenv information

cudatoolkit - 10.0.130
cudnn - 7.6.4
keras - 2.2.4
keras-applications - 1.0.8
keras-base - 2.2.4
keras-gpu - 2.2.4
python - 3.6

UPDATE

train_x.shape = (1441, 288, 1)
train_y.shape = (1441, 288, 1)
input_x.shape = (1, 288, 1)

After Olivier Dehaene's reply, I tried his suggestion and it worked.
I modified the input_x shape to obtain (8, 288, 1).
To do that, I also modified the train_x and train_y shapes.
Here is a recap:

train_x.shape = (8065, 288, 1)
train_y.shape = (8065, 288, 1)
input_x.shape = (8, 288, 1)

But now I get the same error during the training phase, on this line:

self.model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
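
One thing worth checking (just a guess on my part): with 8065 samples and batch_size = 64, the last batch contains 8065 % 64 = 1 sample, which again cannot be split across 8 GPUs. A minimal sketch of that check and a possible workaround, operating on the train_x and train_y arrays above:

n_gpus, batch_size = 8, 64

# 8065 % 64 == 1: the final batch holds a single sample, and
# 1 // 8 == 0, so at least one GPU would get an empty sub-batch.
remainder = train_x.shape[0] % batch_size
if 0 < remainder < n_gpus:
    # Drop the trailing samples so every batch splits across all GPUs.
    train_x = train_x[:-remainder]
    train_y = train_y[:-remainder]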
asked Oct 14 '25 by Giordano

1 Answer

From the tf.keras.utils.multi_gpu_model documentation we can see that it works in the following way (a simplified sketch of the slicing follows the list):

  • Divide the model's input(s) into multiple sub-batches.
  • Apply a model copy on each sub-batch. Every model copy is executed on a dedicated GPU.
  • Concatenate the results (on CPU) into one big batch.
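
Here is a simplified sketch of that slicing (not the library's actual code, which slices tensors inside Lambda layers, but the arithmetic is the same):

import numpy as np

def split_batch(batch, n_gpus):
    """Mimic multi_gpu_model's per-GPU slicing of one batch."""
    size = batch.shape[0] // n_gpus
    slices = []
    for i in range(n_gpus):
        start = i * size
        # The last GPU also takes the remainder of the batch.
        stop = batch.shape[0] if i == n_gpus - 1 else start + size
        slices.append(batch[start:stop])
    return slices

# A batch of 1 sample split across 8 GPUs: 7 empty sub-batches.
for s in split_batch(np.zeros((1, 288, 1)), 8):
    print(s.shape)  # (0, 288, 1) seven times, then (1, 288, 1)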

You are triggering an error because the input of the CuDNNLSTM layer is empty for at least one of the model copies. This is because the divide operation requires that: batch_size // n_gpus > 0

Try this code out:

import numpy as np

# A batch of 8 samples gives each of the 8 GPUs exactly one sample
input_x = np.random.randn(8, n_timesteps, n_features)
model.predict(input_x)
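
If you can't simply grow the real inference batch, a possible workaround (a hedged sketch, assuming the predictions for padding rows can just be discarded) is to pad the batch up to a multiple of the GPU count:

import numpy as np

def predict_padded(model, x, n_gpus=8):
    """Pad the batch so every GPU receives at least one sample,
    then drop the predictions that correspond to the padding."""
    n = x.shape[0]
    if n % n_gpus:
        pad = n_gpus - (n % n_gpus)
        x = np.concatenate([x, np.repeat(x[-1:], pad, axis=0)])
    return model.predict(x)[:n]

preds = predict_padded(model, input_x)  # works even with input_x.shape == (1, 288, 1)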
answered Oct 17 '25 by Olivier Dehaene