I've created a virtual notebook on Paperspace cloud infrastructure with a TensorFlow GPU P5000 virtual instance on the backend. When I start training my network, it works 2x SLOWER than on my MacBook Pro with the pure CPU runtime engine. How can I ensure that the Keras NN is using the GPU instead of the CPU during training?
Please find my code below:
from tensorflow.contrib.keras.api.keras.models import Sequential
from tensorflow.contrib.keras.api.keras.layers import Dense
from tensorflow.contrib.keras.api.keras.layers import Dropout
from tensorflow.contrib.keras.api.keras import utils as np_utils
import numpy as np
import pandas as pd
# Read data
pddata = pd.read_csv('data/data.csv', delimiter=';')
# Helper function (split into train & test data)
def split_to_train_test(data):
    trainLength = len(data) - len(data)//10
    trainData = data.loc[:trainLength].sample(frac=1).reset_index(drop=True)
    testData = data.loc[trainLength+1:].sample(frac=1).reset_index(drop=True)
    # .values replaces DataFrame.as_matrix(), which newer pandas versions removed
    trainLabels = trainData.loc[:, "Label"].values
    testLabels = testData.loc[:, "Label"].values
    trainData = trainData.loc[:, "Feature 0":].values
    testData = testData.loc[:, "Feature 0":].values
    return (trainData, testData, trainLabels, testLabels)
# prepare train & test data
(X_train, X_test, y_train, y_test) = split_to_train_test(pddata)
# Convert labels to one-hot notation
Y_train = np_utils.to_categorical(y_train, 3)
Y_test  = np_utils.to_categorical(y_test, 3)
# Define model in Keras
def create_model(init):
    model = Sequential()
    model.add(Dense(101, input_shape=(101,), kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(3, kernel_initializer=init, activation='softmax'))
    return model
# Train the model
uniform_model = create_model("glorot_normal")
uniform_model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
uniform_model.fit(X_train, Y_train, batch_size=1, epochs=300, verbose=1, validation_data=(X_test, Y_test)) 
You need to run your network with log_device_placement = True set in the TensorFlow session (the line before the last in the sample code below). Interestingly enough, if you set that in a session, it still applies when Keras does the fitting: the session opened with "with tf.Session(...)" becomes the default session, and Keras uses the default session when one exists. So the code below (tested) does output the placement of each tensor.
Please note that I've short-circuited the data reading because your data wasn't available, so I'm just running the network with random data. This way the code is self-contained and runnable by anyone. Another note: if you run this from Jupyter Notebook, the log_device_placement output goes to the terminal where Jupyter Notebook was started, not to the notebook cell's output.
from tensorflow.contrib.keras.api.keras.models import Sequential
from tensorflow.contrib.keras.api.keras.layers import Dense
from tensorflow.contrib.keras.api.keras.layers import Dropout
from tensorflow.contrib.keras.api.keras import utils as np_utils
import numpy as np
import pandas as pd
import tensorflow as tf
# Read data (short-circuited: the original CSV isn't available, so random data is used)
#pddata = pd.read_csv('data/data.csv', delimiter=';')
pddata = "foobar"  # placeholder; split_to_train_test ignores it
# Helper function (split into train & test data)
def split_to_train_test(data):
    # Return random features (100 x 101) and random labels in {0, 1, 2}
    return (
        np.random.uniform(size=(100, 101)),
        np.random.uniform(size=(100, 101)),
        np.random.randint(0, 3, size=100),
        np.random.randint(0, 3, size=100)
    )
    # Unreachable: the original implementation, kept for reference
    trainLength = len(data) - len(data)//10
    trainData = data.loc[:trainLength].sample(frac=1).reset_index(drop=True)
    testData = data.loc[trainLength+1:].sample(frac=1).reset_index(drop=True)
    trainLabels = trainData.loc[:, "Label"].values
    testLabels = testData.loc[:, "Label"].values
    trainData = trainData.loc[:, "Feature 0":].values
    testData = testData.loc[:, "Feature 0":].values
    return (trainData, testData, trainLabels, testLabels)
# prepare train & test data
(X_train, X_test, y_train, y_test) = split_to_train_test(pddata)
# Convert labels to one-hot notation
Y_train = np_utils.to_categorical(y_train, 3)
Y_test  = np_utils.to_categorical(y_test, 3)
# Define model in Keras
def create_model(init):
    model = Sequential()
    model.add(Dense(101, input_shape=(101,), kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(3, kernel_initializer=init, activation='softmax'))
    return model
# Train the model
uniform_model = create_model("glorot_normal")
uniform_model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
# log_device_placement=True makes TensorFlow log the device chosen for every op
with tf.Session(config=tf.ConfigProto(log_device_placement=True)):
    uniform_model.fit(X_train, Y_train, batch_size=1, epochs=300, verbose=1, validation_data=(X_test, Y_test))
Terminal output (partial, it was way too long):
...
VarIsInitializedOp_13: (VarIsInitializedOp): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-21 21:54:33.485870: I tensorflow/core/common_runtime/placer.cc:884]
VarIsInitializedOp_13: (VarIsInitializedOp)/job:localhost/replica:0/task:0/device:GPU:0
training/SGD/mul_18/ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-21 21:54:33.485895: I tensorflow/core/common_runtime/placer.cc:884]
training/SGD/mul_18/ReadVariableOp: (ReadVariableOp)/job:localhost/replica:0/task:0/device:GPU:0
training/SGD/Variable_9/Read/ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-21 21:54:33.485903: I tensorflow/core/common_runtime/placer.cc:884]
training/SGD/Variable_9/Read/ReadVariableOp: (ReadVariableOp)/job:localhost/replica:0/task:0/device:GPU:0
...
Note the GPU:0 at the end of many lines.
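Because the full log is very long, it can help to summarize it instead of scrolling through it. Below is a minimal sketch in plain Python that counts how many log lines end in each device. The helper name count_placements is hypothetical, and the regular expression assumes the "/device:TYPE:INDEX" suffix format seen in the excerpt above. It counts lines, not distinct ops, so an op that is logged twice is counted twice.

```python
import re
from collections import Counter

# Assumed format: each placement line ends with ".../device:GPU:0" or ".../device:CPU:0".
DEVICE_RE = re.compile(r"/device:([A-Z_]+):(\d+)\s*$")


def count_placements(log_text):
    """Count placement lines per device, e.g. Counter({'GPU:0': 3})."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DEVICE_RE.search(line)
        if match:  # lines such as the "placer.cc:884]" headers don't match
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts


sample = """\
VarIsInitializedOp_13: (VarIsInitializedOp): /job:localhost/replica:0/task:0/device:GPU:0
training/SGD/mul_18/ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
training/SGD/Variable_9/Read/ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
"""
print(count_placements(sample))  # Counter({'GPU:0': 3})
```

If most lines report CPU:0 rather than GPU:0, TensorFlow is not placing the model on the GPU.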
Relevant page of the TensorFlow manual: Using GPUs: Logging Device Placement.
Put this near the top of your Jupyter notebook. Comment out what you don't need.
# confirm TensorFlow sees the GPU
from tensorflow.python.client import device_lib
assert 'GPU' in str(device_lib.list_local_devices())
# confirm Keras sees the GPU (for TensorFlow 1.X + Keras)
from keras import backend
assert len(backend.tensorflow_backend._get_available_gpus()) > 0
# confirm PyTorch sees the GPU
from torch import cuda
assert cuda.is_available()
assert cuda.device_count() > 0
print(cuda.get_device_name(cuda.current_device()))
NOTE: With the release of TensorFlow 2.0, Keras is now included as part of the TF API.
Originally answered here.