I'm using Keras with the TensorFlow backend, and looking at nvidia-smi is not sufficient to understand how much memory the current network architecture needs, because TensorFlow seems to allocate all available memory.
So the question is: how can I find out the real GPU memory usage?
If your system has an NVIDIA® GPU and you have the GPU version of TensorFlow installed then your Keras code will automatically run on the GPU.
One way to stop TensorFlow from reserving all GPU RAM is to let the reservation grow on demand. This method allows you to train multiple networks on the same GPU, but you cannot set a threshold on the amount of memory that gets reserved.
Checking Your GPU Availability With Keras
The easiest way to check if you have access to GPUs is to call tf.config.experimental.list_physical_devices('GPU').
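As a minimal sketch of that check, assuming a TensorFlow version that still ships the tf.config.experimental namespace: the helper name list_gpus is hypothetical, and the ImportError fallback is only there so the snippet also runs on a machine without TensorFlow.

```python
def list_gpus():
    """Return the GPUs TensorFlow can see, or [] if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        # Assumption for this sketch: treat a missing TensorFlow as "no GPUs".
        return []
    return tf.config.experimental.list_physical_devices('GPU')


gpus = list_gpus()
print("GPUs visible to TensorFlow:", len(gpus))
for gpu in gpus:
    print(" ", gpu)
```

An empty list means Keras will fall back to the CPU.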
Limiting GPU memory growth
To limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method. In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process.
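The two options described here could look roughly like the sketch below. It assumes a recent TensorFlow 2.x release; configure_gpu and its memory_limit_mb parameter are hypothetical names chosen for illustration. Memory growth and a hard memory limit are applied as alternatives, not together.

```python
def configure_gpu(memory_limit_mb=None):
    """Restrict TensorFlow to the first GPU, then either cap or grow its memory.

    Returns 1 if a GPU was configured, 0 otherwise.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # sketch-only fallback when TensorFlow is not installed

    gpus = tf.config.list_physical_devices('GPU')
    if not gpus:
        return 0

    try:
        # Make only the first GPU visible to this process.
        tf.config.set_visible_devices(gpus[0], 'GPU')
        if memory_limit_mb is None:
            # Allocate GPU memory on demand instead of all at once.
            tf.config.experimental.set_memory_growth(gpus[0], True)
        else:
            # Cap the process at a fixed amount of GPU memory (in MB).
            tf.config.set_logical_device_configuration(
                gpus[0],
                [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
            )
    except RuntimeError as err:
        # Devices have to be configured before TensorFlow initializes them.
        print("GPU configuration skipped:", err)
        return 0
    return 1


configured = configure_gpu(memory_limit_mb=1024)
print("GPUs configured:", configured)
```

This has to run at program start, before any model or tensor is created; once the GPU has been initialized, TensorFlow rejects the change.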
This can be done using Timeline, which gives you a full trace that includes memory logging. Something like the code below:
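from keras import backend as K
from tensorflow.python.client import timeline
import tensorflow as tf

with K.get_session() as s:
    # Ask TensorFlow to record a full trace for the traced run.
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()

    # Your fitting code goes here. The session run must receive
    # options=run_options and run_metadata=run_metadata (with the TensorFlow
    # backend, extra keyword arguments to model.compile are passed on to
    # tf.Session.run), otherwise run_metadata stays empty.
    tl = timeline.Timeline(run_metadata.step_stats)
    # show_memory=True adds memory events to the trace.
    trace = tl.generate_chrome_trace_format(show_memory=True)
    # Open the resulting file in chrome://tracing.
    with open('full_trace.json', 'w') as out:
        out.write(trace)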
If you want to limit the GPU memory usage, that can also be done through gpu_options, as in the following code:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
set_session(tf.Session(config=config))
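To choose a value for per_process_gpu_memory_fraction, it can help to work backwards from a memory budget. The sketch below is plain arithmetic; memory_fraction is a hypothetical helper, and the 8192 MB card size is an assumed example, not a measured value.

```python
def memory_fraction(budget_mb, total_mb):
    """Return the fraction of a GPU's total memory that budget_mb represents."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    # The option expects a value between 0 and 1, so clamp at 1.0.
    return min(1.0, budget_mb / total_mb)


# Example: a 2048 MB budget on an assumed 8192 MB card.
fraction = memory_fraction(budget_mb=2048, total_mb=8192)
print(fraction)  # 0.25
```

With the 0.2 used above, the same hypothetical 8192 MB card would give the process roughly 1638 MB.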
Check the following documentation about the Timeline object
Since you use TensorFlow as the backend, you can also use the tfprof profiling tool.
You can still use nvidia-smi after telling TensorFlow not to reserve all memory of the GPU, but to grow this reservation on demand:
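import tensorflow as tf
import keras

config = tf.ConfigProto()
# Grow the GPU memory reservation on demand instead of taking it all up front.
config.gpu_options.allow_growth = True
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))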
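With growth enabled, the numbers reported by nvidia-smi can also be read from a script, for example between training runs. The following is a minimal sketch using nvidia-smi's CSV query mode; the helper names parse_memory_csv and query_gpu_memory are hypothetical, and the sample string is made-up data for illustration.

```python
import subprocess


def parse_memory_csv(text):
    """Parse lines of 'used, total' (MiB), one line per GPU, into int tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(','))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return [(used_mib, total_mib), ...] per GPU, or [] if the query fails."""
    cmd = [
        'nvidia-smi',
        '--query-gpu=memory.used,memory.total',
        '--format=csv,noheader,nounits',
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return parse_memory_csv(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError):
        # nvidia-smi missing, failing, or printing a non-numeric field.
        return []


# Made-up sample output for two GPUs, used only to show the parsing.
sample = "1523, 11178\n0, 11178\n"
print(parse_memory_csv(sample))
print(query_gpu_memory())
```

Keep in mind that this figure is what TensorFlow has reserved so far, which can stay above what the network currently needs because the reservation grows but is not handed back while the process runs.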