I am trying to do validation after every epoch during training.
I am creating the graphs as follows :
import tensorflow as tf
import tensorflow.contrib.slim as slim

from networks import densenet
from networks.densenet_utils import dense_arg_scope

with tf.variable_scope('scope') as scope:
    with slim.arg_scope(dense_arg_scope()):
        logits_train, _ = densenet(images, blocks=networks['densenet_265'],
                                   num_classes=1000, data_name='imagenet',
                                   is_training=True, scope='densenet265',
                                   reuse=tf.AUTO_REUSE)
    scope.reuse_variables()
    with slim.arg_scope(dense_arg_scope()):
        logits_val, _ = densenet(images, blocks=networks['densenet_265'],
                                 num_classes=1000, data_name='imagenet',
                                 is_training=False, scope='densenet265',
                                 reuse=tf.AUTO_REUSE)
To select the logits during training or validation, I do the following:

is_training = tf.Variable(True, trainable=False, dtype=tf.bool)
training_mode = tf.assign(is_training, True)
validation_mode = tf.assign(is_training, False)

logits = tf.cond(is_training, lambda: logits_train, lambda: logits_val)
However, when I run my code, I get an OOM error. I am sure this is not caused by a large batch size: previously, when I was (mistakenly) using the same graph for both training and validation, the code ran fine with a batch size of 32 and image size 224x224x3.
I suspect I am making a mistake in how I try to reuse the graph for validation with is_training=False.
The densenet code is taken from these two files: densenet_utils.py and densenet.py
You're creating two separate networks, logits_train and logits_val, so the model takes up roughly double the memory it otherwise would. (I'm assuming the setup is correct and the variables are shared properly; if not, that could be another issue, but it probably wouldn't cause the OOM — the bulk of the memory goes to activations, not weights.)
There is no need for this duplication. Use the same network, logits_train, for validation as well. It turns out the is_training parameter can also take a scalar boolean tensor, so you can switch between training and inference mode on the fly.
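The idea behind the on-the-fly switch can be illustrated outside of TensorFlow with a toy NumPy dropout layer: one implementation serves both modes, and a plain boolean flag selects the behaviour, analogous to feeding the training_mode placeholder through feed_dict. All names below are illustrative, not TensorFlow's implementation:

```python
import numpy as np

def toy_dropout(x, rate, training, rng):
    """Inverted dropout: active only in training mode, identity at inference."""
    if not training:
        return x  # inference: pass through, no second copy of the network needed
    mask = (rng.random(x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)  # scale survivors so the expected value matches

x = np.ones((2, 4))
rng = np.random.default_rng(0)
train_out = toy_dropout(x, 0.5, training=True, rng=rng)   # some units zeroed, rest scaled
val_out = toy_dropout(x, 0.5, training=False, rng=rng)    # input returned unchanged
```

The same function (the same "graph") handles both modes; only the flag changes per call.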
So right where you set up your images placeholder, add this line:
training_mode = tf.placeholder(tf.bool, shape=[], name='training_mode')
Then in the above code, set up your network like this:
logits_train, _ = densenet(images, blocks=networks['densenet_265'],
num_classes=1000, data_name='imagenet', is_training=training_mode,
scope='densenet265', reuse=tf.AUTO_REUSE)
Note that the is_training argument is now populated with the training_mode tensor above!
Then, when you call sess.run([...]) (not visible in your code above), include training_mode in your feed_dict like so (pseudo-code):
result = sess.run([???], feed_dict={images: ???, training_mode: True})  # or False
The training_mode placeholder is thus fed True or False depending on whether you are training or validating.
This is based on my research of the batch_normalization and dropout layers.
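For intuition on why batch_normalization needs that flag, here is a toy NumPy sketch: in training mode it normalizes with the batch's own statistics and updates moving averages, while in inference mode it uses the stored moving statistics. The class and names are illustrative, not TensorFlow's actual implementation:

```python
import numpy as np

class ToyBatchNorm:
    """Minimal batch norm: batch stats in training, moving stats at inference."""
    def __init__(self, dim, momentum=0.99, eps=1e-3):
        self.moving_mean = np.zeros(dim)
        self.moving_var = np.ones(dim)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # update the running statistics (TF does this via its update ops)
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)

bn = ToyBatchNorm(dim=2)
x = np.array([[1.0, 2.0], [3.0, 4.0]])
train_out = bn(x, training=True)    # normalized with the batch's own stats
val_out = bn(x, training=False)     # normalized with the moving averages
```

The layer's variables are shared between the two calls; only the mode flag changes, which is exactly what the boolean training_mode tensor achieves in the graph.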