
Multiple steps of gradient descent in single tensorflow sess.run()

Tags:

tensorflow

I would like to take multiple steps of gradient descent for a single sess.run() call. The inputs are fixed for each call, thus I should only need to pass them once.

How can I do this? I have an idea, but I'm not sure whether it actually recomputes the gradients at each step or just applies the first gradient N times. I would like to avoid calling tf.gradients() more than once. Would including the grads_and_vars in the dependencies be sufficient?

N = 5
# Chain the apply ops so that each one waits for the previous one to finish.
fit_op_i = fit_op_0 = optimizer.apply_gradients(grads_and_vars)
for i in range(N):
    with tf.control_dependencies([fit_op_i]):
        fit_op_i = optimizer.apply_gradients(grads_and_vars)
fit_op_N = fit_op_i

A related question with an answer that requires multiple sess.run() calls: Run train op multiple times in tensorflow

asked by eqzx


1 Answer

To implement this, we can define a sequence of distinct forward-backprop passes with explicit dependencies between the operations and then tf.group them together[1] so that they run in a single session call.
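
In outline, the pattern looks like this; compute_loss() is only a placeholder for whatever builds your model's loss, and optimizer stands for any tf.train optimizer. The runnable version is the full example further below:

# Schematic sketch only: compute_loss() is a placeholder, not a real TF function.
loss_1 = compute_loss()
grads_1 = optimizer.compute_gradients(loss_1)
step_1 = optimizer.apply_gradients(grads_1)            # descent step 1

with tf.control_dependencies([step_1]):                # step 2 waits for step 1
    loss_2 = compute_loss()                            # forward pass built on the updated weights
    grads_2 = optimizer.compute_gradients(loss_2)      # fresh gradients, not a reuse of grads_1
    step_2 = optimizer.apply_gradients(grads_2)        # descent step 2

train_op = tf.group(step_1, step_2)                    # one sess.run(train_op) == 2 steps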

My example defines a perceptron layer fitted to 50 two-dimensional Gaussian blobs. In TensorBoard the resulting graph shows one loss-step/optimizer-step scope pair per chained step (graph screenshot omitted).

To test correctness, I trained twice from the same initial values: first using a single forward-backprop step per run, and then using 3 steps combined into a single operation:

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(12):
        loss_val = loss_op.eval(feed_dict={x:x_train, y:y_train})
        print(i, '-->', "{0:.3f}".format(loss_val))
        # 3 chained steps per run:
        _ = sess.run(train_op, feed_dict={x:x_train, y:y_train})
        # Single step per run (used for the comparison below):
        # loss_val = loss_op.eval(feed_dict={x:x_train, y:y_train})
        # print(i, '-->', "{0:.3f}".format(loss_val))
        # _ = sess.run(applied_grads, feed_dict={x:x_train, y:y_train})
# 3-steps     # 1-step    
# 0 --> 0.693 # 0 --> 0.693 ---
# 1 --> 0.665 # 1 --> 0.683
# 2 --> 0.638 # 2 --> 0.674
# 3 --> 0.613 # 3 --> 0.665 ---
# 4 --> 0.589 # 4 --> 0.656
# 5 --> 0.567 # 5 --> 0.647
# 6 --> 0.547 # 6 --> 0.638 ---
# 7 --> 0.527 # 7 --> 0.630
# 8 --> 0.509 # 8 --> 0.622
# 9 --> 0.492 # 9 --> 0.613 ---
# ...

The first column clearly corresponds to taking 3 steps at a time: every third value of the 1-step run matches a value of the 3-step run. Full example:

from sklearn.datasets import make_blobs
import tensorflow as tf
import numpy as np
tf.reset_default_graph()

times_to_apply = 3  # number of gradient descent steps to chain per sess.run()

# 50 two-dimensional Gaussian blobs in two classes, fed as x_train / y_train
# in the training loop shown above.
x_train, y_train = make_blobs(n_samples=50, centers=2, n_features=2)

with tf.name_scope('x'):
    x = tf.placeholder(tf.float32, shape=(None, 2))
with tf.name_scope('y'):
    y = tf.placeholder(tf.int32, shape=(50))

logits = tf.layers.dense(inputs=x,
                         units=2,
                         name='NN',
                         kernel_initializer=tf.initializers.ones,
                         bias_initializer=tf.initializers.zeros)

optimizer = tf.train.GradientDescentOptimizer(0.01)


with tf.name_scope('loss-step-1'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss_op = tf.reduce_mean(xentropy)

with tf.name_scope('optimizer-step-1'):
    grads_and_vars = optimizer.compute_gradients(loss_op)
    applied_grads = optimizer.apply_gradients(grads_and_vars)

all_grads_and_vars = [grads_and_vars]
all_applied_grads = [applied_grads]
all_loss_ops = [loss_op]

for i in range(times_to_apply - 1):
    with tf.control_dependencies([all_applied_grads[-1]]):
        with tf.name_scope('loss-step-' + str(i + 2)):
            xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
            all_loss_ops.append(tf.reduce_mean(xentropy))
    with tf.control_dependencies([all_loss_ops[-1]]):
        with tf.name_scope('optimizer-step-' + str(i + 2)):
            all_grads_and_vars.append(optimizer.compute_gradients(all_loss_ops[-1]))
            all_applied_grads.append(optimizer.apply_gradients(all_grads_and_vars[-1]))

train_op = tf.group(all_applied_grads)

[1] @eqzx is absolutely right: there is no need to group the ops together. To achieve the same effect, we can run only the final optimizer step, since its explicitly defined dependencies already force the earlier steps to execute.
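
For instance, reusing the all_applied_grads list from the example above, the grouped op could simply be replaced by the last apply step:

# Equivalent alternative: running only the final step already triggers the
# whole chain through its control dependencies, so tf.group is not required.
train_op = all_applied_grads[-1]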

answered by Vlad


