
Multiple steps of gradient descent in single tensorflow sess.run()

Tags:

tensorflow

I would like to take multiple steps of gradient descent for a single sess.run() call. The inputs are fixed for each call, thus I should only need to pass them once.

How can I do this? I have an idea, but I'm not sure whether it actually recomputes the gradients at each step or just applies the first gradient N times. I would like to avoid calling tf.gradients() more than once. Would including the grads_and_vars in the dependencies be sufficient?

N = 5
# Chain the apply ops so that each one waits for the previous one to finish.
fit_op_i = fit_op_0 = optimizer.apply_gradients(grads_and_vars)
for i in range(N):
    with tf.control_dependencies([fit_op_i]):
        fit_op_i = optimizer.apply_gradients(grads_and_vars)
fit_op_N = fit_op_i

A related question with an answer that requires multiple sess.run() calls: Run train op multiple times in tensorflow

asked by eqzx


1 Answer

To implement this, we can define a sequence of distinct forward-backprop passes with explicit dependencies between the operations and then tf.group them together[1] so that they run in a single session call.
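
In outline, the pattern looks like this; compute_loss() is only a placeholder for whatever builds your model's loss, and optimizer stands for any tf.train optimizer. The runnable version is the full example further below:

# Schematic sketch only: compute_loss() is a placeholder, not a real TF function.
loss_1 = compute_loss()
grads_1 = optimizer.compute_gradients(loss_1)
step_1 = optimizer.apply_gradients(grads_1)            # descent step 1

with tf.control_dependencies([step_1]):                # step 2 waits for step 1
    loss_2 = compute_loss()                            # forward pass built on the updated weights
    grads_2 = optimizer.compute_gradients(loss_2)      # fresh gradients, not a reuse of grads_1
    step_2 = optimizer.apply_gradients(grads_2)        # descent step 2

train_op = tf.group(step_1, step_2)                    # one sess.run(train_op) == 2 steps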

My example defines a perceptron layer fitted to 50 two-dimensional Gaussian blobs. In TensorBoard the resulting graph shows one loss-step/optimizer-step scope pair per chained step (graph screenshot omitted).

To test correctness, I trained twice from the same initial values: first using a single forward-backprop step per run, and then using 3 steps combined into a single operation:

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(12):
        loss_val = loss_op.eval(feed_dict={x:x_train, y:y_train})
        print(i, '-->', "{0:.3f}".format(loss_val))
        # 3 chained steps per run:
        _ = sess.run(train_op, feed_dict={x:x_train, y:y_train})
        # Single step per run (used for the comparison below):
        # loss_val = loss_op.eval(feed_dict={x:x_train, y:y_train})
        # print(i, '-->', "{0:.3f}".format(loss_val))
        # _ = sess.run(applied_grads, feed_dict={x:x_train, y:y_train})
# 3-steps     # 1-step    
# 0 --> 0.693 # 0 --> 0.693 ---
# 1 --> 0.665 # 1 --> 0.683
# 2 --> 0.638 # 2 --> 0.674
# 3 --> 0.613 # 3 --> 0.665 ---
# 4 --> 0.589 # 4 --> 0.656
# 5 --> 0.567 # 5 --> 0.647
# 6 --> 0.547 # 6 --> 0.638 ---
# 7 --> 0.527 # 7 --> 0.630
# 8 --> 0.509 # 8 --> 0.622
# 9 --> 0.492 # 9 --> 0.613 ---
# ...

The first column clearly corresponds to taking 3 steps at a time: every third value of the 1-step run matches a value of the 3-step run. Full example:

from sklearn.datasets import make_blobs
import tensorflow as tf
import numpy as np
tf.reset_default_graph()

times_to_apply = 3  # number of gradient descent steps to chain per sess.run()

# 50 two-dimensional Gaussian blobs in two classes, fed as x_train / y_train
# in the training loop shown above.
x_train, y_train = make_blobs(n_samples=50, centers=2, n_features=2)

with tf.name_scope('x'):
    x = tf.placeholder(tf.float32, shape=(None, 2))
with tf.name_scope('y'):
    y = tf.placeholder(tf.int32, shape=(50))

logits = tf.layers.dense(inputs=x,
                         units=2,
                         name='NN',
                         kernel_initializer=tf.initializers.ones,
                         bias_initializer=tf.initializers.zeros)

optimizer = tf.train.GradientDescentOptimizer(0.01)


with tf.name_scope('loss-step-1'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss_op = tf.reduce_mean(xentropy)

with tf.name_scope('optimizer-step-1'):
    grads_and_vars = optimizer.compute_gradients(loss_op)
    applied_grads = optimizer.apply_gradients(grads_and_vars)

all_grads_and_vars = [grads_and_vars]
all_applied_grads = [applied_grads]
all_loss_ops = [loss_op]

for i in range(times_to_apply - 1):
    with tf.control_dependencies([all_applied_grads[-1]]):
        with tf.name_scope('loss-step-' + str(i + 2)):
            xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
            all_loss_ops.append(tf.reduce_mean(xentropy))
    with tf.control_dependencies([all_loss_ops[-1]]):
        with tf.name_scope('optimizer-step-' + str(i + 2)):
            all_grads_and_vars.append(optimizer.compute_gradients(all_loss_ops[-1]))
            all_applied_grads.append(optimizer.apply_gradients(all_grads_and_vars[-1]))

train_op = tf.group(all_applied_grads)

[1] @eqzx is absolutely right: there is no need to group the ops together. To achieve the same effect, we can run only the final optimizer step, since its explicitly defined dependencies already force the earlier steps to execute.
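
For instance, reusing the all_applied_grads list from the example above, the grouped op could simply be replaced by the last apply step:

# Equivalent alternative: running only the final step already triggers the
# whole chain through its control dependencies, so tf.group is not required.
train_op = all_applied_grads[-1]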

answered by Vlad


