Layer normalization and how it works (tensorflow)

I have a hard time understanding layer normalization. Let's say I trained a model in tensorflow. When I check the variables of the graph, I get the following:

     <tf.Variable 'actor/dense/kernel:0' shape=(5, 64) dtype=float32_ref>,
     <tf.Variable 'actor/dense/bias:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/LayerNorm/beta:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/LayerNorm/gamma:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/dense_1/kernel:0' shape=(64, 64) dtype=float32_ref>,
     <tf.Variable 'actor/dense_1/bias:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/LayerNorm_1/beta:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/LayerNorm_1/gamma:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/dense_2/kernel:0' shape=(64, 1) dtype=float32_ref>,
     <tf.Variable 'actor/dense_2/bias:0' shape=(1,) dtype=float32_ref>

As you can see, it is a fully-connected network with two hidden layers, each followed by layer normalization, and a linear output layer.

So, I know that the biases are added to the node inputs. Do the variables actor/LayerNorm/beta:0, actor/LayerNorm/gamma:0, etc. work the same way? Can I just combine the bias, beta, and gamma values for one layer into a single "bias" vector? Or is it a completely different mechanism?

Thank you in advance!

Asked Mar 23 '26 by Mikhail Dem

2 Answers

The beta and gamma variables are different from the bias variables. The code is something like this:

     # x: (batch, 5) input; kernel: (5, 64); bias: (64,)
     y = tf.matmul(x, kernel) + bias
     # per-sample statistics across the feature axis
     mean, variance = tf.nn.moments(y, [1], keep_dims=True)
     normalized_y = (y - mean) / tf.sqrt(variance + 1e-5)
     y_out = normalized_y * gamma + beta

First you multiply the input x by the kernel and add the bias term. Then you compute the mean and variance of the values in the vector y. You normalize y by subtracting the mean and dividing by the standard deviation. Finally, you scale each dimension of the normalized vector by gamma and shift it by beta. Because the normalization step sits between the bias and the gamma/beta step, you cannot fold bias, beta, and gamma into a single "bias" vector: bias shifts the pre-normalization values (and is therefore cancelled out by the mean subtraction), while beta shifts the post-normalization values.
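The steps above can be sketched in plain NumPy. The input vector here is made up for illustration (it is not taken from the question's model); gamma and beta are shown at their usual initial values of ones and zeros:

```python
import numpy as np

# Hypothetical post-dense activations for a single sample.
y = np.array([1.0, 2.0, 3.0, 4.0])

# Layer norm computes statistics per sample, across the feature axis.
mean = y.mean()
variance = y.var()
normalized_y = (y - mean) / np.sqrt(variance + 1e-5)

# gamma and beta are learned parameters, initialized to ones and zeros.
gamma = np.ones_like(y)
beta = np.zeros_like(y)
y_out = normalized_y * gamma + beta

# With the initial gamma/beta, y_out has (approximately) zero mean
# and unit standard deviation.
```

Note that the mean and variance are recomputed from the activations on every forward pass, which is why the result cannot be reproduced by any fixed additive vector.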

Answered Mar 26 '26 by Aaron


I would suggest instead using the implementation provided by the contributors in the official TensorFlow repo. Here is the layer normalization implementation
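As a minimal sketch of what using the built-in layer looks like: in current (2.x) TensorFlow the stock implementation is `tf.keras.layers.LayerNormalization`; in the 1.x era it lived in `tf.contrib.layers.layer_norm`. The input values below are made up for illustration:

```python
import tensorflow as tf

# One sample with four features.
x = tf.constant([[1.0, 2.0, 3.0, 4.0]])

# The layer owns the gamma (scale) and beta (offset) variables,
# initialized to ones and zeros respectively.
layer_norm = tf.keras.layers.LayerNormalization(epsilon=1e-5)

# With the initial gamma/beta this returns the normalized activations:
# zero mean, unit variance across the last axis.
y = layer_norm(x)
```

This is generally preferable to hand-rolling the normalization, since the layer handles variable creation and the numerically safe epsilon for you.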

Answered Mar 26 '26 by Aziz

