Layer normalization and how it works (tensorflow)

I have a hard time understanding layer normalization. Let's say I trained a model in tensorflow. When I check the variables of the graph, I get the following:

     <tf.Variable 'actor/dense/kernel:0' shape=(5, 64) dtype=float32_ref>,
     <tf.Variable 'actor/dense/bias:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/LayerNorm/beta:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/LayerNorm/gamma:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/dense_1/kernel:0' shape=(64, 64) dtype=float32_ref>,
     <tf.Variable 'actor/dense_1/bias:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/LayerNorm_1/beta:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/LayerNorm_1/gamma:0' shape=(64,) dtype=float32_ref>,
     <tf.Variable 'actor/dense_2/kernel:0' shape=(64, 1) dtype=float32_ref>,
     <tf.Variable 'actor/dense_2/bias:0' shape=(1,) dtype=float32_ref>

As you can see, it is a fully-connected network with two hidden layers, each followed by layer normalization, and a linear output layer.

So, I know that the biases are added to the node inputs. Do the variables actor/LayerNorm/beta:0, actor/LayerNorm/gamma:0, etc. work the same way? Can I just combine the bias, beta, and gamma values for one layer into a single "bias" vector? Or is it a completely different mechanism?

Thank you in advance!

Asked Mar 23 '26 by Mikhail Dem

2 Answers

The beta and gamma variables are different from the bias variables. The code is something like this:

     # x: (batch, 5) input; kernel: (5, 64); bias: (64,)
     y = tf.matmul(x, kernel) + bias
     # per-sample statistics across the feature axis
     mean, variance = tf.nn.moments(y, [1], keep_dims=True)
     normalized_y = (y - mean) / tf.sqrt(variance + 1e-5)
     y_out = normalized_y * gamma + beta

First you multiply the input x by the kernel and add the bias term. Then you compute the mean and variance of the values in the vector y. You normalize y by subtracting the mean and dividing by the standard deviation. Finally, you scale each dimension of the normalized vector by gamma and shift it by beta. Because the normalization step sits between the bias and the gamma/beta step, you cannot fold bias, beta, and gamma into a single "bias" vector: bias shifts the pre-normalization values (and is therefore cancelled out by the mean subtraction), while beta shifts the post-normalization values.
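The steps above can be sketched in plain NumPy. The input vector here is made up for illustration (it is not taken from the question's model); gamma and beta are shown at their usual initial values of ones and zeros:

```python
import numpy as np

# Hypothetical post-dense activations for a single sample.
y = np.array([1.0, 2.0, 3.0, 4.0])

# Layer norm computes statistics per sample, across the feature axis.
mean = y.mean()
variance = y.var()
normalized_y = (y - mean) / np.sqrt(variance + 1e-5)

# gamma and beta are learned parameters, initialized to ones and zeros.
gamma = np.ones_like(y)
beta = np.zeros_like(y)
y_out = normalized_y * gamma + beta

# With the initial gamma/beta, y_out has (approximately) zero mean
# and unit standard deviation.
```

Note that the mean and variance are recomputed from the activations on every forward pass, which is why the result cannot be reproduced by any fixed additive vector.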

Answered Mar 26 '26 by Aaron


I would suggest instead using the implementation provided by the contributors in the official TensorFlow repo. Here is the layer normalization implementation
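As a minimal sketch of what using the built-in layer looks like: in current (2.x) TensorFlow the stock implementation is `tf.keras.layers.LayerNormalization`; in the 1.x era it lived in `tf.contrib.layers.layer_norm`. The input values below are made up for illustration:

```python
import tensorflow as tf

# One sample with four features.
x = tf.constant([[1.0, 2.0, 3.0, 4.0]])

# The layer owns the gamma (scale) and beta (offset) variables,
# initialized to ones and zeros respectively.
layer_norm = tf.keras.layers.LayerNormalization(epsilon=1e-5)

# With the initial gamma/beta this returns the normalized activations:
# zero mean, unit variance across the last axis.
y = layer_norm(x)
```

This is generally preferable to hand-rolling the normalization, since the layer handles variable creation and the numerically safe epsilon for you.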

Answered Mar 26 '26 by Aziz

