 

Why does the sigmoid make the gradient all positive or all negative?

In the CS231n course notes on activation functions, I ran into a problem with the sigmoid function. Here is the screenshot:

[Screenshot from the notes: pros and cons of the sigmoid]

In my understanding, the gradient is dw = x.T dot dout. Even though x.T is all positive, why should dw come out all positive or all negative after the matrix multiplication? The only way that can happen is if dout is all positive or all negative, but why would that be the case?

Can someone help me?

Asked Dec 06 '25 by Yuhang CAO

1 Answer

If you read the exact sentence in its entirety, it says (slightly paraphrased):

If the data coming into a neuron is always positive, then the gradient on the weights during backpropagation becomes either all positive or all negative (depending on the gradient of the whole expression f).

Assume f = w^T x + b. Then the gradient with respect to the weights is \nabla_w L = (dL/df)(df/dw). Since f is a single number, dL/df is a scalar, so it is either positive or negative (or zero, but that is unlikely). On the other hand, df/dw = x. So if x is all positive (or all negative), then df/dw is all positive (or all negative) as well, and \nabla_w L = (dL/df) x must share one sign across all of its elements, because multiplying by the scalar dL/df cannot flip the signs of individual elements of x differently. In the notation of your question, dL/df plays the role of dout for that single neuron: it is one scalar, so it contributes a single sign. Thus the sign of the gradient is homogeneous.
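To make this concrete, here is a minimal NumPy sketch (the names x, dout and dW are mine, following the notation in your question, not code from the course). For a layer f = x @ W + b with an all-positive input x, each column of dW = x.T dot dout is just x scaled by the single scalar dout[j] for that neuron, so every column comes out with one uniform sign:

    import numpy as np

    rng = np.random.default_rng(0)

    # All-positive input to the layer, e.g. the output of a previous sigmoid.
    x = rng.uniform(0.1, 1.0, size=5)      # 5 input features, all > 0
    # Upstream gradient dL/df for 3 output neurons (arbitrary signs).
    dout = rng.normal(size=3)

    # Gradient w.r.t. the weights of f = x @ W + b:  dW[i, j] = x[i] * dout[j]
    dW = np.outer(x, dout)                  # same as x[:, None] @ dout[None, :]

    # Each column j is x scaled by the single scalar dout[j]; since x > 0,
    # every entry in that column shares the sign of dout[j].
    for j in range(dW.shape[1]):
        print(f"neuron {j}: sign(dout) = {np.sign(dout[j]):+.0f}, "
              f"column signs = {np.sign(dW[:, j])}")

Running this prints, for each neuron, a column of identical signs matching the sign of the corresponding entry of dout, which is exactly the "all positive or all negative" behaviour the note describes.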

Answered Dec 10 '25 by user3658307


