Backpropagation for Max-Pooling Layers: Multiple Maximum Values

I am currently implementing a CNN in plain numpy and have a brief question regarding a special case of backpropagation through a max-pooling layer:

While it is clear that the gradient with respect to non-maximum values vanishes, I am not sure about the case where several entries of a slice are equal to the maximum value. Strictly speaking, the function is not differentiable at such a point. However, I would assume that one can pick a subgradient from the corresponding subdifferential (similar to choosing the subgradient 0 for the ReLU function at x = 0).

Hence, I am wondering if it would be sufficient to simply form the gradient with respect to one of the maximum values and treat the remaining maximum values as non-maximum values.

If that is the case, would it be advisable to randomize the selection of the maximum value to avoid bias or is it okay to always pick the first maximum value?
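To make the two options concrete, here is a minimal numpy sketch of the backward pass for a single pooling window; the helper name and the `tie_break` argument are my own for illustration, not from any library:

```python
import numpy as np

def maxpool_backward_window(window, grad_out, tie_break="first"):
    """Route the scalar upstream gradient grad_out to exactly one
    maximum entry of the pooling window; all other entries get zero.

    tie_break="first"  -> always pick the first maximum (np.argmax)
    tie_break="random" -> pick uniformly among all tied maxima
    """
    grad_in = np.zeros_like(window, dtype=float)
    flat = window.ravel()
    if tie_break == "first":
        idx = np.argmax(flat)  # np.argmax returns the first maximum
    else:
        maxima = np.flatnonzero(flat == flat.max())
        idx = np.random.choice(maxima)  # randomized subgradient choice
    grad_in.ravel()[idx] = grad_out
    return grad_in

# A window with two tied maxima (3.0 appears twice):
window = np.array([[1.0, 3.0],
                   [3.0, 0.5]])
print(maxpool_backward_window(window, grad_out=1.0))
```

Either choice is a valid subgradient: exactly one entry receives the full upstream gradient, so the total gradient mass through the window is preserved.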

asked Jan 19 '26 by x3t2h
1 Answer

This is a great question, but most people never worry about it because it almost never happens. Assuming you randomly initialized your parameters, your image is not artificially generated, and you are using float32, the probability of having two equal maxima is around N · 2^-32 (roughly N × 0.0000000002), where N is the number of inputs to the max-pool layer.

Therefore, unless you have on the order of a billion inputs, any implementation you pick should behave essentially the same.
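As a quick back-of-the-envelope check of that estimate (using a simple union bound, under the answer's uniform-bits model of float32 collisions):

```python
# Union bound: with a per-pair collision chance of ~2^-32,
# P(at least one tie among N inputs) is at most about N * 2^-32.
N = 1_000_000_000            # one billion inputs, as in the answer
p_single = 2.0 ** -32        # ~2.3e-10 per comparison
p_any = N * p_single         # upper bound on the tie probability
print(f"{p_any:.3f}")        # ~0.23 even at a billion inputs
```

So even at a billion inputs the bound is only around 23%, which supports the point that tie-breaking strategy rarely matters in practice.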

answered Jan 21 '26 by Yubin Hu

