The question "How to initialize weights in PyTorch?" shows how to initialize the weights in PyTorch. However, what is the default weight initializer for Conv and Dense layers in PyTorch? What distribution does PyTorch use?
Each PyTorch layer implements the method reset_parameters, which is called at the end of the layer's initialization to initialize the weights.
You can find the implementation of the layers here.
For example, for the dense layer, which PyTorch calls Linear, the weights are initialized uniformly:
stdv = 1. / math.sqrt(self.weight.size(1))
self.weight.data.uniform_(-stdv, stdv)
where self.weight.size(1) is the number of inputs (the fan-in). Scaling the bound by 1/sqrt(fan-in) keeps the variance of each layer's output relatively similar at the beginning of training: the variance stays at a roughly constant scale instead of growing with the number of inputs. You can read a more detailed explanation here.
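To make the variance argument concrete, here is a minimal sketch in plain Python (no PyTorch needed). It assumes unit-variance Gaussian inputs, and the helper name output_variance is hypothetical, not part of PyTorch. It draws weights from the same uniform range as the snippet above and measures the variance of a single output unit for two different fan-ins.

```python
import math
import random
import statistics

def output_variance(fan_in, trials=2000, seed=0):
    """Empirical variance of sum(w_i * x_i) for one output unit.

    Weights come from U(-stdv, stdv) with stdv = 1 / sqrt(fan_in),
    mirroring the snippet above. Inputs are assumed to be N(0, 1).
    """
    rng = random.Random(seed)
    stdv = 1.0 / math.sqrt(fan_in)
    outputs = []
    for _ in range(trials):
        w = [rng.uniform(-stdv, stdv) for _ in range(fan_in)]
        x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
        outputs.append(sum(wi * xi for wi, xi in zip(w, x)))
    return statistics.pvariance(outputs)

# The two estimates should be close to each other despite the 16x
# difference in fan-in.
for n in (16, 256):
    print(n, round(output_variance(n), 3))
```

Under these assumptions both estimates should land near 1/3, since a uniform distribution on (-b, b) has variance b^2 / 3 and the fan_in terms in the sum cancel the 1 / fan_in from the bound. The point is that the value does not depend on the layer width.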
For the convolutional layer the initialization is basically the same. You just compute the number of inputs by multiplying the number of input channels by the size of the kernel, that is, the product of its spatial dimensions.
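As a rough illustration of that fan-in computation, the sketch below uses two hypothetical helpers, conv_fan_in and uniform_bound (neither is a PyTorch function), and ignores grouped convolutions for simplicity.

```python
import math

def conv_fan_in(in_channels, kernel_size):
    """Inputs feeding one output unit: channels times kernel elements."""
    return in_channels * math.prod(kernel_size)

def uniform_bound(fan_in):
    """Bound stdv so that weights are drawn from U(-stdv, stdv)."""
    return 1.0 / math.sqrt(fan_in)

# Example: a 2D convolution over RGB input with a 5x5 kernel.
fan_in = conv_fan_in(in_channels=3, kernel_size=(5, 5))
print(fan_in, uniform_bound(fan_in))
```

For this example the fan-in is 3 * 5 * 5 = 75, so the weights would be drawn from roughly (-0.115, 0.115). If you have PyTorch installed, you can compare this against the largest absolute value in the weight tensor of a freshly constructed layer with the same shape.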