The documentation for the conv2d_transpose() operation does not clearly explain what it does:
The transpose of conv2d.
This operation is sometimes called "deconvolution" after Deconvolutional Networks, but is actually the transpose (gradient) of conv2d rather than an actual deconvolution.
I went through the paper that the doc points to, but it did not help.
What does this operation do and what are examples of why you would want to use it?
Applies a 2D transposed convolution operator over an input image composed of several input planes. This module can be seen as the gradient of Conv2d with respect to its input.
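To make "the gradient of Conv2d with respect to its input" concrete, here is a small sketch. It uses NumPy, which is my choice and not something the docs mention. It writes a stride-2 convolution as an explicit matrix C, applies C^T to an upstream gradient (that is the transposed convolution), and checks the result against a finite-difference gradient. The helpers `conv2d` and `conv_matrix` are hypothetical names. This is not how any framework actually implements the op.

```python
import numpy as np

def conv2d(x, w, stride=1):
    """Plain 2D cross-correlation (what deep-learning 'conv2d' computes), no padding."""
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * w)
    return out

def conv_matrix(w, in_shape, stride=1):
    """Build C such that conv2d(x).ravel() == C @ x.ravel(), by probing basis images."""
    n = in_shape[0] * in_shape[1]
    cols = []
    for idx in range(n):
        e = np.zeros(n)
        e[idx] = 1.0
        cols.append(conv2d(e.reshape(in_shape), w, stride).ravel())
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
w = rng.standard_normal((3, 3))
C = conv_matrix(w, x.shape, stride=2)   # shape (4, 25): 5x5 input -> 2x2 output

g = rng.standard_normal((2, 2))         # upstream gradient w.r.t. the conv output
# Transposed convolution of g: multiply by C^T and reshape back to the input shape.
x_grad = (C.T @ g.ravel()).reshape(x.shape)

# Compare with a finite-difference gradient of L(x) = sum(conv2d(x) * g).
eps = 1e-6
num = np.zeros_like(x)
for idx in np.ndindex(x.shape):
    d = np.zeros_like(x)
    d[idx] = eps
    num[idx] = (np.sum(conv2d(x + d, w, 2) * g) - np.sum(conv2d(x - d, w, 2) * g)) / (2 * eps)

print(np.allclose(x_grad, num, atol=1e-6))  # True
```

The convolution maps 25 numbers to 4, and its transpose maps 4 numbers back to 25. That is why the op goes "the other way" in terms of shape, even though it does not invert the convolution's values.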
Conv2D is mainly used when you want to detect features, e.g., in the encoder part of an autoencoder model, and it may shrink your input shape. Conversely, Conv2DTranspose is used for creating features, for example, in the decoder part of an autoencoder model for constructing an image.
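The shrink/grow symmetry between encoder and decoder can be checked with plain arithmetic. The sketch below uses the usual size formulas with no dilation and no output padding: conv gives floor((i + 2p - k) / s) + 1, and transposed conv gives (i - 1)s - 2p + k. The layer settings k = 4, s = 2, p = 1 are an assumption chosen for illustration.

```python
def conv_out(i, k, s, p):
    # Output size of an ordinary convolution (floor division, no dilation).
    return (i + 2 * p - k) // s + 1

def conv_transpose_out(i, k, s, p):
    # Output size of a transposed convolution (no dilation, no output padding).
    return (i - 1) * s - 2 * p + k

k, s, p = 4, 2, 1  # assumed layer settings, for illustration only
sizes = [32]
for _ in range(2):  # encoder: two stride-2 convolutions shrink the input
    sizes.append(conv_out(sizes[-1], k, s, p))
for _ in range(2):  # decoder: two stride-2 transposed convolutions grow it back
    sizes.append(conv_transpose_out(sizes[-1], k, s, p))
print(sizes)  # [32, 16, 8, 16, 32]
```

Because the convolution formula uses floor division, several input sizes can map to the same output size. In such cases the transposed layer can only recover one of them unless extra output padding is specified.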
Assuming square tensors, the output size o of a transposed convolution with input size i, stride s, padding p and kernel size k is:
$o = (i - 1)s - 2p + (k - 1) + 1$
Let's try this with Example 7, where the input size = 3, stride = 2, padding = 1, kernel size = 2. The calculation is then simply (3 - 1)·2 - 2·1 + (2 - 1) + 1 = 2·2 - 2 + 1 + 1 = 4, so the output is of size 4.
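The formula can be checked against an actual computation. Below is a naive NumPy sketch with hypothetical function names. Each input pixel scatters a scaled copy of the kernel into the output, and then `padding` rows and columns are cropped from every border. That cropping is the convention I'm assuming. It reproduces o = (i - 1)s - 2p + (k - 1) + 1 with no dilation and no output padding.

```python
import numpy as np

def conv_transpose_out_size(i, s, p, k):
    # o = (i - 1) * s - 2p + (k - 1) + 1  (no dilation, no output padding)
    return (i - 1) * s - 2 * p + (k - 1) + 1

def conv_transpose2d(x, w, stride=1, padding=0):
    """Naive transposed convolution: every input pixel adds x[r, c] * w into the output."""
    ih, iw = x.shape
    kh, kw = w.shape
    full = np.zeros(((ih - 1) * stride + kh, (iw - 1) * stride + kw))
    for r in range(ih):
        for c in range(iw):
            full[r * stride:r * stride + kh, c * stride:c * stride + kw] += x[r, c] * w
    # In a transposed conv, 'padding' removes rows/columns from each border of the result.
    return full[padding:full.shape[0] - padding, padding:full.shape[1] - padding]

# Example 7: input size 3, stride 2, padding 1, kernel size 2.
out = conv_transpose2d(np.ones((3, 3)), np.ones((2, 2)), stride=2, padding=1)
print(conv_transpose_out_size(3, 2, 1, 2), out.shape)  # 4 (4, 4)
```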
The best explanation I've seen online of how transposed convolution works is here.
I'll give my own short description. It applies convolution with a fractional stride. In other words, it spaces out the input values (with zeros) so that the filter is applied over a region that's potentially smaller than the filter size.
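The "fractional stride" view can be checked numerically. The sketch below assumes NumPy and uses hypothetical helper names. It compares a direct scatter-add transposed convolution with the zero-insertion recipe, which has three steps. First, put stride - 1 zeros between neighbouring input values. Second, pad by k - 1 - p on every side. Third, run an ordinary stride-1 convolution with the kernel flipped 180 degrees. It assumes padding <= k - 1 and no dilation or output padding.

```python
import numpy as np

def conv2d_valid(x, w):
    """Ordinary stride-1 cross-correlation without padding."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv_transpose_scatter(x, w, stride=1, padding=0):
    """Reference: each input pixel scatters x[r, c] * w, then borders are cropped."""
    ih, iw = x.shape
    kh, kw = w.shape
    full = np.zeros(((ih - 1) * stride + kh, (iw - 1) * stride + kw))
    for r in range(ih):
        for c in range(iw):
            full[r * stride:r * stride + kh, c * stride:c * stride + kw] += x[r, c] * w
    return full[padding:full.shape[0] - padding, padding:full.shape[1] - padding]

def conv_transpose_via_zero_insertion(x, w, stride=1, padding=0):
    ih, iw = x.shape
    kh, kw = w.shape
    # 1) "Fractional stride": spread the inputs out with (stride - 1) zeros between them.
    up = np.zeros(((ih - 1) * stride + 1, (iw - 1) * stride + 1))
    up[::stride, ::stride] = x
    # 2) Zero-pad by (k - 1 - padding) on each side (assumes padding <= k - 1).
    ph, pw = kh - 1 - padding, kw - 1 - padding
    up = np.pad(up, ((ph, ph), (pw, pw)))
    # 3) Ordinary stride-1 convolution with the kernel flipped 180 degrees.
    return conv2d_valid(up, w[::-1, ::-1])

x = np.arange(9, dtype=float).reshape(3, 3)
w = np.arange(4, dtype=float).reshape(2, 2)
print(np.allclose(conv_transpose_scatter(x, w, 2, 1),
                  conv_transpose_via_zero_insertion(x, w, 2, 1)))  # True
```

The zero-insertion form is mainly useful for intuition, because it multiplies many zeros. The scatter form does the same work without them.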
As for why one would want to use it: it can be used as a sort of upsampling with learned weights, as opposed to bilinear interpolation or some other fixed form of upsampling.
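To see how fixed upsampling relates to transposed convolution, here is a NumPy sketch with a hypothetical `conv_transpose2d` helper. It uses two hand-picked kernels, which are assumptions for illustration. A 2x2 kernel of ones with stride 2 gives nearest-neighbour upsampling. A separable "tent" kernel with stride 2 and padding 1 gives linear interpolation between neighbouring pixels, with corners aligned. In a network, the kernel would instead be learned.

```python
import numpy as np

def conv_transpose2d(x, w, stride=1, padding=0):
    """Naive transposed convolution (scatter-add, then crop 'padding' from each border)."""
    ih, iw = x.shape
    kh, kw = w.shape
    full = np.zeros(((ih - 1) * stride + kh, (iw - 1) * stride + kw))
    for r in range(ih):
        for c in range(iw):
            full[r * stride:r * stride + kh, c * stride:c * stride + kw] += x[r, c] * w
    return full[padding:full.shape[0] - padding, padding:full.shape[1] - padding]

x = np.array([[1.0, 2.0], [3.0, 4.0]])

# A 2x2 kernel of ones with stride 2 copies each pixel into a 2x2 block:
# this is nearest-neighbour upsampling.
nearest = conv_transpose2d(x, np.ones((2, 2)), stride=2)
print(np.array_equal(nearest, np.kron(x, np.ones((2, 2)))))  # True

# A separable tent kernel with stride 2 and padding 1 linearly interpolates
# between neighbouring pixels (corners aligned).
tent = np.outer([0.5, 1.0, 0.5], [0.5, 1.0, 0.5])
y = np.array([[1.0, 3.0], [5.0, 7.0]])
print(conv_transpose2d(y, tent, stride=2, padding=1))
# [[1. 2. 3.]
#  [3. 4. 5.]
#  [5. 6. 7.]]
```

Training the kernel lets the network choose its own upsampling rule instead of committing to one of these fixed patterns.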