I am trying to understand what the TimeDistributed wrapper does in Keras.
I get that TimeDistributed "applies a layer to every temporal slice of an input."
But I ran an experiment and got results that I cannot explain.
In short, when placed after an LSTM layer, TimeDistributed(Dense) and a plain Dense layer give the same result.
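from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

# Model 1: Dense wrapped in TimeDistributed
model = Sequential()
model.add(LSTM(5, input_shape=(10, 20), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
print(model.output_shape)

# Model 2: plain Dense applied directly to the LSTM sequence output
model = Sequential()
model.add(LSTM(5, input_shape=(10, 20), return_sequences=True))
model.add(Dense(1))
print(model.output_shape)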
For both models, I got an output shape of (None, 10, 1).
Can anyone explain the difference between TimeDistributed(Dense) and a plain Dense layer after an RNN layer?
The TimeDistributed layer is useful when working with time series data or video frames. It applies the same layer, with the same weights, to each input in a sequence. That means that instead of having several input “models”, we can use “one model” applied to each input. A GRU or LSTM can then handle how the data relates across “time”.
This is where the TimeDistributed layer from TensorFlow helps. This wrapper applies the same layer to several inputs and produces an output for each input, so that we can combine the outputs and pass them to another layer to make predictions.
What is a Dense layer? In a neural network, a dense layer is a layer that is fully connected to its preceding layer, which means each neuron of the layer is connected to every neuron of the preceding layer. It is one of the most commonly used layers in artificial neural networks.
In Keras, when building a Sequential model, the second dimension (the one after the sample dimension) is usually the time dimension. This means that if, for example, your data is 5-D with shape (sample, time, width, length, channel), you could use TimeDistributed to apply a convolutional layer (which expects 4-D input with shape (sample, width, length, channel)) along the time dimension, applying the same layer to each time slice, in order to obtain a 5-D output.
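To make the "apply the same layer to each time slice" idea concrete, here is a minimal NumPy sketch. It is not how Keras implements TimeDistributed; the helper names time_distributed and pointwise_conv are hypothetical, and the 1x1 convolution is only a toy stand-in for a real convolutional layer that accepts 4-D input.

```python
import numpy as np

def pointwise_conv(frames, kernel):
    """Toy stand-in for a conv layer: a 1x1 convolution that only accepts 4-D input."""
    if frames.ndim != 4:
        raise ValueError("expected shape (sample, width, length, channel)")
    # Mix the channels at every pixel with the same kernel.
    return frames @ kernel

def time_distributed(layer_fn, x):
    """Apply layer_fn to every time slice of x, where x has shape (sample, time, ...)."""
    samples, steps = x.shape[:2]
    # Fold the time axis into the sample axis so the layer sees ordinary 4-D input.
    folded = x.reshape((samples * steps,) + x.shape[2:])
    out = layer_fn(folded)
    # Unfold back to (sample, time, ...).
    return out.reshape((samples, steps) + out.shape[1:])

rng = np.random.default_rng(0)
video = rng.normal(size=(2, 3, 4, 4, 3))   # (sample, time, width, length, channel)
kernel = rng.normal(size=(3, 8))           # 3 input channels -> 8 output channels

features = time_distributed(lambda f: pointwise_conv(f, kernel), video)
print(features.shape)  # (2, 3, 4, 4, 8)
```

Calling pointwise_conv directly on the 5-D array would raise an error, which mirrors why a layer expecting 4-D input needs the wrapper: every time slice goes through the same weights and the output keeps its time axis.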
As for Dense: from Keras 2.0 onward, Dense is by default applied only to the last dimension (e.g. if you apply Dense(10) to an input with shape (n, m, o, p), you get an output with shape (n, m, o, 10)), so in your case Dense and TimeDistributed(Dense) are equivalent.
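The equivalence can be checked by hand with a small NumPy sketch. The arrays W and b below are hypothetical stand-ins for the kernel and bias of Dense(1), and lstm_out is random data shaped like the output of LSTM(5, return_sequences=True); no Keras code is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
lstm_out = rng.normal(size=(4, 10, 5))  # (sample, time, units)
W = rng.normal(size=(5, 1))             # stand-in for the Dense(1) kernel
b = rng.normal(size=(1,))               # stand-in for the Dense(1) bias

# Plain Dense on 3-D input: contract only the last axis.
dense_out = lstm_out @ W + b

# TimeDistributed(Dense): apply the same weights to each time step separately.
steps = lstm_out.shape[1]
td_out = np.stack([lstm_out[:, t, :] @ W + b for t in range(steps)], axis=1)

print(dense_out.shape, td_out.shape)
same = np.allclose(dense_out, td_out)
print(same)
```

Both paths use one shared set of weights across all time steps, so with identical weights the outputs match, not just the shapes.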