If you read the Keras documentation entry for Dense, you will see that this call:
Dense(16, input_shape=(5,3))
would result in a Dense network with 3 inputs and 16 outputs which would be applied independently for each of 5 steps. So, if D(x) transforms 3 dimensional vector to 16-d vector, what you'll get as output from your layer would be a sequence of vectors: [D(x[0,:]), D(x[1,:]),..., D(x[4,:])] with shape (5, 16). In order to have the behavior you specify you may first Flatten your input to a 15-d vector and then apply Dense:
model = Sequential()
model.add(Flatten(input_shape=(3, 2)))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(4))
model.compile(loss='mean_squared_error', optimizer='SGD')
EDIT: As some people struggled to understand - here you have an explaining image:

 This is how Flatten works converting Matrix to single array.
This is how Flatten works converting Matrix to single array.
short read:
Flattening a tensor means to remove all of the dimensions except for one. This is exactly what the Flatten layer does.
long read:
If we take the original model (with the Flatten layer) created in consideration we can get the following model summary:
Layer (type)                 Output Shape              Param #   
=================================================================
D16 (Dense)                  (None, 3, 16)             48        
_________________________________________________________________
A (Activation)               (None, 3, 16)             0         
_________________________________________________________________
F (Flatten)                  (None, 48)                0         
_________________________________________________________________
D4 (Dense)                   (None, 4)                 196       
=================================================================
Total params: 244
Trainable params: 244
Non-trainable params: 0
For this summary the next image will hopefully provide little more sense on the input and output sizes for each layer.
The output shape for the Flatten layer as you can read is (None, 48). Here is the tip. You should read it (1, 48) or (2, 48) or ... or (16, 48) ... or (32, 48), ...
In fact, None on that position means any batch size. For the inputs to recall, the first dimension means the batch size and the second means the number of input features.
The role of the Flatten layer in Keras is super simple:
A flatten operation on a tensor reshapes the tensor to have the shape that is equal to the number of elements contained in tensor non including the batch dimension.

Note: I used the model.summary() method to provide the output shape and parameter details.
It is rule of thumb that the first layer in your network should be the same shape as your data. For example our data is 28x28 images, and 28 layers of 28 neurons would be infeasible, so it makes more sense to 'flatten' that 28,28 into a 784x1. Instead of wriitng all the code to handle that ourselves, we add the Flatten() layer at the begining, and when the arrays are loaded into the model later, they'll automatically be flattened for us.
I came across this recently, it certainly helped me understand: https://www.cs.ryerson.ca/~aharley/vis/conv/
So there's an input, a Conv2D, MaxPooling2D etc, the Flatten layers are at the end and show exactly how they are formed and how they go on to define the final classifications (0-9).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With