I'm having a hard time wrapping my head around the math behind CNNs and how exactly I should modify the output shape between layers of my neural network.
I am trying to do the carvana image masking challenge on kaggle https://www.kaggle.com/c/carvana-image-masking-challenge . So in other words, I'm trying to create a neural network that, given a picture of a car, can identify the boundaries of the car within that image and crop it out from the rest of the background.
So my inputs are all images with width=959px and height=640px. The shape of my input array is (159, 640, 959, 3), where 159 represents the fact that the input array holds 159 images in total. The targets I created are matrices with 640 rows and 959 columns (one entry per pixel), using booleans to represent whether or not the corresponding pixel is a car/within the boundaries of a car. The shape of the target data is (159, 640, 959), where 159 again represents the 159 images.
I created a prematurely structured convolutional network (by that I just mean there are very few filters used). The code for the architecture is here:
nn = Sequential()
nn.add(Conv2D(8,(3,3), input_shape = (IMG_HEIGHT, IMG_WIDTH, 3), activation = 'relu', padding = 'same'))
nn.add(Conv2D(8, (3,3), activation='relu', padding='same'))
nn.add(Dense(1, activation='softmax'))
nn.summary() shows the following:
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 640, 959, 8) 224
_________________________________________________________________
conv2d_2 (Conv2D) (None, 640, 959, 8) 584
_________________________________________________________________
dense_1 (Dense) (None, 640, 959, 6) 54
=================================================================
Total params: 862
Trainable params: 862
And the error I've been stuck with is just...
ValueError: Error when checking target: expected dense_1 to have 4 dimensions, but got array with shape (159, 640, 959)
At the moment I'm not sure how I would modify this code to get past this error. I'm confused about how the last layer is supposed to have 4 dimensions: according to Keras's summary, the output actually does have 4 dimensions, but one of them is marked as None. If the output isn't supposed to have a shape of (640, 959), just like each target image, then I don't really know what the shape of the output is supposed to be. I'm just having a hard time putting what I have previously learned about convolutional networks into actual code, and I can't get past this error. There's something fundamental that I'm not doing correctly...
edit: originally said the images had a shape of 440px X 959px. This is incorrect, it's actually 640px X 959px. Really inconvenient typo on my part.
The documentation on Dense is not the clearest, but the behavior is spelled out in the section describing input and output shapes:
Note: if the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel....

Input shape

nD tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).

Output shape

nD tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).
This is very confusing because it talks about how higher rank tensors will be flattened first (which makes you think the overall output of Dense(1) would be a purely scalar value for each example from a batch), but as you demonstrated with your printout from summary(), it maintains the same intermediate dimensions of the tensor.
So if you give it an input of shape (None, 640, 959, 8), Dense will treat the final dimension (the 8 channels) as the one to fully connect over, and will treat each of the 640x959 locations specified by the inner dimensions as a separate position where that same dense kernel is applied, producing a separate output neuron per location.
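You can see this behavior without even running Keras. The following is a toy numpy sketch (random placeholder data, assumed shapes matching your network): Dense(1) on a rank-4 input is just a dot product along the last axis, with the same (8, 1) kernel shared across every spatial location.

```python
import numpy as np

batch, h, w, channels, units = 2, 640, 959, 8, 1

x = np.random.rand(batch, h, w, channels)  # like the (None, 640, 959, 8) conv output
kernel = np.random.rand(channels, units)   # one shared kernel, 8 -> 1
bias = np.zeros(units)

# Equivalent of Dense(1): dot product along the last axis only;
# all the inner (spatial) dimensions pass through untouched.
y = x @ kernel + bias
print(y.shape)  # (2, 640, 959, 1)
```

Note that the trailing dimension of size 1 is kept, which is exactly why the layer reports 4 dimensions rather than collapsing to (None, 640, 959).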
So if your network is this:
nn = Sequential()
nn.add(Conv2D(8, (3,3), input_shape = (640, 959, 3), activation='relu', padding='same'))
nn.add(Conv2D(8, (3,3), activation='relu', padding='same'))
nn.add(Dense(1, activation='softmax'))
then the final output shape will be
(None, 640, 959, 1)
That is, each output "pixel" (i, j) in the 640x959 grid is calculated as a dense combination of the 8 different convolution channels at point (i, j) from the previous layer.
There are various ways to achieve the same thing. For example, a 1x1 convolution that reduces the channel dimension from 8 down to 1 would produce the same output shape, with a layer like:
Conv2D(1, (1,1), activation='relu', padding='same')
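The equivalence is more than just matching shapes. A 1x1 convolution with a single filter computes the same per-pixel weighted sum of channels that Dense(1) does; here is a small numpy sketch (toy sizes, random placeholder data) demonstrating that, ignoring activations and bias:

```python
import numpy as np

h, w, channels = 4, 5, 8
x = np.random.rand(h, w, channels)

# A 1x1 conv kernel with one output filter is just one weight per input channel.
kernel = np.random.rand(channels)

# 1x1 convolution: weighted sum of channels at each pixel, no spatial mixing.
conv_out = np.empty((h, w, 1))
for i in range(h):
    for j in range(w):
        conv_out[i, j, 0] = x[i, j] @ kernel

# Dense over the last axis gives the identical result.
dense_out = x @ kernel[:, None]
print(np.allclose(conv_out, dense_out))  # True
```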
or you could reference the "naive Keras" example for the particular Kaggle competition you're working on, which uses this (note the final sigmoid activation: a softmax over a single output unit always produces 1.0, so sigmoid is the appropriate choice for a per-pixel binary mask):
model = Sequential()
model.add( Conv2D(16, 3, activation='relu', padding='same', input_shape=(320, 480, 12) ) )
model.add( Conv2D(32, 3, activation='relu', padding='same') )
model.add( Conv2D(1, 5, activation='sigmoid', padding='same') )
Separately from all of this we have two problems of incorrect data dimensions from the code you've printed for us.
One is that you state the image height is 440, but the Keras output says 640 (your edit confirms this was a typo and the height is really 640).
The other is that your final Dense layer has 6 channels in the output, but the corresponding code you provided could only lead to 1 channel.
So likely there is still some mismatch between the code you're using and the code you've pasted here, which prevents us from seeing the full problem with the dimension issues.
For example, the loss layer for this network ought to compare the ground truth bitmasks of car location pixels with the 640x959 Dense output of your final layer (once you fix the weird issue where you're showing 6 channels in the output).
But the error message you reported is
ValueError: Error when checking target: expected dense_1 to have 4 dimensions, but got array with shape (159, 640, 959)
and this suggests the batch of target data might need to be reshaped into a tensor of shape (159, 640, 959, 1), just for the sake of conformability with the shape that comes out of your Dense layer.
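That reshape is a one-liner with numpy. A sketch (using a zeros placeholder in place of your real mask array) that adds the trailing channel axis so the targets match the (None, 640, 959, 1) output of the Dense layer:

```python
import numpy as np

# Placeholder for your real boolean masks of shape (159, 640, 959).
targets = np.zeros((159, 640, 959), dtype=bool)

# Add a trailing channel axis; targets.reshape(159, 640, 959, 1) is equivalent.
targets = np.expand_dims(targets, axis=-1)
print(targets.shape)  # (159, 640, 959, 1)
```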