I am trying to build an image captioning model. Could you please help me with this error? input1 is the image vector and input2 is the caption sequence; 32 is the caption length. I want to concatenate the image vector with the embedding of the sequence and feed the result to the decoder model.
def define_model(vocab_size, max_length):
    input1 = Input(shape=(512,))
    input1 = tf.keras.layers.RepeatVector(32)(input1)
    print(input1.shape)
    input2 = Input(shape=(max_length,))
    e1 = Embedding(vocab_size, 512, mask_zero=True)(input2)
    print(e1.shape)
    dec1 = tf.concat([input1, e1], axis=2)
    print(dec1.shape)
    dec2 = LSTM(512)(dec1)
    dec3 = LSTM(256)(dec2)
    dec4 = Dropout(0.2)(dec3)
    dec5 = Dense(256, activation="relu")(dec4)
    output = Dense(vocab_size, activation="softmax")(dec5)
    model = tf.keras.Model(inputs=[input1, input2], outputs=output)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    print(model.summary())
    return model
ValueError: Input 0 of layer lstm_5 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 512]
This error occurs when an LSTM layer receives a 2D input instead of a 3D one. For instance:
(64, 100)
The expected format is (n_samples, time_steps, features):
(64, 5, 100)
In this case, the mistake you made was that the input of dec3, which is an LSTM layer, was the output of dec2, which is also an LSTM layer. By default, the argument return_sequences of an LSTM layer is False, so the first LSTM returned only its last output: a 2D tensor, which is incompatible with the next LSTM layer. I solved your issue by setting return_sequences=True in your first LSTM layer, so it returns one output per time step, i.e. a 3D tensor.
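If it helps, here is a minimal sketch of that shape difference (the layer sizes here are arbitrary, just for illustration):

import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM

x = Input(shape=(5, 100))                          # (time_steps, features) per sample
print(LSTM(16)(x).shape)                           # (None, 16): last step only, 2D
print(LSTM(16, return_sequences=True)(x).shape)    # (None, 5, 16): one output per step, 3D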
Also, there was an error in this line:
model = tf.keras.Model(inputs=[input1, input2], outputs=output)
input1 was no longer an Input tensor because you reassigned it: by the time you built the Model, input1 referred to the output of RepeatVector, not to the Input layer. See:
input1 = Input(shape=(512,))
input1 = tf.keras.layers.RepeatVector(32)(input1)
I renamed the second assignment to e0, consistent with how you name your other variables.
Now, everything is working:
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras import Input

vocab_size, max_length = 1000, 32

# Image branch: repeat the image vector once per time step
input1 = Input(shape=(128,))
e0 = tf.keras.layers.RepeatVector(32)(input1)
print(e0.shape)

# Caption branch: embed the token sequence
input2 = Input(shape=(max_length,))
e1 = Embedding(vocab_size, 128, mask_zero=True)(input2)
print(e1.shape)

# Concatenate along the feature axis: (None, 32, 128) + (None, 32, 128) -> (None, 32, 256)
dec1 = Concatenate()([e0, e1])
print(dec1.shape)

# return_sequences=True so the second LSTM receives a 3D tensor
dec2 = LSTM(16, return_sequences=True)(dec1)
dec3 = LSTM(16)(dec2)
dec4 = Dropout(0.2)(dec3)
dec5 = Dense(32, activation="relu")(dec4)
output = Dense(vocab_size, activation="softmax")(dec5)

# inputs must be the Input tensors themselves, not the RepeatVector output
model = tf.keras.Model(inputs=[input1, input2], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
Model: "model_2"
_________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
=================================================================================
input_24 (InputLayer) [(None, 128)] 0
_________________________________________________________________________________
input_25 (InputLayer) [(None, 32)] 0
_________________________________________________________________________________
repeat_vector_12 (RepeatVector) (None, 32, 128) 0 input_24[0][0]
_________________________________________________________________________________
embedding_11 (Embedding) (None, 32, 128) 128000 input_25[0][0]
_________________________________________________________________________________
concatenate_7 (Concatenate) (None, 32, 256) 0 repeat_vector_12[0][0]
embedding_11[0][0]
_________________________________________________________________________________
lstm_12 (LSTM) (None, 32, 16) 17472 concatenate_7[0][0]
_________________________________________________________________________________
lstm_13 (LSTM) (None, 16) 2112 lstm_12[0][0]
_________________________________________________________________________________
dropout_2 (Dropout) (None, 16) 0 lstm_13[0][0]
_________________________________________________________________________________
dense_4 (Dense) (None, 32) 544 dropout_2[0][0]
_________________________________________________________________________________
dense_5 (Dense) (None, 1000) 33000 dense_4[0][0]
=================================================================================
Total params: 181,128
Trainable params: 181,128
Non-trainable params: 0
_________________________________________________________________________________
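As a quick sanity check, you can fit the model on a random dummy batch (made-up data, just to confirm the shapes line up end to end):

import numpy as np

images = np.random.rand(4, 128).astype("float32")                   # 4 fake image vectors
captions = np.random.randint(1, vocab_size, size=(4, max_length))   # 4 fake caption sequences
labels = tf.keras.utils.to_categorical(
    np.random.randint(0, vocab_size, size=(4,)), num_classes=vocab_size
)
model.fit([images, captions], labels, epochs=1)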