Each of my inputs has an associated 49x2 matrix. Here is what one input-output pair looks like:
input :
[Car1, Car2, Car3 ..., Car118]
output :
[[Label1 Label2]
 [Label1 Label2]
      ...
 [Label1 Label2]]
where Label1 and Label2 are both label-encoded and have 1200 and 1300 distinct classes, respectively.
Just to make sure: is this what is called a multi-output multi-class problem?
I tried flattening the output, but I feared the model would not understand that the same kind of label shares the same classes across all 49 rows.
Is there a Keras layer that handles this peculiar output shape?
Generally, multi-class problems correspond to models that output a probability distribution over the set of classes (typically scored against the one-hot encoding of the actual class through cross-entropy). Now, regardless of whether you structure it as one single output, two outputs, 49 outputs or 49 x 2 = 98 outputs, that would mean having 1,200 x 49 + 1,300 x 49 = 122,500 output units, which a computer can certainly handle, but which is maybe not the most convenient thing to have. You could try making each class output a single (e.g. linear) unit and rounding its value to choose the label, but, unless the labels have some numerical meaning (e.g. order, sizes, etc.), that is not likely to work.
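As a quick check of the arithmetic above, here is a minimal sketch that counts the softmax units needed when every one of the 49 rows gets its own distribution per label. The helper name `softmax_units` is made up for illustration.

```python
N_ROWS = 49               # rows in the 49x2 output matrix
N_CLASSES_LABEL1 = 1200   # distinct classes of Label1
N_CLASSES_LABEL2 = 1300   # distinct classes of Label2


def softmax_units(n_rows, n_classes_per_label):
    """Total output units if every row has one softmax per label."""
    return sum(n_rows * n_classes for n_classes in n_classes_per_label)


total_units = softmax_units(N_ROWS, [N_CLASSES_LABEL1, N_CLASSES_LABEL2])
print(total_units)  # 122500
```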
If the order of the elements in the input has some meaning (that is, shuffling it would affect the output), I think I'd approach the problem through an RNN, such as an LSTM or a bidirectional LSTM model, with two outputs. Use return_sequences=True and TimeDistributed Dense softmax layers for the outputs, so that each 118-long input yields 118 pairs of outputs; then you can use temporal sample weighting to drop, for example, the first 69 (or, with a bidirectional model, perhaps the first 35 and the last 34) and compute the loss on the remaining 49 pairs of labels. Or, if that makes sense for your data (maybe it doesn't), you could go with something more advanced like CTC, which is also implemented in Keras (thanks @indraforyou)!
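To make the temporal weighting idea concrete, here is a rough NumPy-only sketch of how the per-timestep weights and the padded targets could be built. The function names and the `drop_first` parameter are my own, not part of Keras; the resulting weight array is what you would hand to the model as a per-timestep sample weight (how exactly depends on your Keras version, so treat that part as an assumption).

```python
import numpy as np

SEQ_LEN = 118  # length of the input sequence
N_KEEP = 49    # number of label rows that actually count


def temporal_weights(n_samples, drop_first=None, seq_len=SEQ_LEN, n_keep=N_KEEP):
    """Weight 1.0 on the n_keep scored timesteps, 0.0 on the dropped ones."""
    n_drop = seq_len - n_keep  # 69 timesteps are ignored in total
    if drop_first is None:
        drop_first = n_drop    # unidirectional case: drop the first 69
    if not 0 <= drop_first <= n_drop:
        raise ValueError("drop_first must be between 0 and seq_len - n_keep")
    weights = np.zeros((n_samples, seq_len), dtype="float32")
    weights[:, drop_first:drop_first + n_keep] = 1.0
    return weights


def pad_targets(y, drop_first, seq_len=SEQ_LEN):
    """Place (n_samples, 49, 2) labels at the scored timesteps of a 118-long target."""
    n_samples, n_keep, n_labels = y.shape
    padded = np.zeros((n_samples, seq_len, n_labels), dtype=y.dtype)
    padded[:, drop_first:drop_first + n_keep] = y
    return padded


w_uni = temporal_weights(4)                # keeps timesteps 69..117
w_bi = temporal_weights(4, drop_first=35)  # drops the first 35 and the last 34

y_demo = np.ones((4, N_KEEP, 2), dtype="int64")  # placeholder labels
y_padded = pad_targets(y_demo, drop_first=35)    # shape (4, 118, 2)
```

The dummy labels written into the dropped timesteps do not matter, since their weight is zero.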
If the order in the input has no meaning but the order of the outputs does, then you could have an RNN where your input is the original 118-long vector plus a pair of labels (each one-hot encoded), and the output is again a pair of labels (again two softmax layers). The idea would be that you get one "row" of the 49x2 output on each frame, and then you feed it back to the network along with the initial input to get the next one; at training time, you would have the input repeated 49 times along with the "previous" label (an empty label for the first one).
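Here is a minimal sketch of how one training sequence for that setup might be assembled with NumPy, assuming the 118 input values can be treated as a numeric vector (if they are categorical they would need their own encoding first). The helper names are hypothetical. Each of the 49 frames holds the repeated input plus the one-hot "previous" pair of labels, with an all-zero "empty" label on the first frame.

```python
import numpy as np

N_ROWS, N1, N2 = 49, 1200, 1300  # rows, Label1 classes, Label2 classes


def one_hot_previous(labels, n_classes):
    """One-hot of the previous label at each frame; frame 0 stays all zeros."""
    prev = np.zeros((len(labels), n_classes), dtype="float32")
    prev[np.arange(1, len(labels)), labels[:-1]] = 1.0
    return prev


def make_training_sequence(x, y):
    """x: (118,) input vector, y: (49, 2) integer labels -> (frames, targets)."""
    x_rep = np.repeat(x[None, :].astype("float32"), len(y), axis=0)  # (49, 118)
    prev1 = one_hot_previous(y[:, 0], N1)                            # (49, 1200)
    prev2 = one_hot_previous(y[:, 1], N2)                            # (49, 1300)
    frames = np.concatenate([x_rep, prev1, prev2], axis=1)           # (49, 2618)
    return frames, y


rng = np.random.default_rng(0)
x = rng.random(118)  # stand-in for one 118-long input
y = np.stack([rng.integers(0, N1, N_ROWS), rng.integers(0, N2, N_ROWS)], axis=1)
frames, targets = make_training_sequence(x, y)
```

At prediction time the true previous labels are not available, so the pair predicted at one frame would be one-hot encoded and fed in as the "previous" label of the next frame.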
If there are no sequential relationships to exploit (i.e. the order of the input and the order of the output have no special meaning), then the problem would only be truly represented by the initial 122,500 output units (plus all the hidden units you may need to get those right). You could also try some kind of middle ground between a regular network and an RNN where you have the two softmax outputs and, along with the 118-long vector, you include the "id" of the output that you want (e.g. as a 49-long one-hot encoded vector); if the "meaning" of each label at each of the 49 outputs is similar, or comparable, it may work.
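The "id" variant could be prepared roughly like this: every original sample is expanded into 49 training examples, each made of the 118-long vector plus a 49-long one-hot row id, with the corresponding pair of labels as targets. This is only a sketch under the assumption that the input is a numeric vector, and `expand_with_row_id` is a name I made up.

```python
import numpy as np

N_ROWS = 49  # number of output rows, i.e. possible "ids"


def expand_with_row_id(x, y):
    """x: (118,) input, y: (49, 2) labels -> 49 examples of [x, one-hot row id]."""
    n_rows = y.shape[0]
    row_ids = np.eye(n_rows, dtype="float32")                        # (49, 49)
    x_rep = np.repeat(x[None, :].astype("float32"), n_rows, axis=0)  # (49, 118)
    inputs = np.concatenate([x_rep, row_ids], axis=1)                # (49, 167)
    return inputs, y[:, 0], y[:, 1]


rng = np.random.default_rng(0)
x = rng.random(118)  # stand-in for one 118-long input
y = np.stack([rng.integers(0, 1200, N_ROWS), rng.integers(0, 1300, N_ROWS)], axis=1)
inputs, label1, label2 = expand_with_row_id(x, y)
```

With this layout the network only needs 1,200 + 1,300 = 2,500 output units, at the cost of 49 forward passes per original sample.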