Keras has many different ways of merging inputs, like `Add()`, `Subtract()`, `Multiply()`, `concatenate()`, etc. Do they all have the same effect, or are there situations where one is preferable?
It really depends on what you are trying to achieve, but briefly, let's look at the different merge layers and what they are often used for (a few code sketches follow the list):
- `add`: Addition is a common merge operation for networks that use the relu activation function: since the inputs are non-negative, the sum is also non-negative and can encode an OR-like operation. For example, if you want a deep network to identify whether any of the comments on this answer are positive, you can add all of the encoded representations.
- `subtract`: Subtraction, usually in conjunction with squaring, i.e. (x - y)^2, is used for equality or similarity relationships: how close is one thing to another? These pop up in attention calculations; "does this region of the image contain the features I'm looking for?" could be a subtraction (see the second sketch after this list).
- `multiply`: Similar to subtraction. If you have features, say from a tanh, you can multiply them element-wise to find shared features: the product is positive if both values have the same sign and negative otherwise, and the network can make good use of that information.
- `average`: Averaging is often traded for concatenation so as not to lose information, but if the problem mathematically calls for each previous branch of computation to have equal weight, then you average. For example, you might want the overall sentiment of a paragraph and not let any single negative sentence sway an otherwise neutral paragraph.
- `maximum`: Maximum pops up in pooling operations such as MaxPooling and lets you achieve some invariance over a dimension. In images it doesn't matter where the cat is in order to classify the image as "cat"; similarly, it doesn't matter where an anomaly occurs if you just want to detect it.
- `concatenate`: Concatenation is the most common because it lets the network that follows decide how to use the given information. It is used to gather information, often including the outputs of other merge layers, so you can compute a multiplication and concatenate it with its inputs as `[x, y, x*y]` (see the last sketch after this list). It is also the default way `Bidirectional` combines information from both directions.
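To make the differences concrete, here is a minimal sketch, assuming TensorFlow 2.x and `tf.keras` (the input names and the 16-dimensional shape are just placeholders), that runs two encoded branches through each of the merge layers above:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Two hypothetical 16-dimensional encodings of two inputs.
x_in = layers.Input(shape=(16,), name="x")
y_in = layers.Input(shape=(16,), name="y")

summed = layers.Add()([x_in, y_in])          # element-wise sum
diff   = layers.Subtract()([x_in, y_in])     # element-wise difference x - y
prod   = layers.Multiply()([x_in, y_in])     # element-wise product
avg    = layers.Average()([x_in, y_in])      # element-wise mean
maxed  = layers.Maximum()([x_in, y_in])      # element-wise maximum
concat = layers.Concatenate()([x_in, y_in])  # keeps both, shape (None, 32)

model = tf.keras.Model([x_in, y_in], [summed, diff, prod, avg, maxed, concat])
model.summary()
```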
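A similar sketch of the (x - y)^2 pattern from the `subtract` bullet, e.g. as the head of a siamese-style similarity model; the embedding size and the `Dense` scoring head are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Two hypothetical 64-dimensional embeddings of the items being compared.
emb_a = layers.Input(shape=(64,))
emb_b = layers.Input(shape=(64,))

diff    = layers.Subtract()([emb_a, emb_b])      # x - y
squared = layers.Multiply()([diff, diff])        # (x - y)^2, element-wise
score   = layers.Dense(1, activation="sigmoid")(squared)  # "how close are they?"

similarity_head = tf.keras.Model([emb_a, emb_b], score)
```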
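And the `[x, y, x*y]` pattern from the `concatenate` bullet, together with the `Bidirectional` default mentioned at the end (its `merge_mode` defaults to `'concat'`); layer sizes are again placeholders:

```python
import tensorflow as tf
from tensorflow.keras import layers

x = layers.Input(shape=(32,))
y = layers.Input(shape=(32,))

xy       = layers.Multiply()([x, y])              # element-wise product x*y
features = layers.Concatenate()([x, y, xy])       # [x, y, x*y] -> shape (None, 96)
out      = layers.Dense(1)(features)
model    = tf.keras.Model([x, y], out)

# Bidirectional concatenates the forward and backward outputs by default;
# other merge_mode values ('sum', 'mul', 'ave') map to the layers above.
bi = layers.Bidirectional(layers.LSTM(8), merge_mode="concat")
```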