Positional encoding is added to the input before it is passed into the transformer model, because otherwise the attention mechanism would be order-invariant. However, both the encoder and decoder are layered, with attention used in every layer. So if order matters for the attention mechanism, shouldn't the positional encoding be added to the input of each multi-head attention block, rather than just once at the input to the model?
The transformer uses residual connections around every sub-layer, so the positional encodings added at the input are carried forward through all the layers of the encoder and decoder; each sub-layer computes x + Sublayer(x), so the positional signal is never discarded and there is no need to re-inject it at each attention block.
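Here is a minimal sketch of that idea (PyTorch assumed; the names sinusoidal_positional_encoding and SimpleEncoderLayer are illustrative, not taken from the original Transformer code). The encoding is added exactly once before the first layer, and the residual additions inside each layer propagate it upward:

```python
import math
import torch
import torch.nn as nn


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal encoding from 'Attention Is All You Need'."""
    position = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # (seq_len, d_model)


class SimpleEncoderLayer(nn.Module):
    """One encoder layer with post-norm residual connections (illustrative)."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the sub-layer output is *added* to x, so
        # whatever positional information x carries is passed along.
        x = self.norm1(x + self.attn(x, x, x, need_weights=False)[0])
        x = self.norm2(x + self.ff(x))
        return x


# Positional encoding is added once, before the first layer.
batch, seq_len, d_model = 2, 10, 64
tokens = torch.randn(batch, seq_len, d_model)  # stand-in for token embeddings
x = tokens + sinusoidal_positional_encoding(seq_len, d_model)

# The residuals in every subsequent layer carry the positional signal forward.
layers = nn.ModuleList(SimpleEncoderLayer(d_model) for _ in range(6))
for layer in layers:
    x = layer(x)
print(x.shape)  # torch.Size([2, 10, 64])
```

Because each layer's output is x plus a learned update rather than a replacement of x, the network can keep (or transform) the positional component as needed without it being re-supplied at every block.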