 

Why doesn't the transformer use positional encoding in every layer?

Positional encoding is added to the input before it is passed into the transformer, because otherwise the attention mechanism would be order-invariant. However, both the encoder and the decoder are stacks of layers, with attention used in each layer. So if order matters to the attention mechanism, shouldn't the positional encoding be added to the input of every multi-head attention block, rather than just once at the input to the model?
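For reference, here is a small check (a sketch assuming PyTorch's nn.MultiheadAttention; the layer and sizes are arbitrary) showing that self-attention with no positional information is permutation-equivariant, i.e. permuting the input tokens just permutes the output rows the same way:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary sizes, chosen only for this illustration.
attn = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True).eval()

x = torch.randn(1, 5, 16)   # one sequence of 5 token embeddings, no positions added
perm = torch.randperm(5)    # a random reordering of the tokens

with torch.no_grad():
    out, _ = attn(x, x, x)                                   # self-attention, original order
    out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])   # self-attention, shuffled order

# The outputs are the same rows, just reordered: attention alone sees a set, not a sequence.
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))     # True
```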

Robz asked Dec 02 '25 09:12


1 Answer

The transformer uses residual connections, and hence the positional encodings carry over through the layers of the encoder and decoder: every sub-layer output is built from x + Sublayer(x), so the positional signal added once at the input remains part of the residual stream at every subsequent layer (sketched below).
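A minimal sketch of that idea (assuming a standard sinusoidal encoding and post-norm encoder layers; the sizes and layer count are arbitrary, and nn.MultiheadAttention stands in for the full multi-head implementation):

```python
import torch
import torch.nn as nn


def sinusoidal_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    angles = pos / (10000.0 ** (i / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe


class EncoderLayer(nn.Module):
    """One post-norm encoder layer: each sub-layer output is built from
    x + Sublayer(x), so whatever was added to x at the model input
    (including the positional encoding) keeps travelling along the
    residual path from layer to layer."""

    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)   # self-attention sub-layer
        x = self.norm1(x + attn_out)       # residual connection 1
        x = self.norm2(x + self.ff(x))     # residual connection 2
        return x


d_model, seq_len, batch = 64, 10, 2
tokens = torch.randn(batch, seq_len, d_model)        # embedded tokens
x = tokens + sinusoidal_encoding(seq_len, d_model)   # position is added only once, here

layers = nn.ModuleList([EncoderLayer(d_model, 4, 256) for _ in range(6)])
for layer in layers:
    x = layer(x)   # the residuals keep carrying the positional component upward

print(x.shape)     # torch.Size([2, 10, 64])
```

Some later variants (relative or rotary positional encodings, for example) do inject position information inside every attention layer, but the original transformer relies on the residual stream as above.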

Mohammad Elghandour answered Dec 04 '25 11:12


