I am following this PyTorch tutorial and trying to apply the same approach to summarization, where the encoder input would be around 1000 words and the decoder target around 200 words.
How do I apply seq2seq to this? I know it would be very expensive, and almost infeasible, to run through the whole 1000-word sequence at once. Dividing the sequence into, say, 20 shorter sequences and running them in parallel could be an answer, but I'm not sure how to implement it. I also want to incorporate attention.
A Seq2Seq model with an attention mechanism consists of an encoder, a decoder, and an attention layer. At each step the decoder decides which parts of the source sentence to attend to, instead of relying on the encoder to pack all the information about the source sentence into one fixed-length vector.
In a plain encoder-decoder without attention, the encoder compresses the whole input into a single vector, called the context (typically of size 256, 512, or 1024). The context holds all the information the encoder was able to extract from the input (here, the sentence to be translated).
Attention is a mechanism added to the RNN that lets it focus on specific parts of the input sequence when predicting each part of the output sequence, which makes learning easier and improves output quality.
Attention is a powerful mechanism developed to enhance the performance of the Encoder-Decoder architecture on neural network-based machine translation tasks.
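To make the attention idea above concrete, here is a minimal sketch of dot-product attention in plain Python. It is only one of several scoring schemes and not necessarily the one the tutorial uses; the names `attend`, `softmax`, and the toy vectors are made up for illustration.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Dot-product attention (illustrative helper): returns (weights, context)."""
    # One score per source position: how well it matches the decoder state.
    scores = [sum(q * k for q, k in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    # Context is the attention-weighted average of the encoder states.
    dim = len(decoder_state)
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

# Toy example: three source positions with 2-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
```

The context is recomputed for every decoder step, so the decoder is not limited to a single fixed vector summarizing the whole input.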
You cannot parallelize an RNN over time (1000 steps here) because it is inherently sequential: each hidden state depends on the previous one.
You can use a lighter RNN such as QRNN or SRU as a faster alternative (though it is still sequential).
Other common sequence-processing modules are TCNs and Transformers, both of which are parallelizable in time.
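A toy scalar recurrence shows where the sequential dependency comes from. This is a simplified sketch, not a real RNN layer: the function name `rnn_forward` and the weights `w_h` and `w_x` are assumptions chosen for illustration.

```python
import math

def rnn_forward(inputs, w_h=0.5, w_x=1.0):
    """Scalar Elman-style recurrence: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h = 0.0
    states = []
    for x in inputs:
        # Step t needs the h produced by step t-1, so the loop
        # cannot be split across time steps.
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

# A 1000-step input, matching the sequence length in the question.
states = rnn_forward([0.1] * 1000)
```

You can still batch across independent sequences; it is only the time dimension within one sequence that has to be processed in order.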
Also, note that all of them can be used with attention and work perfectly fine with text.
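For contrast with the recurrent case, here is a rough sketch of the dilated causal convolution that a TCN is built from, written in plain Python. The helper name `causal_conv1d` and the toy kernel are hypothetical; a real TCN stacks many such layers with learned kernels.

```python
def causal_conv1d(xs, kernel, dilation=1):
    """Dilated causal 1-D convolution over a list of floats (zero-padded on the left)."""
    def output_at(t):
        total = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # look only at the current and past inputs
            if idx >= 0:
                total += w * xs[idx]
        return total
    # Each output_at(t) reads only xs, never another output, so the
    # positions could be computed in any order or all at once.
    return [output_at(t) for t in range(len(xs))]

xs = [1.0, 2.0, 3.0, 4.0]
ys = causal_conv1d(xs, kernel=[0.5, 0.5])
ys_dilated = causal_conv1d(xs, kernel=[0.5, 0.5], dilation=2)
```

Because no output depends on another output, all 1000 positions can be computed at the same time on parallel hardware, which is what "parallelizable in time" means here.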