 

Why do we do padding in NLP tasks?

In NLP tasks, it's very common to annotate a sentence with SOC (start of sentence) and EOC (end of sentence) tokens. Why do we do that?

Is the reason task dependent? For instance, is the reason you pad in NER problems different from the reason you pad in translation problems? In an NER problem you pad to extract more useful features from the context, whereas in a translation problem you pad to mark the end of a sentence, because the decoder is trained sentence by sentence.

GabrielChu asked Oct 20 '25 15:10


1 Answer

Why is there End of Output padding in NLP?

Let's say we want to use an RNN (recurrent neural network) to complete a sentence for us. We give it the sentence "If at first you don't succeed,". We'd like it to output "try try again" and then know to stop. It's the stop that's important: if we just used a period as the stop signal, we couldn't use the same RNN to output a multi-sentence response.

If we are using the RNN instead to respond to a question, then perhaps the answer has multiple sentences.
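The idea can be sketched with a toy generation loop. This is a minimal illustration, not a real RNN: the hypothetical `NEXT_TOKEN` lookup table stands in for a trained model's next-token predictions, and generation stops only when the dedicated `<eos>` token appears, not on punctuation.

```python
# Toy stand-in for a trained RNN's next-token predictions (hypothetical).
NEXT_TOKEN = {
    "<sos>": "try",
    "try": "again",
    "again": "<eos>",
}

def generate(start="<sos>", max_len=10):
    """Greedy decoding that stops on the end-of-sequence token."""
    tokens = []
    current = start
    for _ in range(max_len):
        nxt = NEXT_TOKEN.get(current, "<eos>")
        if nxt == "<eos>":  # the stop signal is a token, not a period
            break
        tokens.append(nxt)
        current = nxt
    return tokens

print(generate())  # -> ['try', 'again']
```

Because the stop condition is a special token rather than a period, the same loop could keep going across sentence boundaries for a multi-sentence answer.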

Why is there Start of Output padding in NLP?

Let's say we train an RNN on poetry and we want it to produce original poetry in the style it was trained on. We have to give it a first token to start the poem with. We could give it the first word, ... or we could just say "start". If we train the RNN to always start from a unique token (a start-of-output token), then the RNN can choose the first word itself.

Summary

The start and end of a thing are so intuitive to us that it's easy to forget we once had to learn when enough is enough (the end token) and when or how to begin (the start token). The RNN has to learn both of these things.

Anton Codes answered Oct 25 '25 01:10


