How to use fairseq interactive.py non-interactively?

Question

I am trying to translate from English to Arabic using Fairseq, but the interactive.py script translates text fragments on the fly. I need it to read an input text file and write the translations to an output text file. I looked at this GitHub issue, https://github.com/pytorch/fairseq/issues/858, but it doesn't clearly explain how to do this in general. Any suggestions?

Xavier · Accepted Answer

fairseq-interactive can read lines from a file with the --input parameter, and it outputs translations to standard output.

So let's say I have this input text file source.txt (where every sentence to translate is on a separate line):

Hello world!
My name is John

You can run:

fairseq-interactive --input=source.txt [all-your-fairseq-parameters] > target.txt

Here, > target.txt redirects everything fairseq-interactive writes to standard output into the file target.txt. The file is created if it doesn't exist yet (and overwritten if it does).

With an English to French model it would generate a file target.txt that looks something like this (actual output may vary depending on your model, configuration and Fairseq version):

S-0     Hello world!
W-0     0.080   seconds
H-0     -0.43813419342041016    Bonj@@ our le monde !
D-0     -0.43813419342041016    Bonjour le monde !
P-0     -0.1532 -1.7157 -0.0805 -0.0838 -0.1575
S-1     My name is John
W-1     0.080   seconds
H-1     -0.3272092938423157     Je m' appelle John .
D-1     -0.3272092938423157     Je m'appelle John.
P-1     -0.3580 -0.2207 -0.0398 -0.1649 -1.0216 -0.1583
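The number on each H-/D- line is a length-normalized score for the whole hypothesis. As a quick sanity check on the sample above, averaging the per-token values on a sentence's P- line reproduces its H/D score. That is consistent with fairseq dividing the summed token scores by length ** lenpen, with the default --lenpen of 1. The numbers below are copied from the sample; length_normalized_score is only an illustrative helper, not a fairseq function.

```python
# Per-token scores copied from the first sentence's P line in the sample output.
p0 = [-0.1532, -1.7157, -0.0805, -0.0838, -0.1575]
# Per-token scores copied from the second sentence's P line.
p1 = [-0.3580, -0.2207, -0.0398, -0.1649, -1.0216, -0.1583]


def length_normalized_score(positional_scores, lenpen=1.0):
    """Sum of per-token scores divided by (number of tokens) ** lenpen."""
    return sum(positional_scores) / len(positional_scores) ** lenpen


print(round(length_normalized_score(p0), 4))  # -0.4381, matches H-0/D-0
print(round(length_normalized_score(p1), 4))  # -0.3272, matches H-1/D-1
```

With a different --lenpen value the H/D score would no longer be a plain average of the P values.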

To keep only the translations (the lines starting with D-), you need to filter this file, for example with:

grep -P "^D-[0-9]+" target.txt | cut -f3 > only_translations.txt

You can also combine everything into a single pipeline:

fairseq-interactive --input=source.txt [all-your-fairseq-parameters] | grep -P "^D-[0-9]+" | cut -f3 > target.txt

(The exact filter depends on the output format produced by your fairseq version.)
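If grep -P isn't available (macOS grep, for instance, lacks it) or you want each translation keyed by its sentence ID so it can be matched back to the source line, the same filtering step can be sketched in Python. This assumes the tab-separated layout shown above (tag with ID, score, text) and one hypothesis per sentence, i.e. the default --nbest 1; extract_translations is a hypothetical helper, not part of fairseq.

```python
import re
from typing import Iterable, List

# Matches lines such as "D-0<TAB>-0.438<TAB>Bonjour le monde !".
# Anchoring at the start avoids matching "D-<n>" that merely appears
# inside a source sentence or a log message.
D_LINE = re.compile(r"^D-(\d+)\t[^\t]*\t(.*)$")


def extract_translations(lines: Iterable[str]) -> List[str]:
    """Return the text of the D- lines, ordered by sentence ID."""
    found = {}
    for line in lines:
        match = D_LINE.match(line.rstrip("\r\n"))
        if match:
            # With --nbest 1 there is exactly one D- line per sentence ID.
            found[int(match.group(1))] = match.group(2)
    return [found[i] for i in sorted(found)]
```

For example, extract_translations(open("target.txt", encoding="utf-8")) returns one translation per input sentence, in input order, which you can then write out one per line.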

Finally, note that you can use --input=- to read the input from standard input.

xihajun · Answer

I found fairseq-interactive a bit slow. If you just want to translate an input file into an output file with a pretrained fairseq model, there is another option (though I'm not sure it is faster).

Basically, you can load the model in Python and call model.translate:

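from fairseq.models.transformer import TransformerModel

# Load a trained translation model through fairseq's hub interface.
trans = TransformerModel.from_pretrained(
  'models/',                             # directory containing the checkpoint
  checkpoint_file='checkpoint_best.pt',
  data_name_or_path='bin/',              # binarized data directory (holds the dictionaries)
).cuda()                                 # drop .cuda() to run on CPU
trans.eval()                             # disable dropout for inference
inputs = "Di-mairt Clodh-bhualadh a cheud leabhair,"
print(trans.translate(inputs))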

Following this idea, you can read the file and translate it line by line. There may be a better way to translate the file directly, though.
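A minimal sketch of that file-to-file loop, under a few assumptions: translate_file is a hypothetical helper, and it takes any function that maps a list of source sentences to a list of translations, so it can be tried without a model. With the hub model loaded as above you would pass trans.translate, which to my knowledge also accepts a list of sentences. Blank input lines are written back as blank lines instead of being sent to the model, so the output stays line-aligned with the input.

```python
from typing import Callable, List


def translate_file(src_path: str, dst_path: str,
                   translate: Callable[[List[str]], List[str]],
                   batch_size: int = 32) -> None:
    """Translate src_path in batches, writing one output line per input line."""
    with open(src_path, encoding="utf-8") as f:
        lines = [line.rstrip("\r\n") for line in f]

    results = [""] * len(lines)
    # Indices of non-blank lines; blank lines stay blank in the output.
    todo = [i for i, line in enumerate(lines) if line.strip()]

    for start in range(0, len(todo), batch_size):
        idx = todo[start:start + batch_size]
        hyps = translate([lines[i] for i in idx])
        for i, hyp in zip(idx, hyps):
            results[i] = hyp

    with open(dst_path, "w", encoding="utf-8") as out:
        out.writelines(r + "\n" for r in results)
```

Usage with the model above: translate_file("source.txt", "target.txt", trans.translate). Sending batches rather than single sentences lets fairseq generate several translations per forward pass.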


Tags: pytorch, machine-translation, fairseq

Ramraj Chandradevan
