Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given a word can we get all possible lemmas for it using Spacy?

The input word is standalone and not part of a sentence but I would like to get all of its possible lemmas as if the input word were in different sentences with all possible POS tags. I would also like to get the lookup version of the word's lemma.

Why am I doing this?

I have extracted lemmas from all the documents and I have also calculated the number of dependency links between lemmas. Both of which I have done using en_core_web_sm. Now, given an input word, I would like to return the lemmas that are linked most frequently to all the possible lemmas of the input word.

So in short, I would like to replicate the behaviour of token._lemma for the input word with all possible POS tags to maintain consistency with the lemma links I have counted.

like image 340
PSK Avatar asked Sep 13 '25 14:09

PSK


2 Answers

I found it difficult to get lemmas and inflections directly out of spaCy without first constructing an example sentence to give it context. This wasn't ideal, so I looked further and found LemmaInflect did this very well.

> from lemminflect import getAllLemmas, getInflection, getAllInflections, getAllInflectionsOOV

> getAllLemmas('watches')
{'NOUN': ('watch',), 'VERB': ('watch',)}

> getAllInflections('watch')
{'NN': ('watch',), 'NNS': ('watches', 'watch'), 'VB': ('watch',), 'VBD': ('watched',), 'VBG': ('watching',), 'VBZ': ('watches',),  'VBP': ('watch',)}
like image 77
jitters Avatar answered Sep 16 '25 03:09

jitters


spaCy is just not designed to do this - it's made for analyzing text, not producing text.

The linked library looks good, but if you want to stick with spaCy or need languages besides English, you can look at spacy-lookups-data, which is the raw data used for lemmas. Generally there will be a dictionary for each part of speech that lets you look up the lemma for a word.

like image 43
polm23 Avatar answered Sep 16 '25 03:09

polm23