According to the documentation, I can load a sense-tagged corpus in NLTK like this:
>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')
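I can also get the definition, POS, offset, and examples like this (in NLTK 3 these are methods, so they need parentheses):
>>> from nltk.corpus import wordnet as wn
>>> wn.synset('dog.n.01').examples()
>>> wn.synset('dog.n.01').definition()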
But how can I get the frequency of a synset from a corpus? To break down the question:
I managed to do it this way.
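from nltk.corpus import wordnet as wn

word = "dog"
synsets = wn.synsets(word)

# Map each sense ("offset-pos") to the summed counts of its lemmas.
sense2freq = {}
for s in synsets:
  freq = 0
  # lemmas(), offset() and pos() are methods in NLTK 3.
  for lemma in s.lemmas():
    freq += lemma.count()
  # offset() is an int, so convert it before building the key.
  sense2freq[str(s.offset()) + "-" + s.pos()] = freq

for s in sense2freq:
  print(s, sense2freq[s])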
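If you need this for more than one word, the same idea can be wrapped in a small function. This is only a sketch, not an official NLTK recipe. The names `sense_frequencies`, `rank_senses`, and `print_sense_frequencies` are made up for illustration, and it assumes NLTK 3, where `offset`, `pos`, and `lemmas` are methods. The NLTK import is kept inside the last function so the helpers can be used, or tested, without the WordNet data installed.

```python
def sense_frequencies(synsets):
    """Return {"<offset>-<pos>": summed lemma count} for synset-like objects.

    Each object is assumed to expose offset(), pos() and lemmas(),
    and each lemma is assumed to expose count(), as in NLTK 3.
    """
    freqs = {}
    for s in synsets:
        key = "%d-%s" % (s.offset(), s.pos())
        freqs[key] = sum(lemma.count() for lemma in s.lemmas())
    return freqs


def rank_senses(freqs):
    """Return (sense key, count) pairs, most frequent sense first."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)


def print_sense_frequencies(word):
    """Print the ranked sense counts for a word (needs NLTK and its WordNet data)."""
    from nltk.corpus import wordnet as wn  # imported lazily on purpose

    for key, freq in rank_senses(sense_frequencies(wn.synsets(word))):
        print(key, freq)
```

Calling `print_sense_frequencies("dog")` should then list the senses of "dog" from most to least frequent. Senses whose lemmas have no recorded count come out as 0.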