 

Python VADER lexicon Structure for sentiment analysis

I am using the VADER sentiment lexicon in Python's nltk library to analyze text sentiment. The lexicon does not suit my domain well, so I want to add my own sentiment scores for various words. I got hold of the lexicon text file (vader_lexicon.txt) to do just that, but I do not fully understand the structure of this file. For example, the word obliterate has the following entry in the text file:

obliterate -2.9 0.83066 [-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]

Clearly the -2.9 is the average of sentiment scores in the list. But what does the 0.83066 represent?

Thanks!

user2238328 asked Nov 25 '25 12:11



1 Answer

According to the VADER source code, only the word and the first number on each line (the mean score) are used when the lexicon is loaded. The rest of the line is ignored:

for line in self.lexicon_full_filepath.split('\n'):
    (word, measure) = line.strip().split('\t')[0:2] # Here!
    lex_dict[word] = float(measure)
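
As for what 0.83066 represents: the VADER README describes each lexicon line as tab-separated token, mean sentiment rating, standard deviation, and raw human ratings. The sketch below is only a check of that reading against the line quoted in the question. The helper name `parse_lexicon_line` is made up for illustration and is not part of nltk or VADER, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption that happens to reproduce the value for this entry.

```python
import ast
import statistics

# The entry quoted in the question; fields in vader_lexicon.txt are tab-separated.
LINE = "obliterate\t-2.9\t0.83066\t[-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]"


def parse_lexicon_line(line):
    """Hypothetical helper: split one line into (token, mean, std, raw_ratings)."""
    token, mean, std, ratings = line.strip().split("\t")
    # The last field is a Python-style list literal, so literal_eval can read it.
    return token, float(mean), float(std), ast.literal_eval(ratings)


token, mean, std, ratings = parse_lexicon_line(LINE)

# Recompute the two summary columns from the raw ratings.
computed_mean = statistics.mean(ratings)
computed_std = statistics.pstdev(ratings)  # population standard deviation (assumed)

print(token, mean, computed_mean)
print(std, round(computed_std, 5))
```

Since the loader above reads only the first two tab-separated fields, a custom entry should need just the word and its score; the standard deviation and raw ratings columns look informational only.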
DYZ answered Nov 28 '25 03:11



