Python VADER lexicon Structure for sentiment analysis

Question

I am using the VADER sentiment lexicon in Python's nltk library to analyze text sentiment. The lexicon does not suit my domain well, so I want to add my own sentiment scores for various words. To do that, I got hold of the lexicon text file (vader_lexicon.txt). However, I do not fully understand the structure of this file. For example, the word obliterate has the following entry:

    obliterate -2.9 0.83066 [-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]

Clearly, -2.9 is the average of the sentiment scores in the list. But what does 0.83066 represent?

Thanks!
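The numbers can be checked directly against the ten raw ratings. The sketch below (plain Python, no NLTK needed) parses the obliterate entry, assuming the four fields are tab-separated as the loader in the answer implies; `parse_entry` is a hypothetical helper, not part of VADER. For this entry, 0.83066 matches the population standard deviation of the ratings (the sample standard deviation would be about 0.876), which suggests the second column records how much the human raters disagreed.

```python
import statistics

# One entry from vader_lexicon.txt; assumed tab-separated like the real file.
ENTRY = "obliterate\t-2.9\t0.83066\t[-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]"


def parse_entry(line):
    """Hypothetical helper: split a line into token, mean, spread, raw ratings."""
    token, mean, spread, raw = line.strip().split("\t")
    # The raw ratings are written as a bracketed, comma-separated list.
    ratings = [int(x) for x in raw.strip("[]").split(",")]
    return token, float(mean), float(spread), ratings


token, mean, spread, ratings = parse_entry(ENTRY)
# Compare the stored columns with statistics recomputed from the ratings.
print(token, statistics.mean(ratings), round(statistics.pstdev(ratings), 5))
```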

DYZ · Accepted Answer

According to the VADER source code, only the first number on each line (the mean) is used; the rest of the line is ignored:

for line in self.lexicon_full_filepath.split('\n'):
    # Here! Keep only the token and its mean; std dev and raw ratings are dropped
    (word, measure) = line.strip().split('\t')[0:2]
    lex_dict[word] = float(measure)
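A standalone sketch of the same loop shows what this means for custom entries. `make_lex_dict` here is a re-implementation for illustration, not the library's method, and the blank-line check is an addition of mine (the quoted loop would fail on an empty trailing line). `mydomainword` is a hypothetical custom token.

```python
from typing import Dict


def make_lex_dict(lexicon_text: str) -> Dict[str, float]:
    """Build {token: mean score} the way the quoted VADER loop does."""
    lex_dict = {}
    for line in lexicon_text.split("\n"):
        if not line.strip():
            continue  # added: skip blank lines such as a trailing newline
        # Only the first two tab-separated fields are used; the rest is ignored.
        word, measure = line.strip().split("\t")[0:2]
        lex_dict[word] = float(measure)
    return lex_dict


LEXICON = (
    "obliterate\t-2.9\t0.83066\t[-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]\n"
    "mydomainword\t1.5\n"  # hypothetical custom entry: token and score only
)

scores = make_lex_dict(LEXICON)
print(scores)
```

So a custom line needs only a token and a score separated by a tab; the standard deviation and raw ratings do not affect scoring. In NLTK the analyzer keeps the resulting dictionary in its `lexicon` attribute, so updating that dict (for example `analyzer.lexicon.update({"mydomainword": 1.5})`) should have the same effect without editing the file; check this against your installed version.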

Tags: python, nltk, lexicon, vader

user2238328
