I am new to language modeling and built a 3-gram language model using kenlm (or this) from a large text file (~7 GB). I built a binary file from my language model and load it in Python like this:
import kenlm
model = kenlm.LanguageModel(<my .klm file>)
model.score(<my sentence>)
and I get a negative number as the result. When I change the sentence being scored, the result changes but stays negative. Even when I give it a sentence taken exactly from the large text file, it gives me a bad negative number (compared with a sentence that is not in the text file). I don't know what the negative result means, or how I can convert it to a positive, normal value so that I can select the most correct sentence among several sentences.
The final negative number, say -9.585592, is the log probability of the sentence. Since it is a base-10 logarithm, you need to raise 10 to the power of that number, which gives around 2.60 x 10^-10. Maybe this is the positive number you are looking for.
More info here
To get the corresponding score that is between 0 and 1:
import math
print(math.pow(10,model.score(<my sentence>)))
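For choosing between several sentences you do not actually need the conversion: the logarithm is monotonic, so the sentence with the highest (least negative) score is also the one with the highest probability. Here is a minimal sketch of that idea. The helper names `log10_to_prob` and `pick_best` and the scores in `fake_scores` are made up for illustration; in real use you would pass `model.score` as the scoring function.

```python
def log10_to_prob(log10_score):
    """Convert a base-10 log probability into a probability in (0, 1]."""
    return 10.0 ** log10_score


def pick_best(candidates, score_fn):
    """Return the candidate whose log score is highest (closest to zero)."""
    return max(candidates, key=score_fn)


# Hypothetical scores standing in for model.score(sentence).
fake_scores = {
    "this is a sentence": -9.585592,
    "sentence a is this": -14.2,
}

best = pick_best(list(fake_scores), fake_scores.get)
prob = log10_to_prob(fake_scores[best])
print(best, prob)
```

Keep in mind that longer sentences tend to get lower scores simply because more word probabilities are multiplied together, so comparing raw scores is most meaningful for candidates of similar length.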