
How to detect uncertainty of text in NLTK Python?

I am a beginner at NLTK and machine learning, and my goal is to give uncertainty ratings to sentences. For example, a sentence like "This is likely caused by a.." would receive a certainty score of, say, 6, whereas "There is definitely something wrong with me" would receive a 10 and "I think it could possibly happen" would score a 3.

Regardless of the scoring system, a binary classification into "certain" and "uncertain" would also suffice for my needs.

I did not find any existing work on this. How should I approach it? I do have some untrained text data.

Ewen W. asked Sep 06 '25 23:09



1 Answer

As far as I know, existing NLP toolkits do not have such a feature.

You have to train your own model, and for that you need training data. If you have a dataset that contains an uncertainty label for each sentence, you can train a text classification model on it.

If you don't have labeled data, there was a CoNLL 2010 Shared Task on detecting uncertainty/hedging, and the dataset for it should be available. You can train a simple text classifier on the CoNLL 2010 dataset and then apply the trained model to your own data. As long as your data is not very different in nature from theirs, this should work.

For text classification, you can simply use the scikit-learn library, which is straightforward.
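To make this concrete, here is a minimal sketch of a certain/uncertain sentence classifier in scikit-learn. The handful of training sentences and their labels are made up for illustration only; in practice you would replace them with sentences and labels from the CoNLL 2010 data or your own annotations. The choice of TF-IDF features with logistic regression is just one reasonable baseline, not the only option.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data, invented for this example.
# Replace with real labeled sentences (e.g. from the CoNLL 2010 dataset).
train_sentences = [
    "This is likely caused by a virus.",
    "I think it could possibly happen.",
    "It may be related to the new update.",
    "Perhaps the results suggest a weak effect.",
    "There is definitely something wrong with me.",
    "The test clearly shows an infection.",
    "The server is down.",
    "We measured a temperature of 40 degrees.",
]
train_labels = [
    "uncertain", "uncertain", "uncertain", "uncertain",
    "certain", "certain", "certain", "certain",
]

# Word unigrams and bigrams as TF-IDF features, fed to a linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(train_sentences, train_labels)

# Apply the trained model to new, unlabeled sentences.
new_sentences = [
    "It might possibly be a hardware problem.",
    "The disk is full.",
]
predicted = model.predict(new_sentences)

# predict_proba returns one column per class, ordered as in model.classes_.
probabilities = model.predict_proba(new_sentences)
uncertain_col = list(model.classes_).index("uncertain")

for sentence, label, row in zip(new_sentences, predicted, probabilities):
    print(f"{label:9s} p(uncertain)={row[uncertain_col]:.2f}  {sentence}")
```

If you want a graded score rather than a binary label, the predicted probability of the "uncertain" class can be rescaled to whatever range you like. With only a few training sentences the predictions will not be reliable, so treat the output as a smoke test of the pipeline rather than a result.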

You might also find the following references useful:

CentAu answered Sep 09 '25 22:09
