Use Python to print sentences belonging to most common words in a document

Question

I have a text document and am using regex and nltk to find the five most common words in it. I need to print the sentences that contain these words. How do I do that? I would also like to extend this to finding common words across multiple documents and returning their respective sentences.

import re

import nltk

# Read the whole document and lowercase it so counting is case-insensitive.
with open('test.txt', 'r') as document_text:
    text_string = document_text.read().lower()

# Return all words whose length is in the range [3, 15].
match_pattern = re.findall(r'\b[a-z]{3,15}\b', text_string)

fdist = nltk.FreqDist(match_pattern)  # creates a frequency distribution from a list
most_common = fdist.max()             # returns the single most frequent word
top_five = fdist.most_common(5)       # returns a list of (word, count) tuples

list_5 = [word for (word, freq) in top_five]

print(top_five)
print(list_5)

Output:

[('you', 8), ('tuples', 8), ('the', 5), ('are', 5), ('pard', 5)]
['you', 'tuples', 'the', 'are', 'pard']

The output is the most commonly occurring words. I need to print the sentences that contain these words. How do I do that?

Rachel Kogan · Accepted Answer

Although it doesn't account for special characters at word boundaries like your code does, the following would be a starting point:

for sentence in text_string.split('.'):
    # Print the sentence if it shares at least one word with list_5.
    if set(list_5) & set(sentence.split(' ')):
        print(sentence)

We first iterate over the sentences, assuming each sentence ends with a . and that the . character appears nowhere else in the text. We then print a sentence if the intersection of its set of words with the set of words in your list_5 is not empty.
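If punctuation at word boundaries or the multi-document case matters, the same idea can be sketched with the standard library alone. This is only an illustration under some assumptions: collections.Counter stands in for nltk.FreqDist, sentences are split naively after ., ! or ? followed by whitespace (nltk.sent_tokenize would be a more robust choice if the Punkt data is installed), and the function names split_sentences, sentences_with_top_words and sentences_across_documents are hypothetical helpers, not part of nltk.

```python
import re
from collections import Counter

# Same word pattern as in the question: 3 to 15 lowercase letters.
WORD_RE = re.compile(r"\b[a-z]{3,15}\b")


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def sentences_with_top_words(text, n=5):
    """Return (top_words, sentences containing at least one top word)."""
    counts = Counter(WORD_RE.findall(text.lower()))
    top_words = [word for word, _ in counts.most_common(n)]
    top_set = set(top_words)
    matches = [
        sentence
        for sentence in split_sentences(text)
        # Tokenize with the same regex so "tuples." still matches "tuples".
        if top_set & set(WORD_RE.findall(sentence.lower()))
    ]
    return top_words, matches


def sentences_across_documents(docs, n=5):
    """docs maps a document name to its text.

    Words are counted over all documents together; the result maps each
    name to the sentences in that document containing a top word.
    """
    counts = Counter()
    for text in docs.values():
        counts.update(WORD_RE.findall(text.lower()))
    top_set = {word for word, _ in counts.most_common(n)}
    return {
        name: [
            sentence
            for sentence in split_sentences(text)
            if top_set & set(WORD_RE.findall(sentence.lower()))
        ]
        for name, text in docs.items()
    }
```

For a single file, something like sentences_with_top_words(open('test.txt').read()) would return both the top words and the matching sentences in their original casing. For several files, build a dict of name to text first and pass it to sentences_across_documents.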

Tags:

python

nlp

nltk

hashtag

Ajinkya

1 Answer

Rachel Kogan
