
How to count the number of sentences in a paragraph in Python [duplicate]

Tags:

python

This is what I have so far. My paragraph contains only 5 full stops, and therefore only 5 sentences, but the code keeps returning 14 as the answer. Can anyone help?

file = open ('words.txt', 'r')
lines= list (file)
file_contents = file.read()
print(lines)
file.close()
words_all = 0
for line in lines:
    words_all = words_all + len(line.split())
    print ('Total words:   ', words_all)
full_stops = 0
for stop in lines:
    full_stops = full_stops + len(stop.split('.'))
print ('total stops:    ', full_stops)

Here is the text file (words.txt):

A Turning machine is a device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer. The "Turing" machine was described by Alan Turing in 1936, who called it an "a(utomatic)-machine". The Turing machine is not intended as a practical computing technology, but rather as a hypothetical device representing a computing machine. Turing machines help computer scientists understand the limits of mechaniacl computation.

Fiona Gaughan asked Nov 24 '25 06:11


2 Answers

The easiest way is to use the sentence tokenizer from NLTK:

import nltk
from nltk.tokenize import sent_tokenize

# One-time download of the Punkt tokenizer models
# (newer NLTK releases may ask for 'punkt_tab' instead)
nltk.download('punkt')

paragraph = 'A Turning machine is a device that manipulates symbols on a strip of tape according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm, and is particularly useful in explaining the functions of a CPU inside a computer. The "Turing" machine was described by Alan Turing in 1936, who called it an "a(utomatic)-machine". The Turing machine is not intended as a practical computing technology, but rather as a hypothetical device representing a computing machine. Turing machines help computer scientists understand the limits of mechaniacl computation.'

# sent_tokenize returns a list with one string per sentence
sentences = sent_tokenize(paragraph)

print(len(sentences))

Output:

5
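
As for why the loop in the question over-counts: `str.split('.')` returns one more piece than there are full stops, and the loop adds that extra piece once for every line in the file. The following is a minimal sketch of the difference, assuming full stops are the only sentence terminators. `count_full_stops` and the `sample` text are hypothetical, made up for illustration, and are not part of the question's code.

```python
def count_full_stops(text):
    """Count '.' characters directly instead of counting split pieces."""
    return text.count('.')


# Hypothetical two-line sample standing in for the contents of a file
sample = "One. Two.\nThree."
lines = sample.splitlines(keepends=True)

# The question's approach: add up len(line.split('.')) for every line
pieces = sum(len(line.split('.')) for line in lines)

# Counting the separators themselves avoids the per-line off-by-one
stops = count_full_stops(sample)

print(pieces, stops)
```

Counting characters this way still treats every full stop as a sentence end, so abbreviations and decimal numbers would be miscounted. That is the case a tokenizer such as `sent_tokenize` is meant to handle.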
Preetkaran Singh answered Nov 25 '25 18:11


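Use a regular expression to split on sentence-ending punctuation. Note that the result ends with an empty string, which has to be discarded before counting.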

In [13]: import re
In [14]: par  = "This is a paragraph? So it is! Ok, there are 3 sentences."
In [15]: re.split(r'[.!?]+', par)
Out[15]: ['This is a paragraph', ' So it is', ' Ok, there are 3 sentences', '']
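
To turn that split into a count, drop the empty pieces first. The following is a small sketch building on the session above. `count_sentences` is a hypothetical helper name, and it assumes every run of `.`, `!` or `?` ends a sentence, so abbreviations such as "e.g." would be over-counted.

```python
import re


def count_sentences(paragraph):
    """Split on runs of '.', '!' or '?' and count the non-empty pieces."""
    pieces = re.split(r'[.!?]+', paragraph)
    # Ignore pieces that are empty or whitespace-only, such as the
    # trailing '' produced when the text ends with punctuation
    return len([p for p in pieces if p.strip()])


par = "This is a paragraph? So it is! Ok, there are 3 sentences."
print(count_sentences(par))  # 3
```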
reptilicus answered Nov 25 '25 18:11


