I have a Unicode string that is actually an article (it contains sentences), and I want to cut it off after the Xth sentence in Python.
A good indicator of a sentence ending is a full stop (".") followed by a word that starts with a capital letter. For example:
myarticle = "Hi, this is my first sentence. And this is my second. Yet this is my third."
How can this be achieved?
Thanks
Consider downloading the Natural Language Toolkit (NLTK). Its sentence tokenizer will not break on things like "U.S.A." and will still split sentences that end in "?!".
>>> import nltk
>>> paragraph = u"Hi, this is my first sentence. And this is my second. Yet this is my third."
>>> sentences = nltk.sent_tokenize(paragraph)
>>> sentences
[u"Hi, this is my first sentence.", u"And this is my second.", u"Yet this is my third."]
Your code becomes much more readable. To access the second sentence, use ordinary list indexing.
>>> sentences[1]
u"And this is my second."
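To actually cut the article after the Xth sentence, keep the first X sentences and join them back together. If NLTK is not an option, the heuristic from the question (a full stop followed by whitespace and a capital letter) can be sketched with the standard library alone. The helper name cut_after_sentence is hypothetical, and the sketch assumes ASCII capitals and that sentences may be rejoined with a single space; it will split wrongly on abbreviations such as "Mr. Smith", which is exactly the case NLTK handles better.

```python
import re

# Heuristic from the question: a sentence boundary is whitespace that follows
# a full stop and precedes an ASCII capital letter (an assumption of this sketch).
SENTENCE_BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def cut_after_sentence(article, x):
    """Return the first x sentences of article (hypothetical helper)."""
    sentences = SENTENCE_BOUNDARY.split(article)
    # Slicing past the end is safe: it simply returns every sentence.
    return " ".join(sentences[:x])


myarticle = "Hi, this is my first sentence. And this is my second. Yet this is my third."
print(cut_after_sentence(myarticle, 2))
```

With the NLTK result from above, the same cut is the slice-and-join on its own: u" ".join(sentences[:x]).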