I've run into a bit of a predicament in Python. I'd like to take a .txt file with many comments and split it into a list of words, splitting on all punctuation, spaces and \n. When I run the following Python code, it splits my text file in weird spots. NOTE: Below I am only trying to split on periods and newlines to test it out, but it still often strips the last letter from words.
import regex as re
with open('G:/My Documents/AHRQUnstructuredComments2.txt','r') as infile:
    nf = infile.read()
wList = re.split('. | \n, nf)
print(wList)
You need to fix the quote marks and make a slight change to the regular expression:
import regex as re
with open('G:/My Documents/AHRQUnstructuredComments2.txt','r') as infile:
    nf = infile.read()
wList = re.split(r'\W+', nf)
print(wList)
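In a regex, the character . matches any character (except a newline, by default), so the pattern '. ' matches any character followed by a space. That is why the last letter of your words was disappearing. You have to escape it, \., to match literal periods. The pattern \W+ instead matches any run of non-word characters, which covers punctuation, spaces and \n in one go.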
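To see the difference, here is a minimal sketch that runs the three patterns side by side. It assumes the standard-library re module (the third-party regex package imported above should behave the same way for these patterns), and the sample string is made up for illustration rather than taken from the original file.

```python
import re

# Hypothetical sample text standing in for the contents of the .txt file.
sample = "Great service. Staff was kind, but slow.\nWould return!"

# Unescaped dot: '. ' matches ANY character followed by a space,
# so the character before each space is consumed by the split.
broken = re.split(r'. | \n', sample)
print(broken)
# ['Grea', 'service', 'Staf', 'wa', 'kind', 'bu', 'slow.\nWoul', 'return!']

# Escaped dot: splits only on a literal period plus space, or on a newline.
escaped = re.split(r'\. |\n', sample)
print(escaped)
# ['Great service', 'Staff was kind, but slow.', 'Would return!']

# \W+ splits on any run of non-word characters (punctuation, spaces, \n).
# A trailing "!" leaves an empty string at the end, so filter empties out.
words = [w for w in re.split(r'\W+', sample) if w]
print(words)
# ['Great', 'service', 'Staff', 'was', 'kind', 'but', 'slow', 'Would', 'return']
```

Note that \W+ can leave an empty string at the start or end of the list when the text begins or ends with punctuation, which is why the last step filters those out.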