I have a large book stored in a single plain text file and want to parse it in order to create individual files for each chapter. I have some simple regex that finds each chapter title, but I'm struggling to capture all of the text in between.
import re
txt = open('book.txt', 'r')
for line in txt:
    if re.match("^[A-Z]+$", line):
        print line,
I know this is fairly rudimentary, but I'm new enough to Python that it's got me a bit stumped. At the moment I'm going line by line, so my thought process is:
My attempts to actually write that out have been less successful though. Appreciate the help!
Edit: Specifically, I'm confused by the Python syntax for file I/O. I've tried:
for line in txt:
    if re.match("^[A-Z]+$", line):
        f = open(line + '.txt', 'w')
    else f.write(line + "\n")
as my general approach, but that's not gonna work as written. Hoping for help structuring the loops. Thanks
I think this will work:
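import re

with open('book.txt', 'r') as file:
    txt = file.readlines()

f = False
for line in txt:
    if re.match("^[A-Z]+$", line):
        # Chapter title: close the previous chapter file, then start a new one.
        if f:
            f.close()
        # strip() drops the trailing newline so it does not end up in the file name.
        f = open(line.strip() + '.txt', 'w')
    elif f:
        # Each line already ends with a newline, so write it unchanged.
        # Text before the first title is skipped because no file is open yet.
        f.write(line)
if f:
    f.close()  # close the last chapter file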
Maybe I should add some explanation:
with closes the file automatically when the block ends. Closing a file you have opened is important.
readlines() reads the whole file and returns its lines as a list.
Here I'm using f = False, so the first time if f: is checked it will be False.
This part is important: once a chapter file f has been opened, if f: will be True and the file will be closed by f.close() (the first time through, f.close() does not run because nothing is open yet).
Then a new file named after the chapter title is opened for writing with open(), and the lines that follow are written into it. Whenever re.match("^[A-Z]+$", line) matches again, the current file is closed and another one is opened, and so on until the end of the txt list.
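If it helps, the same idea can be sketched with the splitting separated from the file writing, which makes it easier to test on a few sample lines before running it on the whole book. This is only a rough sketch: split_chapters and write_chapters are names I made up, and it assumes chapter titles are unique (a repeated title would overwrite the earlier chapter) and that titles match the same "^[A-Z]+$" pattern from the question.

```python
import re
from pathlib import Path

# Same pattern as in the question: a line made up of capital letters only.
TITLE = re.compile(r"^[A-Z]+$")


def split_chapters(lines):
    """Group lines under the most recent chapter title (hypothetical helper).

    Returns a dict mapping each title to the text that follows it.
    Lines before the first title are skipped.
    """
    chapters = {}
    current = None
    for line in lines:
        stripped = line.rstrip("\n")
        if TITLE.match(stripped):
            current = stripped
            chapters[current] = []
        elif current is not None:
            chapters[current].append(line)
    return {title: "".join(body) for title, body in chapters.items()}


def write_chapters(book_path, out_dir="."):
    """Write one '<TITLE>.txt' file per chapter found in book_path."""
    with open(book_path, "r") as book:
        chapters = split_chapters(book)
    for title, text in chapters.items():
        Path(out_dir, title + ".txt").write_text(text)
```

Calling write_chapters('book.txt') would then create the chapter files in the current directory. Because every file is written in one call, there is no open file handle to remember to close at the end.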