Parsing a book into chapters – Python

Question

I have a large book stored in a single plain text file and want to parse it in order to create individual files for each chapter. I have some simple regex that finds each chapter title, but I'm struggling to capture all of the text in between.

import re

txt = open('book.txt', 'r')

for line in txt :
    if re.match("^[A-Z]+$", line):
        print line,

I know this is fairly rudimentary, but I'm new enough to Python that it's got me a bit stumped. At the moment I'm going line by line, so my thought process is:

If the line is a chapter title: Make a new file 'chapter_title.txt'
If the next line isn't a chapter title: Write the line to chapter_title.txt

My attempts to actually write that out have been less successful though. Appreciate the help!

Edit: Specifically, I'm confused by the Python syntax for file I/O. I've tried:

for line in txt :
    if re.match("^[A-Z]+$", line):
        f = open(line + '.txt', 'w')
    else f.write(line + "\n")

as my general approach, but that's not gonna work as written. Hoping for help structuring the loops. Thanks

Remi Crystal · Accepted Answer

I think this will work:

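import re

with open('book.txt', 'r') as file:
    txt = file.readlines()

f = False  # no chapter file is open yet

for line in txt:
    if re.match("^[A-Z]+$", line):
        # Chapter title: close the previous chapter file, then open a new one.
        if f: f.close()
        # strip() drops the trailing newline so it doesn't end up in the file name.
        f = open(line.strip() + '.txt', 'w')
    elif f:
        # Body text: the line already ends with a newline, so write it as-is.
        # Lines before the first chapter title are skipped.
        f.write(line)

if f: f.close()  # close the last chapter file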

Maybe I should add some explanation:

with closes the file automatically when the block ends. Closing an opened file is important.
readlines() reads the file line by line and returns the lines as a list.
f starts out as False, so the first time through the loop, if f: is False.

This is the important part: once a file f has been opened, if f: is True and the file is closed by f.close() (the first time through, f.close() does not run).

Then the f = open(...) call with mode 'w' opens a new file for writing. Each time re.match("^[A-Z]+$", line) matches, the current file is closed and another one is opened, and so on until every line in the txt list has been processed.
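For what it's worth, the loop described above can also be sketched as two small Python 3 functions, one that groups the lines by chapter and one that writes the files. This is only a sketch under a few assumptions: split_into_chapters and write_chapters are names made up for this example, chapter titles are lines made up only of capital letters (the same pattern as in the question), any text before the first title is skipped, and a repeated title would overwrite the earlier chapter of the same name.

```python
import re

# Same pattern as in the question: a line made up only of capital letters.
# Titles containing spaces or digits would need a broader pattern.
TITLE = re.compile(r"^[A-Z]+$")


def split_into_chapters(lines):
    """Group lines under the most recent chapter title.

    Returns a dict mapping each title to the text that follows it.
    Lines before the first title are skipped (an assumption of this sketch).
    """
    chapters = {}
    current = None  # title of the chapter being collected, if any
    for line in lines:
        if TITLE.match(line):
            current = line.strip()  # drop the trailing newline from the title
            chapters[current] = []
        elif current is not None:
            chapters[current].append(line)  # line already ends with "\n"
    return {title: "".join(body) for title, body in chapters.items()}


def write_chapters(chapters):
    """Write each chapter to '<title>.txt' in the current directory."""
    for title, text in chapters.items():
        with open(title + ".txt", "w") as f:  # closed automatically by 'with'
            f.write(text)
```

To run it on the book, something like write_chapters(split_into_chapters(open('book.txt'))) should do, ideally with the open() call wrapped in a with block. Keeping the grouping separate from the file writing makes it easy to check the chapter boundaries before any files are created.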

Tags: python, regex, parsing, text-analysis

gweintraub
