
Parsing a book into chapters – Python

I have a large book stored in a single plain text file and want to parse it in order to create individual files for each chapter. I have some simple regex that finds each chapter title, but I'm struggling to capture all of the text in between.

import re

txt = open('book.txt', 'r')

for line in txt:
    # A chapter title is a line made up only of capital letters
    if re.match("^[A-Z]+$", line):
        print(line, end='')

I know this is fairly rudimentary, but I'm new enough to Python that it's got me a bit stumped. At the moment I'm going line by line, so my thought process is:

  1. If the line is a chapter title: Make a new file 'chapter_title.txt'
  2. If the next line isn't a chapter title: Write the line to chapter_title.txt

My attempts to actually write that out have been less successful though. Appreciate the help!

Edit: Specifically, I'm confused by the Python syntax for file I/O. I've tried:

for line in txt :
    if re.match("^[A-Z]+$", line):
        f = open(line + '.txt', 'w')
    else f.write(line + "\n")

as my general approach, but that's not gonna work as written. Hoping for help structuring the loops. Thanks

gweintraub asked Mar 25 '26 09:03


1 Answer

I think this will work:

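import re

with open('book.txt', 'r') as file:
    txt = file.readlines()

f = False  # no chapter file is open yet

for line in txt:
    if re.match("^[A-Z]+$", line):
        # New chapter title: close the previous chapter file, then open the next
        if f: f.close()
        # strip() drops the trailing newline so it doesn't end up in the filename
        f = open(line.strip() + '.txt', 'w')

    elif f:
        # line already ends with "\n"; text before the first title is skipped
        f.write(line)

# Close the last chapter file
if f: f.close()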

Maybe I should add some explanation:

  1. with closes the file automatically when the block ends. Closing a file you have opened is important.

  2. readlines() reads the file line by line and returns the lines as a list.

  3. f starts out as False, so the first time through the loop, if f: is False.

This is the important part: once a chapter file f has been opened, if f: is True and f.close() closes it before the next one is opened (the first time through, f.close() does not run).

After that, open() with mode 'w' creates a new file for the chapter, and the lines that follow are written into it. Each time re.match("^[A-Z]+$", line) matches, the current file is closed and another one is opened, again and again until every line in the txt list has been processed.
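
If you want something a bit more defensive, the same loop could be wrapped in a function, roughly like the sketch below. The names split_chapters and out_dir are just ones I made up for illustration. It assumes, like the regex in the question, that a chapter title is a line made up only of capital letters, and it skips any text that appears before the first title.

```python
import re
from pathlib import Path

# Same pattern as in the question: a line made up only of capital letters.
TITLE = re.compile(r"^[A-Z]+$")


def split_chapters(lines, out_dir="."):
    """Write each chapter to '<TITLE>.txt' in out_dir and return the titles found.

    split_chapters and out_dir are hypothetical names used for this sketch.
    """
    out_dir = Path(out_dir)
    titles = []
    f = None  # no chapter file is open yet
    try:
        for line in lines:
            if TITLE.match(line):
                if f:
                    f.close()
                # Drop the trailing newline before building the filename.
                title = line.strip()
                titles.append(title)
                f = open(out_dir / (title + ".txt"), "w")
            elif f:
                # The line already ends with "\n", so write it unchanged.
                f.write(line)
            # Lines before the first title are skipped.
    finally:
        # Make sure the last chapter file is closed, even after an error.
        if f:
            f.close()
    return titles
```

You could call it as split_chapters(book) inside a with open('book.txt') as book: block, since a file object yields its lines one at a time and nothing has to be read into a list first.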

Remi Crystal answered Mar 26 '26 21:03


