I'm opening a lot of PDFs and I want to delete them after they have been parsed, but the files remain open until the program finishes running. How do I close the PDFs I open using PyPDF2?
Code:
def getPDFContent(path):
    content = ""
    # Load the PDF into PyPDF2 (the file handle is never closed here)
    pdf = PyPDF2.PdfFileReader(open(path, "rb"))
    # Read at most the first three pages; min() avoids out-of-bounds errors
    max_pages = min(pdf.numPages, 3)
    # Iterate pages
    for i in range(max_pages):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    #pdf.close()
    return content
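Just open and close the file yourself, and finish reading from the PDF before you close it:
f = open(path, "rb")
pdf = PyPDF2.PdfFileReader(f)
text = pdf.getPage(0).extractText()
f.close()
The PdfFileReader constructor only parses the file's structure (the cross-reference table and trailer) from the stream you pass in. Pages are read from that stream on demand, so calling getPage() after f.close() typically fails with a ValueError about a closed file. Once you have extracted everything you need, you can close the file and delete it.
A context manager will work, too, as long as the work that touches the PDF stays inside the with block:
with open(path, "rb") as f:
    pdf = PyPDF2.PdfFileReader(f)
    do_other_stuff_with_pdf(pdf)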
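If you want the file handle released immediately, regardless of when the reader touches the stream, one option is to copy the file's bytes into memory first and hand the reader that buffer instead. Below is a minimal sketch of that idea. The helper name load_into_memory is made up for illustration, and the demo uses placeholder bytes rather than a real PDF so it runs without PyPDF2 installed.

```python
import io
import os
import tempfile


def load_into_memory(path):
    """Return an in-memory copy of the file at `path` (hypothetical helper).

    The on-disk file handle is closed as soon as the with block exits.
    """
    with open(path, "rb") as f:
        return io.BytesIO(f.read())


# Demonstration with a throwaway file (placeholder bytes, not a real PDF).
fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"%PDF-1.4 placeholder")

buffer = load_into_memory(tmp_path)
os.remove(tmp_path)       # no handle to the file is still open at this point
data = buffer.getvalue()  # the bytes remain available after the file is deleted
```

With a real PDF you would pass the buffer to the reader, for example pdf = PyPDF2.PdfFileReader(load_into_memory(path)), and could then delete the file right away. The trade-off is that the whole file sits in memory, which may matter for very large PDFs.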