How do I apply my Python code to all of the files in a folder at once, and how do I create a new name for each subsequent output file?

Question

The code I am working with takes in a .pdf file and outputs a .txt file. How do I create a loop (probably a for loop) that runs the code on every file in a folder whose name ends in ".pdf"? Furthermore, how do I change the output on each pass so that every run writes a new file with the same name as the input file (i.e. 1_pet.pdf > 1_pet.txt, 2_pet.pdf > 2_pet.txt, etc.)?

Here is the code so far:

path="2_pet.pdf"
content = getPDFContent(path)
encoded = content.encode("utf-8")
text_file = open("Output.txt", "w")
text_file.write(encoded)
text_file.close()

Geeocode · Accepted Answer

The following script solves your problem:

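import os

sourcedir = 'pdfdir'

# List every entry in the source folder.
dl = os.listdir(sourcedir)

for f in dl:
    # splitext copes with names that contain several dots or no extension.
    root, ext = os.path.splitext(f)
    if ext == ".pdf":
        # Join with the folder name (not the listing) to get the full path.
        path_in = os.path.join(sourcedir, f)
        content = getPDFContent(path_in)
        encoded = content.encode("utf-8")
        # Same base name as the input, with a .txt extension.
        path_out = os.path.join(sourcedir, root + ".txt")
        # Binary mode, because encode() returns bytes.
        text_file = open(path_out, 'wb')
        text_file.write(encoded)
        text_file.close()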
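
For readers on Python 3, the same loop can be sketched with pathlib, which builds the output name by swapping the suffix. This is only a sketch under assumptions: the function name convert_folder and its extract_text parameter are hypothetical, and extract_text stands in for the question's getPDFContent (any callable that takes a path and returns text).

```python
from pathlib import Path


def convert_folder(folder, extract_text):
    """Write a .txt file next to every .pdf found directly in `folder`.

    `extract_text` is a hypothetical stand-in for the question's
    getPDFContent: a callable that takes a path string and returns text.
    """
    written = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # with_suffix swaps only the extension: 1_pet.pdf -> 1_pet.txt
        txt_path = pdf_path.with_suffix(".txt")
        txt_path.write_text(extract_text(str(pdf_path)), encoding="utf-8")
        written.append(txt_path)
    return written
```

It would be called as convert_folder("pdfdir", getPDFContent). Passing the extractor as an argument keeps the loop independent of how the PDF text is actually obtained.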

ajerneck · Answer

Create a function that encapsulates what you want to do to each file.

import os.path

def parse_pdf(filename):
    """Parse a PDF into text and save it next to the input as .txt."""
    content = getPDFContent(filename)
    encoded = content.encode("utf-8")
    # Split off the .pdf extension so that .txt can be added instead.
    (root, _) = os.path.splitext(filename)
    # Binary mode, because encode() returns bytes.
    text_file = open(root + ".txt", "wb")
    text_file.write(encoded)
    text_file.close()

Then apply this function to a list of filenames, like so:

for f in files:
    parse_pdf(f)
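
The loop above leaves files undefined. One way to build it, as a minimal sketch, is the standard glob module; the helper name list_pdfs and the folder name "pdfdir" are assumptions made for illustration.

```python
import glob
import os


def list_pdfs(folder):
    """Return the sorted paths of all files in `folder` ending in .pdf."""
    pattern = os.path.join(folder, "*.pdf")
    return sorted(glob.glob(pattern))


# "pdfdir" is a placeholder folder name; glob returns an empty list
# if the folder does not exist or holds no .pdf files.
files = list_pdfs("pdfdir")
```

Because each returned path includes the folder, a function that derives the output name from its input path writes every .txt file alongside the matching .pdf.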

Tags: python, for-loop, parsing, naming, pypdf

Jack Bunce
