I'm trying to use a Python script to edit a large directory of .html files in a loop. I'm having trouble looping through the filenames using os.walk(). This chunk of code just reads the HTML files into strings that I can work with, but the script does not even enter the loop, as if the files don't exist. It prints point1 but never reaches point2, and it ends without an error message. The files live inside a folder called "amazon", which contains one level of 20 subfolders with 20 HTML files in each.
Oddly the code works perfectly on a neighboring directory that only contains .txt files, but it seems like it's not grabbing my .html files for some reason. Is there something I don't understand about the structure of the for root, dirs, filenames in os.walk() loop? This is my first time using os.walk, and I've looked at a number of other pages on this site to try to make it work.
import os
rootdir = 'C:\filepath\amazon'
print "point1"
for root, dirs, filenames in os.walk(rootdir):
    print "point2"
    for file in filenames:
        with open(os.path.join(root, file), 'r') as myfile:
            g = myfile.read()
        print g
Any help is much appreciated.
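The backslash is an escape character in Python string literals, so in 'C:\filepath\amazon' the \f becomes a form feed and the \a becomes a bell character. The resulting path does not exist, and os.walk() yields nothing for a missing directory instead of raising an error. Either double the backslashes, or use a "raw string" by prefixing the literal with "r".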
Example:
>>> 'C:\filepath\amazon'
'C:\x0cilepath\x07mazon'
>>> r'\x'
'\\x'
>>> '\x'
ValueError: invalid \x escape
Explanation: In Python, what does preceding a string literal with “r” mean?
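To make the effect of the escapes concrete, here is a minimal sketch comparing the literal from the question with the two fixes suggested above. The variable names are only for illustration, and the print() calls are written so they run on Python 3 as well as Python 2.

```python
# The literal from the question: \f and \a are interpreted as escapes.
broken = 'C:\filepath\amazon'

# The two suggested fixes produce the same, intended string.
doubled = 'C:\\filepath\\amazon'
raw = r'C:\filepath\amazon'

print(repr(broken))               # 'C:\x0cilepath\x07mazon'
print(doubled == raw)             # True
print(len(raw) - len(broken))     # 2, one character lost per escape
```

Printing repr() of a path before handing it to os.walk() is a quick way to spot this kind of problem.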
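You can avoid having to handle slashes explicitly by using os.path.join. Note that on Windows a bare drive letter such as 'C:' is joined without a separator, which gives the drive-relative path 'C:filepath\amazon', so include the root separator after the drive:
rootdir = os.path.join('C:', os.sep, 'filepath', 'amazon')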
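One caveat when joining onto a drive letter, plus a way to make a bad path fail loudly: the sketch below uses ntpath (the Windows implementation of os.path, importable on any platform) to show how a bare 'C:' is joined, and wraps the loop from the question in a hypothetical helper, read_tree, that checks the directory first. The helper name, the ValueError, and the onerror callback are assumptions made for illustration, not part of the original code.

```python
import ntpath  # Windows flavour of os.path, importable on any platform
import os


def read_tree(rootdir):
    """Return {path: text} for every file under rootdir (hypothetical helper)."""
    if not os.path.isdir(rootdir):
        # os.walk() would silently yield nothing here, so fail loudly instead.
        raise ValueError('not a directory: %r' % rootdir)

    def report(err):
        # os.walk() ignores directory-listing errors unless onerror is given.
        print('walk error: %s' % err)

    contents = {}
    for root, dirs, filenames in os.walk(rootdir, onerror=report):
        for name in filenames:
            path = os.path.join(root, name)
            with open(path, 'r') as myfile:
                contents[path] = myfile.read()
    return contents


# A bare drive letter is joined without a separator (drive-relative path).
print(repr(ntpath.join('C:', 'filepath', 'amazon')))
# Adding the separator after the drive gives the absolute path.
print(repr(ntpath.join('C:', ntpath.sep, 'filepath', 'amazon')))
```

With the check in place, a mistyped or mis-escaped rootdir raises immediately instead of the script ending quietly after point1.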