
Iterate over infinite files in a directory in Python

I'm using Python 3.3.

If I'm manipulating potentially infinite files in a directory (bear with me; just pretend I have a filesystem that supports that), how do I do that without encountering a MemoryError? I only want the string name of one file to be in memory at a time. I don't want them all in an iterable as that would cause a memory error when there are too many.

Will os.walk() work just fine, since it returns a generator? Or, do generators not work like that?

Is this possible?

asked Jan 25 '26 05:01 by Brōtsyorfuzthrāx


1 Answer

If the file names follow a scheme that can be computed, you can do something like this. It iterates over any number of numbered .txt files (0.txt, 1.txt, and so on) with only one name in memory at a time; you could switch to another computable naming scheme to get shorter file names for large numbers:

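import os

def infinite_files(path):
    num = 0
    while True:
        # Build the next expected name: 0.txt, 1.txt, 2.txt, ...
        filename = os.path.join(path, str(num) + ".txt")
        if not os.path.exists(filename):
            break  # stop at the first missing number
        # perform operations on the file at `filename` here
        num += 1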
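
The same idea can also be written as a generator, so the calling code decides what to do with each file. This is only a sketch: the function name `numbered_files` and the `ext` parameter are made up for illustration, and it assumes the files are numbered from 0 with no gaps.

```python
import os
import tempfile

def numbered_files(path, ext=".txt"):
    """Yield path/0.txt, path/1.txt, ... and stop at the first missing number."""
    num = 0
    while True:
        candidate = os.path.join(path, str(num) + ext)
        if not os.path.exists(candidate):
            return  # first gap ends the iteration
        yield candidate  # only this one name is held at a time
        num += 1

# Small demo in a throwaway directory with three numbered files.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(3):
        open(os.path.join(demo_dir, "%d.txt" % i), "w").close()
    # Collected into a list only to show the result; real code would
    # process each name inside a for loop instead.
    found = [os.path.basename(p) for p in numbered_files(demo_dir)]

print(found)  # ['0.txt', '1.txt', '2.txt']
```

Because the generator never asks the operating system for a directory listing, memory use does not grow with the number of files; the cost is one `os.path.exists` call per file.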



[My old inapplicable answer is below]

glob.iglob seems to do exactly what the question asks for. [EDIT: It doesn't. It actually seems less efficient than listdir(), but see my alternative solution above.] From the official documentation:

glob.glob(pathname, *, recursive=False)
Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).


glob.iglob(pathname, *, recursive=False)
Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

iglob returns an "iterator which yields" or, more concisely, a generator.

Since glob.iglob has the same behavior as glob.glob, you can search with wildcard characters:

import glob

for x in glob.iglob("/home/me/Desktop/*.txt"):
    print(x)  # prints every path in that directory that matches *.txt

I don't see a way for it to differentiate between files and directories without doing it manually. That is certainly possible, however.
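
One way to do that check manually is to filter the matches with `os.path.isfile`. This is a minimal sketch; the helper name `iter_files_only` is hypothetical, and it does not change how `glob.iglob` reads the directory, so it does not address the memory concern on its own.

```python
import glob
import os
import tempfile

def iter_files_only(pattern):
    """Yield only the glob matches that are regular files, skipping directories."""
    for match in glob.iglob(pattern):
        if os.path.isfile(match):
            yield match

# Demo: one real file and one directory whose name also ends in .txt.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "notes.txt"), "w").close()
    os.mkdir(os.path.join(demo_dir, "folder.txt"))
    pattern = os.path.join(demo_dir, "*.txt")
    files_found = sorted(os.path.basename(p) for p in iter_files_only(pattern))

print(files_found)  # ['notes.txt']
```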

answered Jan 26 '26 20:01 by Brōtsyorfuzthrāx


