
How to read and process multiple files simultaneously in python

Tags:

python

I have multiple files and I want to read them simultaneously, extract a number from each row, and average those numbers. For a small number of files I did this using izip from the itertools module. Here is my code:

from itertools import izip
import math

g=open("MSDpara_ave_nvt.dat",'w')

with open("sample1/err_msdCECfortran_nvt.dat",'r') as f1, \
     open("sample2/err_msdCECfortran_nvt.dat",'r') as f2, \
     open("sample3/err_msdCECfortran_nvt.dat",'r') as f3, \
     open("err_msdCECfortran_nvt.dat",'r') as f4:

     for x,y,z,bg in izip(f1,f2,f3,f4):
         args1=x.split()
         i1 = float(args1[0])
         msd1 = float(args1[1])


         args2=y.split()
         i2 = float(args2[0])
         msd2 = float(args2[1])


         args3=z.split()
         i3 = float(args3[0])
         msd3 = float(args3[1])

         args4=bg.split()
         i4 = float(args4[0])
         msd4 = float(args4[1])


         msdave = (msd1 + msd2 + msd3 + msd4)/4.0

         print>>g, "%e  %e" %(i1, msdave)

f1.close()
f2.close()
f3.close()
f4.close()
g.close()

This code works OK, but if I want to handle 100 files simultaneously, it becomes very lengthy written this way. Is there a simpler way of doing this? It seems that the fileinput module can also handle multiple files, but I don't know whether it can read them simultaneously.

Thanks.

user2226358 asked Nov 02 '25 15:11


1 Answer

The with open pattern is good, but in this case it gets in your way. You can open a list of files, then use that list inside izip:

from itertools import izip

filenames = ["sample1/err_msdCECfortran_nvt.dat",...]
files = [open(i, "r") for i in filenames]
for rows in izip(*files):
    # rows is now a tuple containing one row from each file
    pass
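
Note that with a plain list of files, closing them is left to you. A minimal sketch of one way to handle that, assuming Python 3 (where the built-in zip is lazy, like izip in Python 2). The function name read_in_lockstep is hypothetical and used only for illustration:

```python
def read_in_lockstep(filenames):
    """Return a list of tuples, each holding one stripped line per file."""
    files = []
    try:
        # Open inside the try block so that a failure partway through
        # still closes the files that were already opened.
        for name in filenames:
            files.append(open(name, "r"))
        # zip stops at the shortest file, as izip does in Python 2.
        return [tuple(line.rstrip("\n") for line in rows) for rows in zip(*files)]
    finally:
        for f in files:
            f.close()
```

This returns all rows at once, which is fine for small files. For large ones, the per-row processing would go inside the try block instead.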

In Python 3.3+ you can also use ExitStack in a with block:

from contextlib import ExitStack

filenames = ["sample1/err_msdCECfortran_nvt.dat",...]
with ExitStack() as stack:
    files = [stack.enter_context(open(i, "r")) for i in filenames]
    for rows in zip(*files):
        # rows is now a tuple containing one row from each file
        pass
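
Combined with the averaging from the question, this might look like the sketch below. It assumes every file has the same number of rows, each row has at least two whitespace-separated numeric columns, and the first column is the same across files (so it is taken from the first file, as in the question). The function name average_second_column is hypothetical:

```python
from contextlib import ExitStack


def average_second_column(filenames, out_path):
    """Write 'x  mean(y)' per row, reading all input files in lockstep."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(name, "r")) for name in filenames]
        out = stack.enter_context(open(out_path, "w"))
        for rows in zip(*files):
            # rows holds one line from each file; split each into columns.
            columns = [row.split() for row in rows]
            x = float(columns[0][0])  # first column, taken from the first file
            mean = sum(float(c[1]) for c in columns) / len(columns)
            out.write("%e  %e\n" % (x, mean))
```

Because the divisor is len(columns) rather than a hard-coded 4.0, the same function should work unchanged for 4 files or 100.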

In Python < 3.3, to get all the advantages of the with statement (e.g. timely closing no matter how you exit the block), you would need to create your own context manager:

class FileListReader(object):

    def __init__(self, filenames):
        # Open every file up front and keep the handles on the instance.
        self.files = [open(i, "r") for i in filenames]

    def __enter__(self):
        for i in self.files:
            i.__enter__()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Closes each file, whether or not an exception occurred.
        for i in self.files:
            i.__exit__(exc_type, exc_value, traceback)

Then you could do:

from itertools import izip

filenames = ["sample1/err_msdCECfortran_nvt.dat",...]
with FileListReader(filenames) as f:
    for rows in izip(*f.files):
        #...
        pass

In this case, though, the last option might be considered over-engineering.

otus answered Nov 04 '25 06:11