
Read all files in a .zip archive in Python

Tags: python, zip

I'm trying to read all files in a .zip archive named data1.zip using glob.glob().

import glob
from zipfile import ZipFile

archive = ZipFile('data1.zip','r')
files = archive.read(glob.glob('*.jpg'))

Error Message:

TypeError: unhashable type: 'list'

The workaround I'm currently using is:

files = [archive.read(str(i+1)+'.jpg') for i in range(100)]

This is bad because I'm assuming my files are named 1.jpg, 2.jpg, etc.

Is there a better, more idiomatic way to do this in Python? It doesn't necessarily need to use glob().

fabda01 asked Sep 04 '25 01:09


1 Answer

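glob doesn't look inside your archive; it just gives you a list of the .jpg files in your current working directory. That list is what you passed to archive.read(), which expects a single member name, hence the TypeError.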

ZipFile already has methods for returning information about the files in the archive: namelist returns names, and infolist returns ZipInfo objects which include metadata as well.
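To make the difference between the two methods concrete, here is a minimal sketch. It builds a small in-memory archive (the member names and contents are made up for illustration) so it runs without data1.zip, then lists the names and reads one piece of metadata, the uncompressed size, from each ZipInfo object.

```python
import io
from zipfile import ZipFile

# Build a tiny in-memory archive so the sketch runs without data1.zip.
# The member names and contents here are made up for illustration.
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as archive:
    archive.writestr('1.jpg', b'fake image bytes')
    archive.writestr('notes.txt', b'hello')

with ZipFile(buffer, 'r') as archive:
    # namelist() returns plain strings, in the order stored in the archive.
    names = archive.namelist()
    # infolist() returns ZipInfo objects; file_size is the uncompressed size.
    sizes = {info.filename: info.file_size for info in archive.infolist()}

print(names)
print(sizes)
```

With a real file you would pass the path, e.g. 'data1.zip', in place of the buffer.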

Are you just looking for:

archive = ZipFile('data1.zip', 'r')
files = archive.namelist()

Or if you only want .jpg files:

files = [name for name in archive.namelist() if name.endswith('.jpg')]
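If you do want glob-style patterns, one option is to apply the standard fnmatch module to the names from namelist(). The sketch below is one way to do that; match_members is a hypothetical helper name, and the archive contents are made up so the example is self-contained. It uses fnmatchcase so matching is case-sensitive on every platform.

```python
import io
from fnmatch import fnmatchcase
from zipfile import ZipFile


def match_members(archive, pattern):
    """Return member names matching a glob-style pattern (hypothetical helper).

    fnmatchcase is case-sensitive on all platforms. Note that '*' also
    matches '/', so '*.jpg' matches members inside subfolders too.
    """
    return [name for name in archive.namelist() if fnmatchcase(name, pattern)]


# Made-up in-memory archive so the sketch runs without data1.zip.
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as archive:
    for name in ('1.jpg', '2.JPG', 'photos/3.jpg', 'notes.txt'):
        archive.writestr(name, b'data')

with ZipFile(buffer, 'r') as archive:
    jpg_names = match_members(archive, '*.jpg')

print(jpg_names)
```

Here '2.JPG' is not matched because of the case-sensitive comparison; lowercasing the name before matching would be one way around that.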

Or if you want to read all the contents of each file:

files = [archive.read(name) for name in archive.namelist()]

Although I'd probably rather make a dict mapping names to contents:

files = {name: archive.read(name) for name in archive.namelist()}

That way you can access contents like so:

files['1.jpg']

Or get a list of the files present using files.keys(), etc.
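Putting the pieces together, a minimal sketch might look like the following. read_members is a hypothetical helper name, and the in-memory archive is made up so the example runs anywhere. It opens the archive in a with block so it is closed afterwards, skips directory entries, and compares the suffix case-insensitively (an assumption about what you want; drop .lower() for an exact match).

```python
import io
from zipfile import ZipFile


def read_members(zip_source, suffix='.jpg'):
    """Return {member name: bytes} for members ending with suffix.

    Hypothetical helper: zip_source can be a path such as 'data1.zip'
    or a file-like object. Everything is read into memory at once.
    """
    with ZipFile(zip_source, 'r') as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            # Skip directory entries; compare the suffix case-insensitively.
            if not info.is_dir() and info.filename.lower().endswith(suffix)
        }


# Made-up in-memory archive so the sketch runs without data1.zip.
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as archive:
    archive.writestr('1.jpg', b'first')
    archive.writestr('2.JPG', b'second')
    archive.writestr('notes.txt', b'not an image')

files = read_members(buffer)
print(sorted(files))
```

Since every matching member is held in memory, for large archives it may be preferable to iterate over the names and read one member at a time.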

tzaman answered Sep 07 '25 21:09