I have many zip files stored in my path:
mypath/data1.zip
mypath/data2.zip
Each zip file contains three different txt files. For instance, data1.zip contains:
data1_a.txt
data1_b.txt
data1_c.txt
I need to load datai_c.txt from each zip file (that is, data1_c.txt, data2_c.txt, data3_c.txt, etc.) and concatenate them into a dataframe.
Unfortunately, I am unable to do so using read_csv directly, because it only works when the zip archive contains a single file.
Any ideas how to do so? Thanks!
So you need some other code to reach into the zip file. Below is modified code from O'Reilly's Python Cookbook:
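import io
import zipfile
import pandas as pd

# Make up some data for the example
x = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
x.to_csv('a.txt', sep="|", index=False)
(x * 2).to_csv('b.txt', sep="|", index=False)

# Pack both files into one archive
with zipfile.ZipFile('zipfile.zip', 'w') as myzip:
    myzip.write('a.txt')
    myzip.write('b.txt')

# Read every member of the archive and concatenate the results
frames = []
with zipfile.ZipFile('zipfile.zip') as z:
    for filename in z.namelist():
        print('File:', filename)
        # z.read returns bytes, so wrap them in BytesIO for read_csv
        frames.append(pd.read_csv(io.BytesIO(z.read(filename)), sep="|"))
df = pd.concat(frames)
print(df)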
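To apply the same idea to the layout in the question, here is a minimal sketch that walks a directory of zip files and reads only the member whose name ends in _c.txt from each one. The function name load_c_files and its suffix parameter are made up for illustration, and it assumes every matching member can be parsed by read_csv with the same options.

```python
import glob
import os
import zipfile

import pandas as pd


def load_c_files(zip_dir, suffix="_c.txt", **read_csv_kwargs):
    """Concatenate the members ending in `suffix` from every zip in zip_dir.

    Hypothetical helper: extra keyword arguments are passed to pd.read_csv.
    """
    frames = []
    # Sort so the rows appear in a predictable archive order
    for zip_path in sorted(glob.glob(os.path.join(zip_dir, "*.zip"))):
        with zipfile.ZipFile(zip_path) as z:
            for name in z.namelist():
                if name.endswith(suffix):
                    # z.open gives a binary file-like object for this member
                    with z.open(name) as member:
                        frames.append(pd.read_csv(member, **read_csv_kwargs))
    # pd.concat raises ValueError if no matching member was found
    return pd.concat(frames, ignore_index=True)
```

With the paths from the question, the call would look like df = load_c_files("mypath"), adding sep="|" or other read_csv options if the txt files are not comma-separated.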