How to deal with two or more big CSV files using Python with limited memory?

I only have 4 GB of memory to work with. The files look like this:

File    | Number of rows   | Number of cols | Header names
1st CSV | 2,000,000+ rows  | 3 cols         | id1, id2, ...
2nd CSV | 10,000,000+ rows | 24 cols        | id2, ...
3rd CSV | 170 rows         | 5 cols         | id1, ...


What I want to do is:

import pandas as pd

file1 = pd.read_csv('data1.csv')
file2 = pd.read_csv('data2.csv')
file3 = pd.read_csv('data3.csv')
data = pd.merge(file1, file3, on='id1', how='left')
data = pd.merge(data, file2, on='id2', how='left')
# write the merged data to merge.csv
data.to_csv('merge.csv', index=False)

But there is not enough memory for that, so I have tried two ways. The first way is:

# data1 and data2 are chunk iterators from pd.read_csv(..., chunksize=...)
data_merge = pd.DataFrame()
for data1_chunk in data1:
    for data2_chunk in data2:
        data = pd.merge(data1_chunk, data2_chunk, on='id2')
        data_merge = pd.concat([data_merge, data])

The second way is:

# zip only pairs chunks by position: the i-th chunk of data1 with the i-th chunk of data2
for data1_chunk, data2_chunk in zip(data1, data2):
    data_merge = pd.merge(data1_chunk, data2_chunk, on='id2', how='left')

But neither works: the first still runs out of memory (and the data2 iterator is exhausted after the first data1 chunk), and the second only joins chunks that happen to line up by position, so most matching rows are missed.

Is there any way to handle big CSV files like this with the chunksize parameter? Or is there another, easier way?

The question How to read a 6 GB csv file with pandas only explains how to deal with one big CSV file, not two or more. I want to know how to use the iterator/chunksize approach across two or more files with limited memory.
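
For reference, this is the single-file pattern from that question, as far as I understand it (just a sketch: the chunksize value is only an example, and counting rows stands in for whatever per-chunk work is actually done):

import pandas as pd

# read one big CSV in pieces instead of loading it all at once
total_rows = 0
for chunk in pd.read_csv('data2.csv', chunksize=1000000):
    # each chunk is an ordinary DataFrame with up to 1,000,000 rows
    total_rows += len(chunk)
print(total_rows)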

asked Dec 06 '25 by Kiwi Qi

1 Answer

I find that the following code may work, at least logically:

import pandas as pd

file1 = pd.read_csv('data1.csv', chunksize=100, iterator=True)
temp = None
for chunk1 in file1:
    # re-open data2.csv so its chunks are iterated again for every chunk of data1.csv
    file2 = pd.read_csv('data2.csv', chunksize=100, iterator=True)
    for chunk2 in file2:
        # id2 is the key shared by data1.csv and data2.csv
        temp_chunk = pd.merge(chunk1, chunk2, on='id2', how='inner')
        # pd.concat silently drops the initial None
        temp = pd.concat([temp, temp_chunk])

finalData = temp.drop_duplicates(keep='first')
# process finalData ...

It takes more time but less memory, I think.
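
If the nested loops are too slow, another option is to do the small merges in memory and only stream the 10-million-row file. This is just a sketch I have not run on the asker's data: it assumes the merge of data1.csv and data3.csv fits in 4 GB, the chunksize and the merge.csv output name are arbitrary, and because each chunk is joined with how='inner', rows whose id2 never appears in data2.csv are dropped (a true left join would keep them with NaNs):

import pandas as pd

# data1.csv (3 cols) and data3.csv (170 rows) are small enough to merge in memory
file1 = pd.read_csv('data1.csv')
file3 = pd.read_csv('data3.csv')
left = pd.merge(file1, file3, on='id1', how='left')

# stream the big file and append each merged piece straight to disk,
# so the full result never has to sit in memory
first = True
for chunk2 in pd.read_csv('data2.csv', chunksize=500000):
    merged = pd.merge(left, chunk2, on='id2', how='inner')
    merged.to_csv('merge.csv', mode='w' if first else 'a', header=first, index=False)
    first = False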

answered Dec 07 '25 by Kiwi Qi


