
Merging two CSV files without pandas

I have two CSV files that I would like to merge. With pandas I would use:

pd.merge(df1,df2, how='left', left_on='ST_LOGINID', right_on='LOGINID')

However, pandas runs out of memory during this operation ("MemoryError"), even though my RAM usage only rises from 1.9 GB to 2.2 GB out of 4 GB before the error is raised.

I am therefore looking for either of these solutions:
1) A way to perform such a merge/join without loading the files into memory.
2) A way to let pandas use more RAM, since there seems to be plenty of memory available.

Alexis Eggermont asked Jan 17 '26 10:01



1 Answer

Try csvkit:

First install with:

pip install csvkit

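Then run a left join, which matches how='left' in the pandas call (--outer would perform a full outer join instead):

csvjoin -c "ST_LOGINID,LOGINID" --left file1.csv file2.csv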
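
If installing csvkit is not an option, a similar left join can be sketched with Python's standard csv module. This is only a rough sketch under a few assumptions: the function name left_join_csv and the output file name merged.csv are made up for illustration, the right-hand file is assumed to be small enough to index in memory, and keys are compared as plain strings. The left-hand file is streamed one row at a time, so it is never loaded in full.

```python
import csv


def left_join_csv(left_path, right_path, out_path,
                  left_key="ST_LOGINID", right_key="LOGINID"):
    """Left-join two CSV files, streaming the left file row by row.

    Hypothetical helper: only the right-hand file is held in memory.
    """
    # Index the right-hand file by its key column.
    with open(right_path, newline="") as f:
        reader = csv.reader(f)
        right_header = next(reader)
        key_pos = right_header.index(right_key)
        index = {}
        for row in reader:
            index.setdefault(row[key_pos], []).append(row)

    # Left rows without a match get empty cells for the right-hand columns.
    blank = [""] * len(right_header)

    with open(left_path, newline="") as fin, \
            open(out_path, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        left_header = next(reader)
        key_pos = left_header.index(left_key)
        # Both key columns are kept in the output.
        writer.writerow(left_header + right_header)
        for row in reader:
            # One output row per matching right-hand row, or one padded row.
            for match in index.get(row[key_pos], [blank]):
                writer.writerow(row + match)
```

With the file names from the csvjoin example, this could be called as left_join_csv("file1.csv", "file2.csv", "merged.csv"). If both files are too large to index, this approach would not help and a disk-backed store would be needed instead.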
elyase answered Jan 20 '26 02:01



