
Can Pandas write to the same CSV file concurrently?

Tags:

python

pandas

I mistakenly had two scripts running at the same time that each wrote a pandas dataframe in chunks to the same CSV file. Since the CSV file was supposed to be appended to, the scripts themselves don't block writing to the CSV file if it already exists. I didn't catch it until it was too late.

Kinda like this:

script1.py

for i, chunk in enumerate(datachunks):
    result_df = process(chunk)  # placeholder for "do something"
    # write mode for the 1st chunk, append mode for the next chunks
    result_df.to_csv('csvfile.csv', mode='w' if i == 0 else 'a')

script2.py

for i, chunk in enumerate(datachunks2):
    result_df = process(chunk)  # placeholder for "do something"
    # write mode for the 1st chunk, append mode for the next chunks
    # the filename should have been csvfile2.csv
    result_df.to_csv('csvfile.csv', mode='w' if i == 0 else 'a')

Each script takes around 12 hours to execute due to the sheer volume of data that has to be processed, and I think it's faster to separate the CSV file into two so that I get the outputs that each script should have given. This should work -- unless I have unintended duplicates in the file, or even lines that didn't write.

Both scripts finished without any errors, if that's relevant.

Any chance of duplicates/missing data in this csvfile.csv?
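For reference, one way a script can refuse to write to a file that already exists is Python's exclusive-creation mode. The following is only a minimal sketch using the standard library, not the code from either script: write_chunks, demo_path and the toy rows are hypothetical names and data. It assumes the file can stay open for the whole run.

```python
import csv
import os
import tempfile


def write_chunks(path, chunks):
    """Write lists of rows to `path`, refusing to start if the file exists."""
    # Mode "x" is exclusive creation: open() raises FileExistsError when the
    # path already exists, so a second script pointed at the same file stops
    # immediately instead of mixing its rows into the first script's output.
    with open(path, "x", newline="") as f:
        writer = csv.writer(f)
        for chunk in chunks:
            writer.writerows(chunk)
            f.flush()  # hand each finished chunk to the operating system


# Hypothetical demo data standing in for the real long-running job.
demo_path = os.path.join(tempfile.mkdtemp(), "csvfile.csv")
write_chunks(demo_path, [[("a", 1)], [("b", 2)]])

try:
    # Simulates the second script started by mistake on the same filename.
    write_chunks(demo_path, [[("c", 3)]])
    second_run_blocked = False
except FileExistsError:
    second_run_blocked = True
```

With pandas, the open file handle could be passed to to_csv in place of the filename, so that all chunks go through the one handle that was created exclusively. This only stops a second writer from starting on an existing file; it does nothing to repair a file that two scripts have already written to.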

Asked by irene on Sep 16 '25 at 16:09



1 Answer

I decided to just rerun the scripts and compare the outputs. Seems it's not promising -- I lost a lot of rows.
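A comparison of this kind can be sketched roughly as follows. This is not the exact code that was run, just a minimal standard-library illustration: row_counts, compare and the toy data are hypothetical, and it assumes rows can be compared as plain tuples of strings. It counts every row in the clean rerun output and in the shared file, then subtracts the counts in both directions.

```python
import csv
import io
from collections import Counter


def row_counts(lines):
    """Count how many times each CSV row occurs in an iterable of lines."""
    return Counter(tuple(row) for row in csv.reader(lines))


def compare(expected_lines, actual_lines):
    """Return (missing, extra) row counters of actual relative to expected."""
    expected = row_counts(expected_lines)
    actual = row_counts(actual_lines)
    # Counter subtraction keeps only positive counts.
    missing = expected - actual  # rows in the rerun that the shared file lacks
    extra = actual - expected    # duplicates, or rows that came from elsewhere
    return missing, extra


# Hypothetical toy data: `rerun` is the clean output of one script,
# `shared` is the file that both scripts wrote to.
rerun = io.StringIO("id,value\n1,a\n2,b\n3,c\n")
shared = io.StringIO("id,value\n1,a\n1,a\n3,c\n9,z\n")

missing, extra = compare(rerun, shared)
print("missing rows:", sum(missing.values()))
print("extra rows:", sum(extra.values()))
```

For real files, open each one with open(path, newline="") and pass the file objects in place of the StringIO buffers. Rows from the other script, and any partially written lines, would show up in extra, so a non-empty missing counter is the clearer sign of lost data.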

Answered by irene on Sep 18 '25 at 08:09
