Dask distributed workers always leak memory when running many tasks

What are some strategies to work around or debug this?

distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 26.17 GB -- Worker memory limit: 32.66 GB

Basically, I am just running lots of parallel jobs on a single machine via a dask-scheduler, and I have tried various numbers of workers. Any time I launch a large number of jobs, memory gradually creeps up over time and only goes down when I bounce the cluster.

I am trying to use fire_and_forget. Would calling .release() on the futures help? I am typically launching these tasks via client.submit from the REPL and then terminating the REPL.
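A minimal sketch of the two options above, assuming a scheduler running at tcp://127.0.0.1:8786 and a hypothetical leaky_task standing in for the real workload: hand the tasks off with fire_and_forget so nothing in the REPL keeps the results alive, or keep ordinary futures and release them once they are no longer needed.

```python
from dask.distributed import Client, fire_and_forget

def leaky_task(i):
    # stand-in for the real workload that calls the (possibly leaky) library
    return i * 2

client = Client("tcp://127.0.0.1:8786")  # assumed scheduler address

# Option 1: hand the tasks off entirely. The scheduler keeps them running
# even after this REPL exits and drops the results as soon as they finish.
futures = [client.submit(leaky_task, i) for i in range(1000)]
fire_and_forget(futures)

# Option 2: keep normal futures, but release them as soon as the results
# are consumed so the workers can forget the data.
futures = [client.submit(leaky_task, i) for i in range(1000, 2000)]
for f in futures:
    f.release()
```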

I would be happy to occasionally bounce workers and add some retry patterns if that is the correct way to use Dask with leaky libraries.
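A minimal sketch of that bounce-and-retry pattern, again with a hypothetical leaky_task: retries lets the scheduler resubmit tasks whose worker died (for example, killed by the nanny for exceeding its memory limit), and client.restart() bounces every worker between batches so leaked memory is handed back to the OS. If the workers are started from the command line, recent dask-worker versions also take a --lifetime option that restarts each worker after a given interval, which amounts to the same workaround.

```python
from dask.distributed import Client

def leaky_task(i):
    return i * 2  # stand-in for the real, possibly leaky, workload

client = Client("tcp://127.0.0.1:8786")  # assumed scheduler address

for batch in range(10):
    # retries=2: if a worker dies mid-task, the scheduler resubmits the
    # task on another worker up to two more times.
    futures = client.map(leaky_task,
                         range(batch * 100, (batch + 1) * 100),
                         retries=2)
    client.gather(futures)

    # Restart all workers between batches so memory leaked by the library
    # is reclaimed before the next batch starts.
    client.restart()
```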

UPDATE:

I have tried limiting worker memory to 2 GB, but I am still getting this error. When the error happens, the worker seems to go into some sort of unrecoverable loop, continually printing the error while no compute happens.
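For context, a sketch of how such a limit is typically configured (using a LocalCluster here; with a standalone dask-scheduler the equivalent is dask-worker ... --memory-limit 2GB). The fractions are the documented spill/pause/terminate defaults, shown explicitly; the exact config keys should be checked against your distributed version.

```python
import dask
from dask.distributed import Client, LocalCluster

# Thresholds are fractions of the per-worker memory limit.
dask.config.set({
    "distributed.worker.memory.target": 0.60,     # start spilling to disk
    "distributed.worker.memory.spill": 0.70,      # spill more aggressively
    "distributed.worker.memory.pause": 0.80,      # stop accepting new tasks
    "distributed.worker.memory.terminate": 0.95,  # nanny kills/restarts the worker
})

cluster = LocalCluster(n_workers=4, memory_limit="2GB")
client = Client(cluster)
```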

asked Oct 29 '25 by mathtick


1 Answer

Dask isn't leaking the memory in this case; something else is, and Dask is just telling you about it. Something about the code that you are running with Dask seems to be leaking memory.
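A minimal sketch of how to confirm that, assuming a hypothetical leaky_task standing in for the real workload: run it in a plain loop in a single process, outside any Dask worker, and watch the resident set size with psutil. If memory still climbs, the leak is in the task code or a library it calls, and periodically restarting workers (as discussed in the question) is a reasonable mitigation.

```python
import os
import psutil

def leaky_task(i):
    return i * 2  # stand-in for the real workload

proc = psutil.Process(os.getpid())
for i in range(100_000):
    leaky_task(i)
    if i % 10_000 == 0:
        # RSS of this process in GB; a steady climb points at the task code.
        print(f"iteration {i}: rss = {proc.memory_info().rss / 1e9:.2f} GB")
```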

answered Oct 31 '25 by MRocklin


