Whenever I run a Python script that uses TensorFlow and decide to kill it before it finishes, Ctrl-C doesn't work. I would use Ctrl-Z, but that doesn't release the GPU memory, so when I try to re-run the script there is no memory left. Is there a solution for this on Linux?
I always start TensorFlow programs from a script. For instance:
python tf_run.py 1> ./log 2> ./err &
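If the job should also survive the terminal being closed, the same invocation can be wrapped in `nohup` and the PID saved for later. A minimal sketch (the `.pid` file name is my own convention, not part of the original answer):

```shell
# nohup detaches the job from the terminal's hangup signal;
# stdout and stderr are redirected as before
nohup python tf_run.py 1> ./log 2> ./err &
# $! is the PID of the last background job; save it so the process
# can be killed later without grepping through ps
echo $! > ./tf_run.pid
```

Later, `kill "$(cat ./tf_run.pid)"` stops exactly that job.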
Then use top/htop to monitor your program's status. If there are many other processes on your machine, show only the python processes in top:
top -p $(pgrep -d',' python)
Finally, when you want to kill the process:
ps aux | grep tf_run.py | awk '{print $2}' | xargs kill -9
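A simpler alternative to the ps/grep/awk pipeline is `pkill -f`, which matches against the full command line itself. A minimal sketch, demonstrated here with a harmless `sleep` process standing in for the TensorFlow script:

```shell
# Dummy long-running process standing in for tf_run.py
sleep 1000 &
# -f matches the full command line; -9 sends SIGKILL, same as the pipeline above
pkill -9 -f "sleep 1000"
# pgrep exits non-zero when nothing matches, so this confirms the kill worked
pgrep -f "sleep 1000" || echo "no matching processes left"
```

One caveat: `-f` kills *every* process whose command line matches, so use a pattern specific enough to avoid collateral damage.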
This command line is extremely useful when there are multiple TensorFlow processes.
Don't run this on your desktop; but on HPC/remote machines with no display, this kills all leftover GPU-using processes:
nvidia-smi -q -d PIDS | grep -P "Process ID +: [0-9]+" | grep -Po "[0-9]+" | xargs kill -9
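nvidia-smi also has a machine-readable query mode that avoids parsing the human-oriented report with grep. A sketch, assuming a driver recent enough to support `--query-compute-apps` (check `nvidia-smi --help-query-compute-apps` for the fields your version offers):

```shell
# Emit only the PIDs of GPU compute processes, one per line, no table to parse;
# xargs -r (GNU extension) skips running kill when the list is empty
nvidia-smi --query-compute-apps=pid --format=csv,noheader | xargs -r kill -9
```

The same warning applies: only run this on a machine where every GPU process is fair game.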