I have a code which reads data from multiple files named 001.txt, 002.txt, ... , 411.txt. I would like to read the data from each file, plot them, and save as 001.jpg, 002.jpg, ... , 411.jpg.
I can do this by looping through the files, but I would like to use the multiprocess module to speed things up.
However, when I use the code below, the computer hangs- I can't click on anything, but the mouse moves, and the sound continues. I then have to power down the computer.
I'm obviously misusing the multiprocess module with matplotlib. I have used something very similar to the below code to actually generate the data, and save to text files with no problems. What am I missing?
import multiprocessing
def do_plot(number):
fig = figure(number)
a, b = random.sample(range(1,9999),1000), random.sample(range(1,9999),1000)
# generate random data
scatter(a, b)
savefig("%03d" % (number,) + ".jpg")
print "Done ", number
close()
for i in (0, 1, 2, 3):
jobs = []
# for j in chunk:
p = multiprocessing.Process(target = do_plot, args = (i,))
jobs.append(p)
p.start()
p.join()
The most important thing in using multiprocessing is to run the main code of the module only for the main process. This can be achieved by testing if __name__ == '__main__' as shown below:
import matplotlib.pyplot as plt
import numpy.random as random
from multiprocessing import Pool
def do_plot(number):
fig = plt.figure(number)
a = random.sample(1000)
b = random.sample(1000)
# generate random data
plt.scatter(a, b)
plt.savefig("%03d.jpg" % (number,))
plt.close()
print("Done ", number)
if __name__ == '__main__':
pool = Pool()
pool.map(do_plot, range(4))
Note also that I replaced the creation of the separate processes by a process pool (which scales better to many pictures since it only uses as many process as you have cores available).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With