Consider the following two short programs.
normal_test.py:
```python
import time

if __name__ == '__main__':
    t_end = time.time() + 1
    loop_iterations = 0
    while time.time() < t_end:
        loop_iterations += 1
    print(loop_iterations)
```
Output (on my machine):
4900677
mp_test.py:
```python
from multiprocessing import Process
from multiprocessing import Manager
import time

def loop1(ns):
    t_end = time.time() + 1
    while time.time() < t_end:
        ns.loop_iterations1 += 1

def loop2(ns):
    t_end = time.time() + 1
    while time.time() < t_end:
        ns.loop_iterations2 += 1

if __name__ == '__main__':
    manager = Manager()
    ns = manager.Namespace()
    ns.loop_iterations1 = 0
    ns.loop_iterations2 = 0
    p1 = Process(target=loop1, args=(ns,))
    p2 = Process(target=loop2, args=(ns,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(ns.loop_iterations1)
    print(ns.loop_iterations2)
```
Output (on my machine):
5533
5527
I am hoping to use Python multiprocessing on a Raspberry Pi to read values from multiple ADCs in parallel, so speed is important. The laptop I ran these two programs on has four cores, so I can't understand why the processes created in the second program complete nearly 900 times fewer iterations than the single process in the first program. Am I using the Python multiprocessing library incorrectly? How can I make the processes faster?
Am I using the Python multiprocessing library incorrectly?
Incorrectly? No. Inefficiently? Yes.
Remember that multiprocessing creates cooperative, but otherwise independent, instances of Python. Think of them as workers in a factory, or friends working on a big job.
If only one person is working on a project, that person is free to move about the factory floor, pick up a tool, use it, put it down, move somewhere else, pick up the next tool, and so on. Add a second person (or worse, many more people, perhaps even hundreds) and everyone must now coordinate: if some area or some tool is shared, Bob can't just go grab it; he has to ask Alice first whether she's done with it.
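A Manager object is Python multiprocessing's general-purpose wrapper for sharing. Putting variables in a Manager Namespace makes them shared, so every use of them automatically involves checking with everyone else first. (More precisely, the values are held in one location, the manager's own server process, and the other processes read or change them through proxies. Every read or write through a proxy is a message to that server process plus a wait for its reply, so a line like ns.loop_iterations1 += 1 costs two round trips: one to fetch the old value and one to store the new one.)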
Here, you have done the metaphorical equivalent of replacing "Bob: count as fast as you can" with "Bob: constantly interrupt Alice to ask if she's counting, then count; Alice: count, but be constantly interrupted by Bob." Bob and Alice now spend by far most of their time talking to each other rather than counting.
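To see the cost of the proxy on its own, here is a rough sketch that runs the same increment loop once on a plain local object and once on a Manager Namespace proxy. The function name time_increments and the attribute name counter are made up for illustration. I'm not quoting numbers because they depend heavily on the machine, but expect the proxied version to be far slower, in line with the gap between the two programs in the question.

```python
import time
import types
from multiprocessing import Manager


def time_increments(ns, n=2_000):
    """Return the average seconds per `ns.counter += 1` over n increments.

    `ns` can be any object with writable attributes; `counter` is a
    hypothetical attribute name used only for this measurement.
    """
    ns.counter = 0
    start = time.perf_counter()
    for _ in range(n):
        # On a Manager Namespace proxy this is two round trips to the
        # manager process: one to read ns.counter, one to write it back.
        # On a plain local object it is just two attribute operations.
        ns.counter += 1
    elapsed = time.perf_counter() - start
    return elapsed / n


if __name__ == "__main__":
    local = types.SimpleNamespace()
    with Manager() as manager:
        shared = manager.Namespace()
        local_cost = time_increments(local)
        shared_cost = time_increments(shared)
    print(f"local:   {local_cost * 1e9:.0f} ns per increment")
    print(f"proxied: {shared_cost * 1e9:.0f} ns per increment")
    print(f"ratio:   {shared_cost / local_cost:.0f}x")
```

The loop body is identical in both runs; the only difference is whether each attribute access stays inside the process or has to go through the manager.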
As the documentation says:
... when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.
(It starts with the phrase "as mentioned above", but it's not mentioned above!)
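For this particular program, the simplest way to take that advice is to share nothing at all while counting. A minimal sketch, assuming one result per worker is all you need: each worker counts in a local variable and returns its total, and multiprocessing.Pool (swapped in here for the question's two explicit Process objects) collects the results. The name count_for is hypothetical.

```python
import time
from multiprocessing import Pool


def count_for(seconds):
    """Count loop iterations for `seconds`, using only a local variable."""
    t_end = time.time() + seconds
    iterations = 0
    while time.time() < t_end:
        iterations += 1
    # Nothing is shared while counting; the single result is sent back
    # to the parent once, when the function returns.
    return iterations


if __name__ == "__main__":
    # Two workers, each counting for one second, as in mp_test.py.
    with Pool(processes=2) as pool:
        counts = pool.map(count_for, [1, 1])
    print(counts)
```

Since nothing crosses a process boundary until the very end, each worker's count should land much closer to the single-process figure than the Manager version does.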
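There are a bunch of standard tricks, such as batching, so that each process gets a lot of work done between sharing events, or using shared memory to make the sharing itself faster. But with shared memory you introduce the need to lock shared items, so that two processes don't update the same value at the same time.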
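If the processes really do need to update a common value while they run, here is a sketch combining both tricks: each worker batches its counts locally and adds them to a multiprocessing.Value (a number in shared memory) only once per batch. Each flush takes the Value's built-in lock, because total.value += pending is a separate read and write, and both workers update the same number. BATCH and count_batched are hypothetical names, and the batch size is an arbitrary starting point, not a recommendation.

```python
import time
from multiprocessing import Process, Value

BATCH = 10_000  # hypothetical batch size; tune for your workload


def count_batched(total, seconds):
    """Count for `seconds`, flushing to shared memory once per BATCH iterations.

    `total` is a multiprocessing.Value; its built-in lock guards each flush.
    """
    t_end = time.time() + seconds
    pending = 0
    while time.time() < t_end:
        pending += 1
        if pending == BATCH:
            # One locked update per batch instead of one per iteration.
            with total.get_lock():
                total.value += pending
            pending = 0
    # Flush whatever is left over from the last partial batch.
    with total.get_lock():
        total.value += pending


if __name__ == "__main__":
    total = Value("q", 0)  # 64-bit signed integer in shared memory, with a lock
    workers = [Process(target=count_batched, args=(total, 1)) for _ in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(total.value)
```

A larger batch means fewer locked updates but a staler shared total. For the ADC use case you would presumably batch readings the same way, choosing the size based on how fresh the parent's view of the data needs to be.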