I'm attempting to parallelize a script I wrote. Each process needs to do a calculation and store its data in a specific part of an array (a list of lists). Each process calculates and stores its data correctly, but I can't figure out how to get the data from the non-root processes back to the root process so that it can print the data out to a file. I created a minimal working example of my script---this one is designed to run on 2 cores only, for simplicity:
from mpi4py import MPI
import pdb
import os

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# Declare the array that will store all the temp results
temps = [[0 for x in xrange(5)] for x in xrange(4)]

# Loop over all directories
if rank == 0:
    counter = 0
    for i in range(2):
        for j in range(5):
            temps[i][j] = counter
            counter = counter + 1
else:
    counter = 20
    for i in range(2, 4):
        for j in range(5):
            temps[i][j] = counter
            counter = counter + 1

temps = comm.bcast(temps, root=0)

if rank == 0:
    print temps
I execute the script using:
mpiexec -n 2 python mne.py
When the case finishes, the output is:
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
So you can see that the data sharing is not working as I want. Can someone please show me the correct way to get data back to the root process?
The code is working correctly; it just isn't doing what you'd like. This line

temps = comm.bcast(temps, root=0)

broadcasts processor 0's temps variable to all processors (including rank 0), which of course gives exactly the results above. You want to use gather (or allgather, if you want all of the processors to have the answer). That would look something more like this:
from mpi4py import MPI
import pdb
import os

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

assert size == 2

# Declare the array that will store all the temp results
temps = [[0 for x in xrange(5)] for x in xrange(4)]
# Declare the array that holds the local results
locals = [[0 for x in xrange(5)] for x in xrange(2)]

# Loop over all directories
if rank == 0:
    counter = 0
    for i in range(2):
        for j in range(5):
            locals[i][j] = counter
            counter = counter + 1
else:
    counter = 20
    for i in range(2):
        for j in range(5):
            locals[i][j] = counter
            counter = counter + 1

# On the root, gather returns a list with one entry per rank,
# each entry being that rank's locals
temps = comm.gather(locals, root=0)

if rank == 0:
    print temps
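One caveat: with gather, the root ends up with a list containing each rank's locals (rank 0's rows first, then rank 1's) rather than a single 4x5 list. If you want the original layout back on the root, something like this after the gather call above should do it:

if rank == 0:
    # temps is [rank0_locals, rank1_locals]; flatten the row blocks
    # back into a single 4x5 list of lists
    temps = [row for block in temps for row in block]
    print temps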
If you really want to do the collection in-place, and you know (say) that all the real data is going to be larger than the zero you've initialized the data with, you can use a reduction operation, but that is easier with numpy arrays:
from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

assert size == 2

# Declare the array that will store all the temp results
temps = numpy.zeros((4,5))

# Loop over all directories
if rank == 0:
    counter = 0
    for i in range(2):
        for j in range(5):
            temps[i,j] = counter
            counter = counter + 1
else:
    counter = 20
    for i in range(2,4):
        for j in range(5):
            temps[i,j] = counter
            counter = counter + 1

# Element-wise max across ranks: the zeros from the other rank lose,
# so every process ends up with the full 4x5 array
comm.Allreduce(MPI.IN_PLACE, temps, op=MPI.MAX)

if rank == 0:
    print temps
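If the max trick feels fragile (it relies on the real data always beating the zeros), the capital-G Gather also works directly on numpy buffers: have each rank fill only its own 2x5 block of rows and let the root receive the blocks back-to-back into the 4x5 array. A rough sketch along those lines (the variable name local_block is just for illustration):

from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

assert size == 2

# Each rank computes only its own 2x5 block of rows
local_block = numpy.zeros((2,5))
counter = 20*rank
for i in range(2):
    for j in range(5):
        local_block[i,j] = counter
        counter = counter + 1

# The root allocates the full 4x5 array; Gather lays the blocks down
# in rank order, so rank 0's rows come first, then rank 1's
temps = numpy.zeros((4,5)) if rank == 0 else None
comm.Gather(local_block, temps, root=0)

if rank == 0:
    print temps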