
How to share data between processes in Python using MPI?

Tags: python, mpi

I'm attempting to parallelize a script I wrote. Each process needs to do a calculation and store the data in a specific part of an array (a list of lists). Each process calculates and stores its data correctly, but I can't figure out how to get the data from the non-root processes back to the root process so it can write the data to a file. I created a minimal working example of my script; this one is designed to run on only 2 cores for simplicity:

from mpi4py import MPI 
import pdb 
import os

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# Declare the array that will store all the temp results
temps = [[0 for x in range(5)] for x in range(4)]

# Loop over all directories
if rank==0:
   counter = 0 
   for i in range(2):
      for j in range(5):
         temps[i][j] = counter
         counter = counter + 1

else:
   counter = 20
   for i in range(2,4):
      for j in range(5):
         temps[i][j] = counter
         counter = counter + 1 

temps = comm.bcast(temps,root=0)

if rank==0:
   print(temps)

I execute the script using:

mpiexec -n 2 python mne.py

When the run finishes, the output is:

[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

So you can see that the data sharing is not working as I want. Can someone please show me the correct way to get data back to the root process?

asked by rks171

1 Answer

The code is working correctly, just not doing what you'd like.

This line

temps = comm.bcast(temps,root=0)

broadcasts processor 0's temps variable to all processors (including rank 0 itself), which is exactly the result you saw. What you want is gather (or allgather, if you want all of the processors to have the answer; a sketch of that variant follows the example below). That would look something like this:

from mpi4py import MPI
import pdb
import os

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

assert size == 2

# Declare the array that will store all the temp results
temps = [[0 for x in range(5)] for x in range(4)]

# declare the array that holds the local results
locals = [[0 for x in range(5)] for x in range(2)]

# Loop over all directories
if rank==0:
   counter = 0
   for i in range(2):
      for j in range(5):
         locals[i][j] = counter
         counter = counter + 1

else:
   counter = 20
   for i in range(2):
      for j in range(5):
         locals[i][j] = counter
         counter = counter + 1

# gather returns, on the root only, a list with one entry per rank
# (here: [rank 0's locals, rank 1's locals]); flatten it back into
# the 4x5 temps structure before printing
gathered = comm.gather(locals, root=0)

if rank==0:
   temps = [row for block in gathered for row in block]
   print(temps)
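For reference, here is a minimal standalone sketch of the allgather variant mentioned above, under the same two-rank assumption; the names offset and local_block are just illustrative. With allgather, every rank (not only the root) ends up with the full list of per-rank blocks, so no root argument is needed.

from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

assert size == 2

# each rank builds its own 2x5 block, as in the gather example above
offset = 0 if rank == 0 else 20
local_block = [[offset + i * 5 + j for j in range(5)] for i in range(2)]

# allgather returns [rank 0's block, rank 1's block] on *every* rank
gathered = comm.allgather(local_block)
temps = [row for block in gathered for row in block]

# every rank now holds the full 4x5 structure
print(rank, temps)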

If you really want to do the collection in-place, and you know (say) that all the real data will be larger than the zeros you initialized the array with, you can use a reduction operation instead; that is easiest with numpy arrays:

from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

assert size == 2

# Declare the array that will store all the temp results
temps = numpy.zeros((4,5))

# Loop over all directories
if rank==0:
   counter = 0
   for i in range(2):
      for j in range(5):
         temps[i,j] = counter
         counter = counter + 1

else:
   counter = 20
   for i in range(2,4):
      for j in range(5):
         temps[i,j] = counter
         counter = counter + 1

comm.Allreduce(MPI.IN_PLACE,temps,op=MPI.MAX)

if rank==0:
   print(temps)
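If the per-rank blocks are stored as equally-sized numpy arrays, another option worth noting (not shown in the original answer) is the uppercase, buffer-based Gather, which concatenates the blocks directly into an array on the root in rank order. A rough sketch under those assumptions, with the names start and local_block chosen just for illustration:

from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

assert size == 2

# each rank fills its own contiguous 2x5 block
start = 0 if rank == 0 else 20
local_block = numpy.arange(start, start + 10, dtype='d').reshape(2, 5)

# the root allocates the full 4x5 array; other ranks pass None
temps = numpy.empty((4, 5), dtype='d') if rank == 0 else None

# buffer-based Gather concatenates the blocks into temps in rank order
comm.Gather(local_block, temps, root=0)

if rank == 0:
    print(temps)

Because Gather places rank 0's block first and rank 1's block second, the root's temps ends up with rows 0-1 from rank 0 and rows 2-3 from rank 1, which matches the layout the question asks for.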
answered by Jonathan Dursi


