2.6 Reduction, a Coordination Pattern¶
There are often cases when every process needs to compute a partial result of an overall computation. For example, if you want to process a large set of numbers by summing them into one value (i.e., reduce the set of numbers to a single value, their sum), you could do this faster by having each process compute a partial sum, then having all the processes communicate to add their partial sums together.
This is so common in parallel processing that there is a special collective communication function called reduce that does just this.
This kind of reduction of many values down to one can be performed with different operators, such as sum, product, maximum, or minimum, applied over the set of values computed by the processes.
Reduce all values using sum and max¶
In this example, every process computes the square of (id+1). Those values are then reduced twice: once with the sum operator and once with the maximum operator.
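Below is a minimal sketch of such a program using mpi4py; the file name, variable names, and print wording are illustrative assumptions, while MPI.SUM, MPI.MAX, and comm.reduce are the actual mpi4py names.

```python
# reduction.py: a minimal sketch of reducing one value per process
# with sum and max, using mpi4py (names here are illustrative).
from mpi4py import MPI

def main():
    comm = MPI.COMM_WORLD
    id = comm.Get_rank()

    # every process computes the square of (id + 1)
    square = (id + 1) * (id + 1)
    print("Process {0} computed {1}".format(id, square))

    # reduce the per-process values to a single value on the
    # conductor (process 0); all other processes receive None
    sum_of_squares = comm.reduce(square, op=MPI.SUM, root=0)
    max_of_squares = comm.reduce(square, op=MPI.MAX, root=0)

    if id == 0:
        print("Sum of squares: {0}".format(sum_of_squares))
        print("Max of squares: {0}".format(max_of_squares))

main()
```

You would run it with something like mpirun -np 4 python reduction.py (assuming an MPI launcher named mpirun on your system).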
Exercises
Run, using -np set to each value from 1 through 8 processes.
Try replacing MPI.MAX with MPI.MIN (minimum) and/or replacing MPI.SUM with MPI.PROD (product). Then save and run the code again.
Find the place in this code where the data computed on each process is being reduced to one value. Match the prints to the output you observe when you run it.
Revisiting the trapezoidal integration example¶
In Section 2.4 we introduced the classic trapezoidal integration example, using sends and receives from the workers to the conductor, which then summed the local sums from each process' set of trapezoids into a final total. Now we can compute that total more simply by using reduce. Find where that is done in the code below and run it to confirm that it works for a range of values for -np, including 1.
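Here is a sketch of that approach, assuming mpi4py; the integrand, interval, and number of trapezoids below are illustrative assumptions, with n chosen so that 1 through 8 processes divide the work evenly.

```python
# trapezoid_reduce.py: trapezoidal integration where each process sums
# its own slice of trapezoids and one reduce call totals the local sums.
# The integrand f, interval [a, b], and n are illustrative assumptions.
from mpi4py import MPI

def f(x):
    return x * x                     # integrate x^2 (illustrative)

def trapSum(left, right, count, h):
    # trapezoidal rule over [left, right] using count trapezoids of width h
    total = (f(left) + f(right)) / 2.0
    for i in range(1, count):
        total += f(left + i * h)
    return total * h

def main():
    comm = MPI.COMM_WORLD
    id = comm.Get_rank()
    numProcesses = comm.Get_size()

    a, b = 0.0, 1.0                  # whole interval
    n = 840                          # total trapezoids; divisible by 1..8
    h = (b - a) / n                  # width of one trapezoid

    localN = n // numProcesses       # trapezoids for this process
    localA = a + id * localN * h     # left end of this process' slice
    localB = localA + localN * h

    localSum = trapSum(localA, localB, localN, h)

    # this single reduce call replaces the send-receive loop from Section 2.4
    total = comm.reduce(localSum, op=MPI.SUM, root=0)

    if id == 0:
        print("Estimate of the integral on [{0}, {1}]: {2}".format(a, b, total))

main()
```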
Note
The reduce function is typically faster than the earlier send-receive method, because implementations usually combine values in a tree pattern, taking roughly log2(P) communication steps for P processes instead of P-1, so it is worth learning to use.