CHAPTER 4: Parallel Sum
Very often, loops are used with an accumulator variable to compute a single value from a set of values, such as the sum of the integers in an array or list. Fortunately, OpenMP implements a special parallel pattern known as reduction, which will aid us in this process. The reduction pattern is one of a group of patterns called collective communication patterns, because the threads must work together to create the final desired single value.
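For example, a typical sequential accumulator loop in C looks like the following (an illustrative sketch, not code from this chapter's program; the names a, n, and sum are placeholders):

int sum = 0;                     // the accumulator
for (int i = 0; i < n; i++) {
    sum += a[i];                 // fold each element into the single result
}

Reduction parallelizes exactly this kind of loop.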
4.1 An Initial Attempt
From the ParallelLoopChunksOf1 directory, let's enter the reduction directory by typing the following command:
cd ../reduction
Running ls in this new directory should reveal the following two files:
Makefile
reduction.c
The code in reduction.c is fairly long, so we will discuss it in parts:
17 #include <stdio.h>   // printf()
18 #include <omp.h>     // OpenMP
19 #include <stdlib.h>  // rand()
20
21 void initialize(int* a, int n);
22 int sequentialSum(int* a, int n);
23 int parallelSum(int* a, int n);
24
25 #define SIZE 1000000
26
27 int main(int argc, char** argv) {
28     int array[SIZE];
29
30     if (argc > 1) {
31         omp_set_num_threads( atoi(argv[1]) );
32     }
33
34     initialize(array, SIZE);
35     printf("\nSeq. sum: \t%d\nPar. sum: \t%d\n\n",
36            sequentialSum(array, SIZE),
37            parallelSum(array, SIZE) );
38
39     return 0;
40 }
The main() function in reduction.c first initializes an array of 1 million random integers. It then calls two separate functions: sequentialSum(), which sums the array sequentially, and parallelSum(), which attempts to sum the array in parallel.
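Though initialize() is not shown in this chapter, a minimal sketch of what it might do is as follows (the actual code in reduction.c may differ, for example in the range of the random values):

void initialize(int* a, int n) {
    for (int i = 0; i < n; i++) {
        a[i] = rand() % 100;   // fill with random values; the range here is an assumption
    }
}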
For simplicity, we show only the parallelSum() function here:
60 /* sum the array using multiple threads */
61 int parallelSum(int* a, int n) {
62     int sum = 0;
63     int i;
64 // #pragma omp parallel for // reduction(+:sum)
65     for (i = 0; i < n; i++) {
66         sum += a[i];
67     }
68     return sum;
69 }
In its current form, this function is identical to sequentialSum(). However, line 64 contains some commented-out code that should parallelize the function. For now, let's keep line 64 fully commented out.
Try It Out
First, let's compile and run the program using the following commands:
make
./reduction 4
Q-1: Run the code a few times using 4 threads as the second parameter. What do you see?
- The sequential sum and parallel sum functions output the same sum.
  - Correct! However, the reason why should be obvious; at this point, BOTH functions are running the code serially!
- The sequential sum and parallel sum functions output different sums.
  - Not correct; did you ensure that you didn't change the code? Remember, at this point, we are running the code with line 64 still commented out.
4.2 Edit, Build and Run in Parallel
Let's next edit the parallelSum() function so that it executes in parallel by removing the // comment marker at the front of line 64. Be sure to remove ONLY the first // (keep the reduction clause commented out for now).
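After removing the first //, line 64 should look like this:

#pragma omp parallel for // reduction(+:sum)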
Recompile and rerun the program using the following commands:
make
./reduction 4
Q-2: Run the code a few times using 4 threads. What do you see?
- The sequential sum and parallel sum functions output the same sum.
  - Not quite; try recompiling (run "make") and running the program a few times. Is it always the same sum?
- The sequential sum and parallel sum functions output different sums.
  - Correct! It looks like the "omp parallel for" pragma is not enough in this case!
Line 64 of the parallelSum() function contains a (commented-out) special clause called reduction that can be used with the omp parallel for pragma. The notion of a reduction comes from the mathematical operation reduce, in which a collection of values is combined into a single value via a common mathematical function. Summing a collection of values is therefore a natural example of a reduction. In this program, the reduction(+:sum) clause indicates that a summation is being performed (the + identifier) on the variable sum, which acts as the accumulator.
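Conceptually, reduction(+:sum) gives each thread its own private copy of sum, initialized to 0 (the identity value for +); each thread accumulates its own partial sum, and OpenMP then safely combines the partial sums into the shared sum. A hand-written equivalent might look something like the following sketch (for illustration only; the function name parallelSumByHand is ours, and OpenMP's actual implementation may differ):

/* a hand-coded equivalent of reduction(+:sum), for illustration only */
int parallelSumByHand(int* a, int n) {
    int sum = 0;
    #pragma omp parallel
    {
        int mySum = 0;                 // each thread's private partial sum
        #pragma omp for
        for (int i = 0; i < n; i++) {
            mySum += a[i];             // no race: mySum is private to this thread
        }
        #pragma omp critical           // combine partial sums one thread at a time
        sum += mySum;
    }
    return sum;
}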
Q-3: Remove the second // on line 64 and recompile. Re-run the code a few times using 4 threads. What do you see?
- The sequential sum and parallel sum functions output the same sum.
  - Correct! The program now correctly computes the sum in parallel.
- The sequential sum and parallel sum functions output different sums.
  - Did you remember to uncomment the reduction clause on line 64? Did you recompile using "make"?
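Once both // markers on line 64 are removed, the line should read:

#pragma omp parallel for reduction(+:sum)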
4.3 Something to Think About
Do you have any ideas about why the omp parallel for pragma without the reduction clause did not produce the correct result?
Q-4: What is going on?
- Accumulators in loops are something that cannot be parallelized properly.
  - Nope. This program CAN be properly parallelized. We are just doing something wrong!
- The program is losing or overwriting values somewhere (i.e., there is a race condition).
  - Correct!
- Fairies have infested the computer and are wreaking havoc as we speak.
  - Nope! Thankfully, the Raspberry Pi is fairy-proof. :)
- It's some other issue.
  - Actually, the issue is listed as one of the options!
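What is really happening: sum += a[i] is not a single step. Each thread must read the shared sum, add a[i] to it, and write the result back. When two threads interleave these read-modify-write steps, one thread's write can overwrite the other's, and updates are lost. Besides reduction, OpenMP also offers #pragma omp atomic, which makes each update indivisible. The following sketch (the function name parallelSumAtomic is ours) would also compute the correct sum, though it is typically much slower than reduction because the threads serialize on every single update:

/* a correct but slow alternative to reduction, for comparison only */
int parallelSumAtomic(int* a, int n) {
    int sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic             // make this read-modify-write indivisible
        sum += a[i];
    }
    return sum;
}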
The sum example that we presented in this chapter was a fairly small example. In the next two chapters, we will explore much larger examples.