CHAPTER 5: Integration using the Trapezoidal Rule¶
5.1 Estimating the Area Under a Curve¶
As our next example, let’s look at the problem of estimating the area under a curve. If you have taken a calculus course, you may recognize this problem as a Riemann sum or the Trapezoidal Rule, which approximates the area under a curve (i.e. the integral) by splitting it into a series of trapezoids.
In the following programs, we use the trapezoidal rule to approximate the integral \(\int_0^{\pi} \sin(x)\,dx\) using \(2^{20}\) equal subdivisions. The answer from this computation should be 2.0. The following video shows how a single thread would solve this problem:
In this example, a single thread serially computes the area of each trapezoid and adds them all together into one value.
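Written out, the trapezoidal-rule approximation that this program computes is

\[
\int_a^b f(x)\,dx \;\approx\; h \left[ \frac{f(a) + f(b)}{2} \;+\; \sum_{i=1}^{n-1} f(a + ih) \right],
\qquad h = \frac{b - a}{n},
\]

which matches the structure of the loop below: the accumulator starts at \(\frac{f(a)+f(b)}{2}\), each iteration adds \(f(a+ih)\) for \(i = 1, \dots, n-1\), and the sum is multiplied by \(h\) at the end.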
A C implementation of this program may look like the following:
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

//function we are calculating the area under
double f(double x) {
    return sin(x);
}

//estimation of pi
const double pi = 3.141592653589793238462643383079;

int main(int argc, char** argv) {
    //Variables
    double a = 0.0, b = pi;  //limits of integration
    int n = 1048576;         //number of subdivisions = 2^20
    double h = (b - a) / n;  //width of each subdivision
    double integral;         //accumulates answer

    integral = (f(a) + f(b))/2.0;  //initial value

    //sum up all the trapezoids
    int i;
    for(i = 1; i < n; i++) {
        integral += f(a+i*h);
    }

    integral = integral * h;
    printf("With %d trapezoids, our estimate of the integral from \n", n);
    printf("%f to %f is %f\n", a, b, integral);

    return 0;
}
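If you want to compile this serial version by hand rather than through the course’s Makefile, the only wrinkle is that sin() requires linking the math library. Assuming you save the file under a name such as trap-serial.c (a hypothetical name; the file in the course directory may be named differently), something like the following should work:

gcc -o trap-serial trap-serial.c -lm
./trap-serial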
Let’s now consider how we can use multiple threads to approximate the area under the curve in parallel. One strategy would be to assign each thread a subset of the total set of subdivisions, so that each thread separately computes its assigned set of trapezoids.
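As a rough sketch of this strategy (using the same variables a, n, h, and f as the program above; names such as my_sum are purely illustrative), each thread could compute the bounds of its own block of iterations from its thread number. Note that this is not the code we will actually write below; OpenMP’s parallel for directive does this bookkeeping for us.

#pragma omp parallel
{
    int id = omp_get_thread_num();          // this thread's number: 0, 1, 2, or 3
    int nthreads = omp_get_num_threads();   // total number of threads, e.g. 4
    int chunk = n / nthreads;               // roughly equal share of the iterations
    int start = 1 + id * chunk;             // first trapezoid this thread handles
    int end = (id == nthreads - 1) ? n : start + chunk;   // one past its last trapezoid

    double my_sum = 0.0;                    // this thread's partial result
    for (int i = start; i < end; i++) {
        my_sum += f(a + i*h);
    }
    // The per-thread partial sums still need to be combined into one total;
    // doing that safely is exactly the issue explored in the sections below.
}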
The following video illustrates how 4 threads would work together to approximate the area under the curve:
5.2 Parallel Integration - First Attempt¶
Let’s now visit the integration directory, which is at the same level as the patternlets directory. Assuming you are still in the reduction directory, use the following command:
cd ../../integration
The following program attempts to parallelize the serial version of the integration program by adding OpenMP pragmas:
 1  #include <math.h>
 2  #include <stdio.h>   // printf()
 3  #include <stdlib.h>  // atoi()
 4  #include <omp.h>     // OpenMP
 5
 6  /* Demo program for OpenMP:
 7     computes trapezoidal approximation to an integral */
 8
 9  const double pi = 3.141592653589793238462643383079;
10
11  int main(int argc, char** argv) {
12      /* Variables */
13      double a = 0.0, b = pi;  /* limits of integration */
14      int n = 1048576;         /* number of subdivisions = 2^20 */
15      double h = (b - a) / n;  /* width of subdivision */
16      double integral;         /* accumulates answer */
17      int threadcnt = 1;
18
19      double f(double x);
20
21      /* parse command-line arg for number of threads */
22      if (argc > 1) {
23          threadcnt = atoi(argv[1]);
24      }
25
26      #ifdef _OPENMP
27      omp_set_num_threads( threadcnt );
28      printf("OMP defined, threadct = %d\n", threadcnt);
29      #else
30      printf("OMP not defined\n");
31      #endif
32
33      integral = (f(a) + f(b))/2.0;
34      int i;
35
36      #pragma omp parallel for \
37          private(i) shared (a, n, h, integral)
38      for(i = 1; i < n; i++) {
39          integral += f(a+i*h);
40      }
41
42      integral = integral * h;
43      printf("With %d trapezoids, our estimate of the integral from \n", n);
44      printf("%f to %f is %f\n", a, b, integral);
45  }
46
47  double f(double x) {
48      return sin(x);
49  }
Lines 36-37 contain the omp parallel for pragma that parallelizes the loop that computes the integral. This example differs from others you have seen so far in that the variables are explicitly declared to be either private (each thread gets its own copy) or shared (all threads use the same copy of the variable in memory).
If our parallel program is implemented correctly, it should estimate the area under the curve as 2.0, identical to the output of the serial program.
First, run this program serially on 1 thread by typing:
make
./trap-notworking 1
Q-3: Run the OpenMP implementation of integration serially a few times. What do you discover?
- The program works as expected (outputs answer 2.0)
  - Correct! This program runs serially, so there is no issue.
- The program returns a different approximate value (but the same value with every run)
  - Not quite. Did you run the program using 1 thread?
- The program returns a different approximate value every run.
  - Not quite. Did you run the program using 1 thread?
Next run this code in parallel using 4 threads by typing:
make
./trap-notworking 4
As with other examples you have seen so far, the command-line argument of 4 above indicates that four threads should be used.
Q-4: Run the OpenMP implementation of integration in parallel a few times. What do you discover?
- The program works as expected (outputs answer 2.0)
  - Not quite. Try running the program a few more times. What do you see?
- The program returns a different approximate value (but the same value with every run)
  - Close, but not correct. Try running the program again. Do you really get the same answer?
- The program returns a different approximate value every run.
  - Correct. Something is seriously wrong with our program!
Q-5: Do you have any guesses on what could be causing the issue?
- This scenario illustrates a classic limitation of certain types of parallelism; this is an example of something that cannot be properly parallelized.
  - Nope. This program CAN be properly parallelized. We are just doing something wrong!
- There is a race condition in this code.
  - Correct! Do you know where it is? (See the sketch below.)
- It's some other issue.
  - Actually, the issue is listed in one of the options!
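If you are unsure where the race condition is, look at the loop body: every thread executes integral += f(a+i*h) on the same shared variable integral. That statement is really a read, an add, and a write; when two threads read the old value at the same time, one of their updates is lost, which is why the answer changes from run to run. One possible repair (not the one used in the next section, which relies on a reduction clause instead) is to protect the update with OpenMP’s atomic directive, as in this sketch:

#pragma omp parallel for private(i) shared(a, n, h, integral)
for (i = 1; i < n; i++) {
    #pragma omp atomic   // make the read-add-write on integral one indivisible step
    integral += f(a + i*h);
}

This produces the correct answer, but it forces the updates to happen one at a time, which costs performance; the reduction clause shown next avoids that overhead.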
5.3 Parallel Integration - Second Attempt¶
We will use the previously discussed reduction clause to fix the issue. Here is the improved version, called trap-omp-working.c:
 1  #include <math.h>
 2  #include <stdio.h>   // printf()
 3  #include <stdlib.h>  // atoi()
 4  #include <omp.h>     // OpenMP
 5
 6
 7  /* Demo program for OpenMP:
 8     computes trapezoidal approximation to an integral */
 9
10  const double pi = 3.141592653589793238462643383079;
11
12  int main(int argc, char** argv) {
13      /* Variables */
14      double a = 0.0, b = pi;  /* limits of integration */
15      int n = 1048576;         /* number of subdivisions = 2^20 */
16      double h = (b - a) / n;  /* width of subdivision */
17      double integral;         /* accumulates answer */
18      int threadcnt = 1;
19
20      double f(double x);
21
22      /* parse command-line arg for number of threads */
23      if (argc > 1) {
24          threadcnt = atoi(argv[1]);
25      }
26
27      #ifdef _OPENMP
28      omp_set_num_threads( threadcnt );
29      printf("OMP defined, threadct = %d\n", threadcnt);
30      #else
31      printf("OMP not defined\n");
32      #endif
33
34      integral = (f(a) + f(b))/2.0;
35      int i;
36
37      #pragma omp parallel \
38          for private(i) shared (a, n, h) reduction(+: integral)
39      for(i = 1; i < n; i++) {
40          integral += f(a+i*h);
41      }
42
43      integral = integral * h;
44      printf("With %d trapezoids, our estimate of the integral from \n", n);
45      printf("%f to %f is %f\n", a, b, integral);
46  }
47
48  double f(double x) {
49      return sin(x);
50  }
As with the previous reduction patternlet example, we again have an accumulator variable in the loop, this time called integral. Thus our reduction clause reads reduction(+:integral).
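Conceptually, the reduction clause arranges something like the following on our behalf (a sketch only, written against the same variables as the program above; the OpenMP runtime does this automatically and more efficiently): each thread accumulates into a private copy initialized to 0, and the private copies are combined into the shared integral exactly once at the end.

#pragma omp parallel shared(integral)
{
    double my_integral = 0.0;          // private copy, initialized to 0 (the identity for +)

    #pragma omp for
    for (int i = 1; i < n; i++) {
        my_integral += f(a + i*h);     // no race: each thread updates only its own copy
    }

    #pragma omp critical               // each thread folds its copy into the shared total, one at a time
    integral += my_integral;
}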
The above program was already compiled by our first use of the make command. Run it with the following command:
./trap-working 4
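Based on the printf statements in the program, the output should look roughly like this (the exact trailing digits may differ slightly on your machine):

OMP defined, threadct = 4
With 1048576 trapezoids, our estimate of the integral from
0.000000 to 3.141593 is 2.000000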
Note
Because we did not explicitly add a schedule(static) or schedule(dynamic) clause to the pragma on lines 37-38 of the working version above, the default behavior of assigning each thread an equal-sized block of consecutive loop iterations (i.e. static scheduling) was used to decompose the problem onto threads. In this case, since 1048576 trapezoids were used, with 4 threads, thread 0 computes the first 1048576/4 = 262144 trapezoids, thread 1 the next 262144 trapezoids, and so on.
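If we wanted to make this default decomposition explicit, we could add the schedule clause ourselves. The following sketch should behave the same as the working version above: each thread receives one contiguous block of roughly 262144 iterations.

#pragma omp parallel for private(i) shared(a, n, h) \
        reduction(+: integral) schedule(static)
for (i = 1; i < n; i++) {
    integral += f(a + i*h);    // each thread handles one contiguous block of i values
}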