CHAPTER 5: Integration using the Trapezoidal Rule¶
5.1 Estimating the Area Under a Curve¶
As our next example, let’s look at the problem of estimating the area under a curve. If you have taken a calculus course, you may recognize this problem as a Riemann sum or the Trapezoidal Rule, which approximates the area under a curve (i.e. the integral) by splitting it into a series of trapezoids.
In the following programs, we use the trapezoidal rule to approximate the integral \(\int_0^{\pi} \sin(x)\,dx\) using \(2^{20}\) equal subdivisions. The answer from this computation should be 2.0. The following video shows how a single thread would solve this problem:
In this example, a single thread serially computes the area of each trapezoid and adds them all together into one value.
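Written out, the trapezoidal-rule approximation that this program computes is

\[
\int_a^b f(x)\,dx \;\approx\; h \left[ \frac{f(a) + f(b)}{2} \;+\; \sum_{i=1}^{n-1} f(a + ih) \right],
\qquad h = \frac{b - a}{n},
\]

which matches the structure of the loop below: the accumulator starts at \(\frac{f(a)+f(b)}{2}\), each iteration adds \(f(a+ih)\) for \(i = 1, \dots, n-1\), and the sum is multiplied by \(h\) at the end.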
A C implementation of this program may look like the following:
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

//function we are calculating the area under
double f(double x) {
    return sin(x);
}

//estimation of pi
const double pi = 3.141592653589793238462643383079;

int main(int argc, char** argv) {
    //Variables
    double a = 0.0, b = pi;  //limits of integration
    int n = 1048576;         //number of subdivisions = 2^20
    double h = (b - a) / n;  //width of each subdivision
    double integral;         //accumulates answer

    integral = (f(a) + f(b))/2.0;  //initial value

    //sum up all the trapezoids
    int i;
    for(i = 1; i < n; i++) {
        integral += f(a+i*h);
    }

    integral = integral * h;
    printf("With %d trapezoids, our estimate of the integral from \n", n);
    printf("%f to %f is %f\n", a, b, integral);

    return 0;
}
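If you want to compile this serial version by hand rather than through the course’s Makefile, the only wrinkle is that sin() requires linking the math library. Assuming you save the file under a name such as trap-serial.c (a hypothetical name; the file in the course directory may be named differently), something like the following should work:

gcc -o trap-serial trap-serial.c -lm
./trap-serial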
Let’s now consider how we can use multiple threads to approximate the area under the curve in parallel. One strategy would be to assign each thread a subset of the total set of subdivisions, so that each thread separately computes its assigned set of trapezoids.
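As a rough sketch of this strategy (using the same variables a, n, h, and f as the program above; names such as my_sum are purely illustrative), each thread could compute the bounds of its own block of iterations from its thread number. Note that this is not the code we will actually write below; OpenMP’s parallel for directive does this bookkeeping for us.

#pragma omp parallel
{
    int id = omp_get_thread_num();          // this thread's number: 0, 1, 2, or 3
    int nthreads = omp_get_num_threads();   // total number of threads, e.g. 4
    int chunk = n / nthreads;               // roughly equal share of the iterations
    int start = 1 + id * chunk;             // first trapezoid this thread handles
    int end = (id == nthreads - 1) ? n : start + chunk;   // one past its last trapezoid

    double my_sum = 0.0;                    // this thread's partial result
    for (int i = start; i < end; i++) {
        my_sum += f(a + i*h);
    }
    // The per-thread partial sums still need to be combined into one total;
    // doing that safely is exactly the issue explored in the sections below.
}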
The following video illustrates how 4 threads would work together to approximate the area under the curve:
5.2 Parallel Integration - First Attempt¶
Let’s now visit the integration directory, which is at the same level as the patternlets directory. Assuming you are still in the reduction directory, use the following command:
cd ../../integration
The following program attempts to parallelize the serial version of the integration program by adding OpenMP pragmas:
 1  #include <math.h>
 2  #include <stdio.h>   // printf()
 3  #include <stdlib.h>  // atoi()
 4  #include <omp.h>     // OpenMP
 5
 6  /* Demo program for OpenMP:
 7     computes trapezoidal approximation to an integral */
 8
 9  const double pi = 3.141592653589793238462643383079;
10
11  int main(int argc, char** argv) {
12      /* Variables */
13      double a = 0.0, b = pi;  /* limits of integration */
14      int n = 1048576;         /* number of subdivisions = 2^20 */
15      double h = (b - a) / n;  /* width of subdivision */
16      double integral;         /* accumulates answer */
17      int threadcnt = 1;
18
19      double f(double x);
20
21      /* parse command-line arg for number of threads */
22      if (argc > 1) {
23          threadcnt = atoi(argv[1]);
24      }
25
26      #ifdef _OPENMP
27      omp_set_num_threads( threadcnt );
28      printf("OMP defined, threadct = %d\n", threadcnt);
29      #else
30      printf("OMP not defined\n");
31      #endif
32
33      integral = (f(a) + f(b))/2.0;
34      int i;
35
36      #pragma omp parallel for \
37          private(i) shared (a, n, h, integral)
38      for(i = 1; i < n; i++) {
39          integral += f(a+i*h);
40      }
41
42      integral = integral * h;
43      printf("With %d trapezoids, our estimate of the integral from \n", n);
44      printf("%f to %f is %f\n", a, b, integral);
45  }
46
47  double f(double x) {
48      return sin(x);
49  }
Lines 36-37 contain the omp parallel for pragma that parallelizes the loop that computes the integral. This example differs from others you have seen so far in that the variables are explicitly declared to be either private (each thread gets its own copy) or shared (all threads use the same copy of the variable in memory).
If our parallel program is implemented correctly, it should estimate the area under the curve as 2.0, identical to the output of the serial program.
First, run this program serially on 1 thread by typing:
make
./trap-notworking 1
Q-3: Run the OpenMP implementation of integration serially a few times. What do you discover?
- The program works as expected (outputs answer 2.0)
  - Correct! This program runs serially, so there is no issue.
- The program returns a different approximate value (but the same value with every run)
  - Not quite. Did you run the program using 1 thread?
- The program returns a different approximate value every run.
  - Not quite. Did you run the program using 1 thread?
Next run this code in parallel using 4 threads by typing:
make
./trap-notworking 4
As with other examples you have seen so far, the command-line argument of 4 above indicates that four threads should be used.
Q-4: Run the OpenMP implementation of integration in parallel a few times. What do you discover?
- The program works as expected (outputs answer 2.0)
  - Not quite. Try running the program a few more times. What do you see?
- The program returns a different approximate value (but the same value with every run)
  - Close, but not correct. Try running the program again. Do you really get the same answer?
- The program returns a different approximate value every run.
  - Correct. Something is seriously wrong with our program!
Q-5: Do you have any guesses on what could be causing the issue?
- This scenario illustrates a classic limitation of certain types of parallelism; this is an example of something that cannot be properly parallelized.
  - Nope. This program CAN be properly parallelized. We are just doing something wrong!
- There is a race condition in this code.
  - Correct! Do you know where it is? (See the sketch below.)
- It's some other issue.
  - Actually, the issue is listed in one of the options!
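If you are unsure where the race condition is, look at the loop body: every thread executes integral += f(a+i*h) on the same shared variable integral. That statement is really a read, an add, and a write; when two threads read the old value at the same time, one of their updates is lost, which is why the answer changes from run to run. One possible repair (not the one used in the next section, which relies on a reduction clause instead) is to protect the update with OpenMP’s atomic directive, as in this sketch:

#pragma omp parallel for private(i) shared(a, n, h, integral)
for (i = 1; i < n; i++) {
    #pragma omp atomic   // make the read-add-write on integral one indivisible step
    integral += f(a + i*h);
}

This produces the correct answer, but it forces the updates to happen one at a time, which costs performance; the reduction clause shown next avoids that overhead.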
5.3 Parallel Integration - Second Attempt¶
We will use the previously discussed reduction clause to fix the issue. Here is the improved version, called trap-omp-working.c:
 1  #include <math.h>
 2  #include <stdio.h>   // printf()
 3  #include <stdlib.h>  // atoi()
 4  #include <omp.h>     // OpenMP
 5
 6
 7  /* Demo program for OpenMP:
 8     computes trapezoidal approximation to an integral */
 9
10  const double pi = 3.141592653589793238462643383079;
11
12  int main(int argc, char** argv) {
13      /* Variables */
14      double a = 0.0, b = pi;  /* limits of integration */
15      int n = 1048576;         /* number of subdivisions = 2^20 */
16      double h = (b - a) / n;  /* width of subdivision */
17      double integral;         /* accumulates answer */
18      int threadcnt = 1;
19
20      double f(double x);
21
22      /* parse command-line arg for number of threads */
23      if (argc > 1) {
24          threadcnt = atoi(argv[1]);
25      }
26
27      #ifdef _OPENMP
28      omp_set_num_threads( threadcnt );
29      printf("OMP defined, threadct = %d\n", threadcnt);
30      #else
31      printf("OMP not defined\n");
32      #endif
33
34      integral = (f(a) + f(b))/2.0;
35      int i;
36
37      #pragma omp parallel \
38          for private(i) shared (a, n, h) reduction(+: integral)
39      for(i = 1; i < n; i++) {
40          integral += f(a+i*h);
41      }
42
43      integral = integral * h;
44      printf("With %d trapezoids, our estimate of the integral from \n", n);
45      printf("%f to %f is %f\n", a, b, integral);
46  }
47
48  double f(double x) {
49      return sin(x);
50  }
As with the previous reduction patternlet example, we again have an accumulator variable in the loop, this time called integral. Thus our reduction clause reads reduction(+:integral).
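Conceptually, the reduction clause arranges something like the following on our behalf (a sketch only, written against the same variables as the program above; the OpenMP runtime does this automatically and more efficiently): each thread accumulates into a private copy initialized to 0, and the private copies are combined into the shared integral exactly once at the end.

#pragma omp parallel shared(integral)
{
    double my_integral = 0.0;          // private copy, initialized to 0 (the identity for +)

    #pragma omp for
    for (int i = 1; i < n; i++) {
        my_integral += f(a + i*h);     // no race: each thread updates only its own copy
    }

    #pragma omp critical               // each thread folds its copy into the shared total, one at a time
    integral += my_integral;
}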
The above program was already compiled by our first use of the make command. Run it with the following command:
./trap-working 4
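Based on the printf statements in the program, the output should look roughly like this (the exact trailing digits may differ slightly on your machine):

OMP defined, threadct = 4
With 1048576 trapezoids, our estimate of the integral from
0.000000 to 3.141593 is 2.000000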
Note
Because we did not explicitly add a schedule(static) or schedule(dynamic) clause to the pragma on lines 37-38 of the working version above, the default behavior of assigning each thread an equal-sized block of consecutive loop iterations (i.e. static scheduling) was used to decompose the problem onto threads. In this case, since 1048576 trapezoids were used, with 4 threads, thread 0 computes the first 1048576/4 = 262144 trapezoids, thread 1 the next 262144 trapezoids, and so on.
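If we wanted to make this default decomposition explicit, we could add the schedule clause ourselves. The following sketch should behave the same as the working version above: each thread receives one contiguous block of roughly 262144 iterations.

#pragma omp parallel for private(i) shared(a, n, h) \
        reduction(+: integral) schedule(static)
for (i = 1; i < n; i++) {
    integral += f(a + i*h);    // each thread handles one contiguous block of i values
}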