7.3 Fast multicore code with new pgcc compiler

Now we will see our first example of new pragmas that are defined in the OpenACC standard.

Same command line and helper functions as before

The OpenACC multicore code

The pgcc compiler can use OpenACC pragmas to compile a multicore version of the code that runs on the CPU. Note that in the function called CPUadd below, the pragma line is slightly different from those we have seen for OpenMP.

Notice that in this case we use a clause called num_gangs(), which plays the same role as num_threads() in OpenMP pragmas. Most of our OpenMP examples so far used the omp_set_num_threads() function in main instead.

As you try running this code, keep in mind that we describe below how you can eliminate the num_gangs() clause and simply let the compiler use the most cores available on the machine where you compiled the code.

Note that we still use the OpenMP function omp_get_wtime() to help us time the code.

Note the compiler arguments shown below the code. We expose these so that you can see what is needed to compile a multicore version using pgcc. In particular, note these two new compiler flags:

  • -acc=multicore : tells the compiler to build code that uses multiple cores, and therefore multiple threads, for the blocks preceded by #pragma acc parallel

  • -Minfo=opt : this enables additional information to be reported by the compiler about how the code was optimized.

Added compiler information

As with prior examples of using pgcc, after the program output, there is a line that looks like this: ===== STANDARD ERROR =====. What follows this is the output from the pgcc compiler. We indicated that we wanted this output by using the compiler flag ‘-Minfo=opt’. The pgcc compiler provides this option so that you can see the optimizations it applied as a result of including the -fast option.

Exercises

  • Try varying the number of threads: Remove the ‘-n’, ‘10’ from the command line arguments and try each of these: [‘-t’, ‘2’] and [‘-t’, ‘4’] and [‘-t’, ‘8’] to see how you gain some improvement in the running time.

  • What do you notice about the running times of this version versus the two in the previous section?

  • Change the code to let the compiler choose the maximum number of threads: In the CPUadd() function, look for the pragma line, which is labeled A. in the comment behind it. Comment out this entire pragma and uncomment the line below it with a slightly different pragma, labeled B. in the comment behind it. In this case, the compiler will ignore any thread settings you indicate on the command line and use the number of cores on the machine it is compiled on.

Which is best?

We once again see that it pays to experiment with different compilers and find out what works best for your application. For pgcc and other OpenACC compilers, the compiler writers decided to default to as many threads as can run on the available cores of your machine. In some cases this turns out to be the best way to use their multicore version: if you move the code from one machine to another and recompile, it can take advantage of whatever cores are available there. Often this is exactly what you want for your application, since it gets the most out of the resources you have. You have to be careful, though, and make sure that performance isn’t degraded by using too many threads.
