7.1 Sequential versions with 2 compilers¶
We will begin by illustrating that the same code file for vector addition can be compiled and run by different C compilers: gcc and pgcc. All of the examples in this chapter use the same functions for gathering command line arguments and for utility functions that initialize the vectors, print them, and check for correct results. These are in separate files in the code on the repository and are shown in code blocks below whose run button is disabled.
Note
The OpenACC compiler, pgcc, uses the .c suffix, because code files are treated as C code files that can be compiled with a C compiler. Though the pgcc compiler is creating CUDA code behind the scenes for versions that will run on the GPU, we now think of OpenACC code files as C code files with pragmas, much like OpenMP.
Command line argument handling¶
The following code block contains the functions for gathering the command line arguments for our vector addition program. It uses the C library function getopt(), which lets us pass options like this when running the program from the command line on your own machine:
./vectorAdd -n 1024
The getopt() function is used on line 19 below. If you have not used this before, you should be able to find tutorials on the web for it.
Helper functions used by each example¶
The following functions are used for each different main program function that you find here and in the sections following this one.
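To give a sense of what such helpers might look like, here is a minimal sketch under stated assumptions. The function names, the use of float, and the tolerance parameter are all hypothetical. Only the initial values (1.0 in x, 2.0 in y) and the rule that arrays with fewer than 40 values are printed come from this section.

```c
#include <math.h>    /* fabsf() */
#include <stdio.h>

/* Hypothetical helper: set every x[i] to 1.0 and every y[i] to 2.0. */
void fill_arrays(float *x, float *y, int n)
{
    for (int i = 0; i < n; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }
}

/* Hypothetical helper: print an array only when it has fewer than
 * 40 values, so the output stays readable. */
void print_if_small(const char *label, const float *a, int n)
{
    if (n >= 40) {
        return;
    }
    printf("%s:", label);
    for (int i = 0; i < n; i++) {
        printf(" %.1f", a[i]);
    }
    printf("\n");
}

/* Hypothetical helper: return 1 if every element is within tol of
 * the expected value, otherwise 0. */
int all_close(const float *a, int n, float expected, float tol)
{
    for (int i = 0; i < n; i++) {
        if (fabsf(a[i] - expected) > tol) {
            return 0;
        }
    }
    return 1;
}
```

Because x holds 1.0 and y holds 2.0, a correct sum can be checked by comparing every result element against 3.0.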
Sequential main program: gcc compiler¶
The main program below is compiled by including the above two code blocks. As you can see below the code, there is a place for you to change the command line arguments, as we have seen in other examples in this book and the PDC for Beginners book. There is also now a box where we expose the compiler arguments that in this case are sent to the gcc compiler. Run it first with all of these values given to you.
Notes:
Note that for this code, if the number of values in each array is less than 40, the arrays will be printed so that you can verify visually what is in them and that they were added correctly. You can experiment with the ‘10’ in the command line arguments to illustrate this. The arrays are initialized so that each value in array x is 1.0 and each value in array y is 2.0. This makes it straightforward to check whether the results are correct.
Since this is a sequential version, the option for number of threads is ignored.
The default compiler flag provided is -O2 (capital O, not zero) for a somewhat fast code optimization level that is close to the one used by the pgcc compiler below. To generate a compiler error you could try adding ,’-foo’ after ‘-O2’.
Not shown but used is a linker argument, -lm, for math library functions used to check that the result is correct.
Exercise:
Try running with a significantly larger number of elements in each array. Note that in this simple case the main function always checks whether the result is correct. You can also remove the ‘-n’,’10’ completely from the command line arguments and use the default array size set in the code.
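The heart of a sequential vector addition is a single loop over the elements. The sketch below assumes float arrays and a separate output array z; the book's own file may store the result differently, and the name vector_add is hypothetical.

```c
/* Sequential vector addition sketch: z[i] = x[i] + y[i].
 * Assumption: the result goes into a separate array z.
 *
 * With the flags described in this section, a file containing this
 * function plus a main could be built along these lines (the file
 * name vectorAdd.c is an assumption):
 *     gcc -O2 -o vectorAdd vectorAdd.c -lm
 */
void vector_add(const float *x, const float *y, float *z, int n)
{
    for (int i = 0; i < n; i++) {
        z[i] = x[i] + y[i];   /* one independent addition per element */
    }
}
```

Each iteration is independent of the others, which is what makes this loop a natural candidate for multicore and GPU versions.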
Sequential main program: pgcc compiler¶
Now we will use the new pgcc compiler for this code. Note below the change in the compiler arguments for this compiler. In particular, notice that the compiler flag -acc=host is used to indicate that sequential code for the host’s CPU should be generated for the code blocks where OpenACC pragmas appear. In addition, the flag for generating optimized code is -fast (without the O needed for gcc), and the -Minfo=opt flag is used to provide some information about the code optimization.
The main program is compiled by including the two code blocks for command line arguments and helper functions. You can run this one to see how the new pgcc compiler runs the program and creates compiler output that is explained below.
Here is how to interpret the output from running this version:
The output from the program comes first. Note how it is the same as the default gcc version above.
After the program output, there is a line that looks like this: ===== STANDARD ERROR =====. What follows this is the output from the pgcc compiler. We indicated that we wanted this output by using the compiler flag ‘-Minfo=opt’. The pgcc compiler provides this option so that you can see which optimizations it applied as a result of the -fast option. Look at the Wikipedia page for loop unrolling to see a discussion of this technique, which was the optimization used here.
As with the previous gcc version, try larger array sizes.
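To make the loop unrolling optimization more concrete, the sketch below writes out by hand roughly what an unrolling compiler does to an addition loop. It is an illustration only: the function names are hypothetical, and the unroll factor of 4 is an assumption, not necessarily the factor pgcc chooses (the -Minfo=opt output reports what the compiler actually did).

```c
/* Plain loop: one addition and one loop test per element. */
void add_simple(const float *x, const float *y, float *z, int n)
{
    for (int i = 0; i < n; i++) {
        z[i] = x[i] + y[i];
    }
}

/* Hand-unrolled by 4 (illustrative factor): four additions per
 * loop test, which reduces loop overhead. */
void add_unrolled4(const float *x, const float *y, float *z, int n)
{
    int i = 0;

    /* Main body: handle four elements per iteration */
    for (; i + 3 < n; i += 4) {
        z[i]     = x[i]     + y[i];
        z[i + 1] = x[i + 1] + y[i + 1];
        z[i + 2] = x[i + 2] + y[i + 2];
        z[i + 3] = x[i + 3] + y[i + 3];
    }

    /* Cleanup: handle the 0 to 3 elements left over */
    for (; i < n; i++) {
        z[i] = x[i] + y[i];
    }
}
```

Both functions produce the same results. In practice you would write the plain loop and let the compiler decide whether and how much to unroll.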
Same code, two compilers¶
This example uses the same code, but illustrates slight differences in the flags used by each compiler. Each compiler produces different machine code.
We will next look at how the pgcc compiler can generate machine code for a file with OpenMP pragmas, then follow on with a different multicore version and ultimately a GPU version.
Note
The pgcc/nvc compiler documentation indicates that the compiler flag -fast is roughly equivalent to -O2. Each compiler is different, however, and generates different executable code. Using higher optimization, such as -O3, sometimes makes the code faster, but sometimes not. You always have to run experiments to find out.