9.1 The basic syntax of Chapel for Shared Memory Systems
Let’s start by examining some examples for a single shared-memory multicore CPU running multiple threads.
How Parallel Patterns Apply
Because Chapel is designed from the ground up to enable parallelism within the language, certain patterns that we explored in Chapter 1 are built directly into its constructs and keywords, so tried-and-true methods for obtaining parallelism come with the language itself. As in OpenMP, this includes the single program, multiple data (SPMD) program structure pattern. Unlike OpenMP, where a directive marks the fork-join region explicitly, the fork-join pattern for threads in a shared memory system is not written out separately in Chapel code; it is implied by the language's parallel constructs.
Decisions about distributing tasks to cores are left to the implementation on the system. For example, how to carry out data decomposition among threads for a loop on a shared memory system is left to the underlying runtime.
You will also notice that parts of the lowest-level patterns (those nearest to the hardware) are hidden or inaccessible to us. An obvious example is that we cannot access a thread id, so we have no notion of which thread is working on a given portion of data, executing a particular portion of a loop, or completing a designated task.
When appropriate, we use the pattern terminology from Chapter 1 when introducing the code examples, so that you can relate them to OpenMP examples that you may have studied in Chapter 2.
Data decomposition and parallel for loop using keyword forall
Here’s our first simple example:
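.. activecode:: 9-forall
   :language: pdc
   :compiler: 'chpl'
   :caption: Simple parallel for loop

   // Parallel loop: iterations may run on different threads, in any order.
   forall i in 1..10
   {
     writeln(i);
   }

   // Sequential version: uncomment this and comment out the forall loop.
   // for i in 1..10
   // {
   //   writeln(i);
   // }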
The first item to notice about this example is the keyword forall. This is the syntax for indicating that the loop should be split to run in parallel on more than one thread. The output shows you something interesting that you may already know is a hallmark of running threads on shared memory machines. Consider this:
Q 9-1: What happens when you re-run the forall example loop several times?

- Nothing happens. It's the same output each time.
  Feedback: Try again a few times.
- A number for the loop iteration index is sometimes repeated.
  Feedback: This is not correct. You should see each value from 1 through 10 only once.
- The order in which the prints complete varies each time.
  Feedback: Correct! Like all thread-based code on multicore CPUs, we cannot guarantee the order in which each forked thread's work finishes.
The commented portion shows how to write a traditional sequential loop. Try uncommenting it and commenting out the forall loop and running it again.
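The varying print order affects only when each iteration runs, not what it computes. As a minimal sketch (the array name squares and the configurable size n are illustrative and not part of the original example), each iteration below writes only its own array element, so the final printout should be the same on every run even though the iterations execute in an unpredictable order. The explicit ref intent is added as a precaution, since the default intent for arrays modified inside a forall has differed across Chapel releases.

.. code-block:: chapel

   // Hypothetical names: n and squares are chosen for illustration only.
   config const n = 10;

   // One element per loop iteration, indexed 1 through n.
   var squares: [1..n] int;

   // Each iteration touches a different element, so no two iterations
   // conflict. How the iterations are divided among threads is left to
   // the Chapel runtime.
   forall i in 1..n with (ref squares)
   {
     squares[i] = i * i;
   }

   // Printing happens after the loop has finished, so the order is fixed.
   writeln(squares);

With the default n this should print 1 4 9 16 25 36 49 64 81 100. Because n is declared as a config const, running the compiled program with --n=20 changes the size without recompiling.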
The fact that this code ran as is tells us some other interesting things about the language that set it apart from C and C++. We’ll discuss these in more detail later, but for now you may notice the following:
- Variable types can be inferred from the context. In this case i is inferred to be an integer type.
- Loop syntax is similar to Python, though it indicates ranges differently and includes the last value of i, 10, when executing.
- Printing variable values to the terminal doesn’t require elaborate formatting like C printf does.
- This code compiled and ran without declaring a function. In almost all cases we would not write examples like this when developing a program that is even slightly complex.
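The points in the list above can be collected into one small sketch. The variable name total and the choice to wrap the code in a main procedure are illustrative, not taken from the original example; the loop is sequential so that the running sum is updated safely.

.. code-block:: chapel

   // Declaring main explicitly, as larger programs usually do.
   proc main() {
     // Explicit type annotation; writing 'var total = 0;' would infer int.
     var total: int = 0;

     // The range 1..10 includes both endpoints, so this loop runs ten times.
     for i in 1..10 {
       total += i;
     }

     // writeln takes any number of arguments and prints them in order,
     // with no format string needed.
     writeln("sum of 1..10 = ", total);
   }

This should print sum of 1..10 = 55, which confirms that the final value 10 is included in the range.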