4.1. Message Passing Pattern: Key Problem¶
The following code represents a common error that many programmers have inadvertently placed in their code. The concept behind this program is that we wish to use communication between pairs of processes, like this:
For message passing to work between a pair of processes, one must send and the other must receive. If we wish to exchange data, then each process will need to perform both a send and a receive. The idea is that process 0 will send data to process 1, who will receive it from process 0. Process 1 will also send some data to process 0, who will receive it from process 1. Similarly, processes 2 and 3 will exchange messages: process 2 will send data to process 3, who will receive it from process 2. Process 3 will also send some data to process 2, who will receive it from process 3.
If we have more processes, we still want to pair up processes together to exchange messages. The mechanism for doing this is to know your process id. If your id is odd (1, 3 in the above diagram), you will send and receive from your neighbor whose id is id - 1. If your id is even (0, 2), you will send and receive from your neighbor whose id is id + 1. This should work even if we add more than 4 processes, as long as the number of processes is divisible by 2.
Warning
There is a problem with the following code called deadlock. This happens when every process is waiting on an action from another process. The program cannot complete. On linux systems such as the Raspberry Pi, type ctrl-c together to stop the program (ctrl means the control key).
Navigate to: ../04.messagePassingDeadlock/
Make and run the code on 4 MPI processes:
make
mpirun -hostfile ~/hostfile -np 4 ./messagePassingDeadlock
Here the 4 signifies the number of processes to start up in MPI.
Exercise:
Run this code with 4 processes. Observe the deadlock and stop the program using ctrl-c.
4.1.1. Explore the code¶
In this code, can you trace what is happening to cause the deadlock?
Each process, regardless of its id, will execute a receive request first. In this model, recv is a blocking function- it will not continue until it gets data from a send. So every process is blocked waiting to receive a message.
#include <stdio.h>
#include <mpi.h>
int odd(int number) { return number % 2; }
int main(int argc, char** argv) {
int id = -1, numProcesses = -1;
int sendValue = -1, receivedValue = -1;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &id);
MPI_Comm_size(MPI_COMM_WORLD, &numProcesses);
if (numProcesses > 1) {
sendValue = id;
if ( odd(id) ) { // odd processors receive from their 'left neighbor', then send
MPI_Recv(&receivedValue, 1, MPI_INT, id-1, 2,
MPI_COMM_WORLD, &status);
MPI_Send(&sendValue, 1, MPI_INT, id-1, 1, MPI_COMM_WORLD);
} else { // even processors receive from their 'right neighbor', then send
MPI_Recv(&receivedValue, 1, MPI_INT, id+1, 1,
MPI_COMM_WORLD, &status);
MPI_Send(&sendValue, 1, MPI_INT, id+1, 2, MPI_COMM_WORLD);
}
printf("Process %d of %d computed %d and received %d\n",
id, numProcesses, sendValue, receivedValue);
} else if ( !id) { // only process 0 does this part
printf("\nPlease run this program using -np N where N is positive and even.\n\n");
}
MPI_Finalize();
return 0;
}
Can you think of how to fix this problem?
Since recv is a blocking function, we need to have some processes send first, while others correspondingly recv first from those who send first. This provides coordinated exchanges.
Go to the next example to see the solution.