8.1. Drug design simulation
This is a very rough simulation of a program to compute how well a set of short protein ligands (each a possible drug) matches a given longer protein string. In the real software programs that do this, the matching is quite sophisticated, targeting possible ‘receptor’ sites on the protein.
Here is an image from Wikipedia illustrating the concept of a ligand (represented by the small sticks in the center) binding to areas of the protein (represented by the ribbon structure):
For both the real versions of this code and our simulated case, the longer the ligand or the protein, the longer it takes to perform the match and compute its score.
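To get a feel for why, here is a minimal sketch of a scoring function, assuming a longest-common-subsequence style match computed with a deliberately naive recursion; the exemplar's actual scoring code in dd_functions.cpp may differ in detail:

```cpp
#include <algorithm>
#include <iostream>
#include <string>

// Hypothetical score: length of the longest common subsequence of the ligand
// and the protein, computed recursively. The recursion branches on every
// mismatch, so the cost grows quickly with the length of both strings.
int score(const std::string& ligand, const std::string& protein) {
    if (ligand.empty() || protein.empty()) return 0;
    if (ligand[0] == protein[0])
        return 1 + score(ligand.substr(1), protein.substr(1));
    return std::max(score(ligand, protein.substr(1)),
                    score(ligand.substr(1), protein));
}

int main() {
    std::cout << score("acd", "abcdefg") << std::endl;  // prints 3
    return 0;
}
```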
We have created a default fake protein in the code. This can be changed on the command line.
We create the list of possible ligands in two ways:
If the number of ligands is <= 18, the list of ligands comes from a fabricated list that is hard-coded in the program. We designed this as an example with a range of ligands whose lengths run from 2 through 6.
If the number of ligands is > 18, the ligands are generated randomly using a gamma distribution that looks like this:
This means we will have more ligands of length 2 or 3, and progressively fewer of lengths 4, 5, and 6. This has no relation to the real problem; instead it gives us a set of ligands that can be scored in a reasonable amount of time on a small cluster of single-board computers. This way you can try some experiments and see the advantage of one implementation over the other.
The image of the gamma distribution of ligand lengths shown above was generated using a = 4.2 and b = 0.8.
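As an illustration only, ligand lengths with roughly this shape could be drawn using C++'s std::gamma_distribution; the parameter interpretation (shape 4.2, scale 0.8) and the clamping to lengths 2 through 6 are assumptions for this sketch, and the exemplar's actual generation code may differ:

```cpp
#include <iostream>
#include <map>
#include <random>

int main() {
    std::mt19937 gen(std::random_device{}());
    std::gamma_distribution<double> dist(4.2, 0.8);  // shape a, scale b

    // Draw 1000 sample lengths, clamped to the ligand lengths used here (2-6).
    std::map<int, int> counts;
    for (int i = 0; i < 1000; ++i) {
        int len = static_cast<int>(dist(gen));
        if (len < 2) len = 2;
        if (len > 6) len = 6;
        ++counts[len];
    }
    // Most samples land at lengths 2 and 3, with fewer at 4, 5, and 6.
    for (const auto& [len, n] : counts)
        std::cout << "length " << len << ": " << n << std::endl;
    return 0;
}
```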
8.1.1. The two message-passing versions
To view the associated code for the drug design exemplar, visit the drug-design-mpi folder under the MPI_examples directory:
cd CSinParallel/Exemplars/MPI_examples/drug-design-mpi
The folder called drug-design-mpi contains two versions of a simple model for matching ligands to proteins. Each of these versions uses these patterns from the MPI patternlets:
- SPMD, single program, multiple data
- conductor-worker
- message passing
The difference between the versions is as follows:
- dd_mpi_equal_chunks.cpp: The conductor divides the ligands in the list equally among the worker processes, sending each worker all of its work at once and then waiting for results from every worker. This uses the parallel loop equal chunks pattern.
- dd_mpi_dynamic.cpp: The conductor sends ligands to the workers one at a time, giving a worker a new ligand as soon as it has finished its previous one. This uses the dynamic load balance pattern.
Take a look at each example in an editor to see if you can tell how it works. Note that each one uses a set of core functions found in a file called dd_functions.cpp. If you would like to simply observe how these two versions perform, you could continue on at this point, run the code, and treat the programs like a ‘black box’. Then you could study the code further later.
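If you want a rough picture of the dynamic load balance pattern before reading the exemplar's source, here is a minimal sketch. It is not the exemplar's actual code: the tag names, the use of ligand indices, the stand-in scoring function, and the assumption that every process holds the ligand list are choices made only for this illustration.

```cpp
#include <mpi.h>
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

const int WORKTAG = 1;   // message carries a ligand index to score
const int STOPTAG = 2;   // message tells the worker there is no more work

// Stand-in scoring function; the exemplar uses its own match computation.
int score(const std::string& ligand, const std::string& protein) {
    return static_cast<int>(std::min(ligand.size(), protein.size()));
}

void conductor(const std::vector<std::string>& ligands, int numWorkers) {
    int next = 0;
    int best = -1;
    // Prime each worker with one ligand index.
    for (int w = 1; w <= numWorkers && next < (int)ligands.size(); ++w) {
        MPI_Send(&next, 1, MPI_INT, w, WORKTAG, MPI_COMM_WORLD);
        ++next;
    }
    // If there are more workers than ligands, stop the idle ones right away.
    for (int w = next + 1; w <= numWorkers; ++w)
        MPI_Send(&next, 1, MPI_INT, w, STOPTAG, MPI_COMM_WORLD);
    // Each result that comes back frees that worker for the next ligand.
    for (int received = 0; received < (int)ligands.size(); ++received) {
        int result[2];                                  // {ligand index, score}
        MPI_Status status;
        MPI_Recv(result, 2, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        best = std::max(best, result[1]);
        if (next < (int)ligands.size()) {
            MPI_Send(&next, 1, MPI_INT, status.MPI_SOURCE, WORKTAG,
                     MPI_COMM_WORLD);
            ++next;
        } else {
            MPI_Send(&next, 1, MPI_INT, status.MPI_SOURCE, STOPTAG,
                     MPI_COMM_WORLD);
        }
    }
    std::cout << "best score: " << best << std::endl;
}

void worker(const std::vector<std::string>& ligands, const std::string& protein) {
    while (true) {
        int index;
        MPI_Status status;
        MPI_Recv(&index, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == STOPTAG) break;           // no more ligands
        int result[2] = { index, score(ligands[index], protein) };
        MPI_Send(result, 2, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, numProcs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    std::string protein = "thequickbrownfoxjumpedoverthelazydog";  // fake protein
    std::vector<std::string> ligands = { "abc", "de", "fgh", "ij", "klmno" };

    if (rank == 0) conductor(ligands, numProcs - 1);
    else           worker(ligands, protein);

    MPI_Finalize();
    return 0;
}
```

The equal chunks version replaces this request/response loop with a single up-front division of the ligand list, so each worker receives its entire share at the start.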
8.1.2. Running Experiments on the Raspberry Pi cluster
Let’s try running this example on our Raspberry Pi cluster.
First compile these examples by simply executing make inside the CSinParallel/Exemplars/MPI_examples/drug-design-mpi folder.
The following commands will run the two programs on a fixed set of 18 ligands:
mpirun -np 4 -hostfile ~/hostfile ./dd_mpi_equal_chunks -n 18 --verbose
mpirun -np 4 -hostfile ~/hostfile ./dd_mpi_dynamic -n 18 --verbose
From here you should be able to trace how the two programs are working.
You can get all the options like this:
mpirun -np 2 ./dd_mpi_equal_chunks --help
mpirun -np 2 ./dd_mpi_dynamic --help
8.1.2.1. Observe differences and performance improvement
The first point to notice about this example is that the time to compute a score varies from ligand to ligand, so the amount of work each worker ends up with also varies. This means we can experiment to see how assigning an equal number of ligands to each worker compares to dynamically assigning each ligand to the next worker that has finished.
Run each of the following cases as you did above, but changing the number of ligands. Start like this, where the 24 is for 24 ligands:
Equal Chunks:
mpirun -np 4 -hostfile ~/hostfile ./dd_mpi_equal_chunks -n 24
Dynamic:
mpirun -np 4 -hostfile ~/hostfile ./dd_mpi_dynamic -n 24
Fill in the following table by running these again, changing the 24 to 48, then 96, then 192. Be patient for the longer ones.
| -np | #ligands | equal chunks time | dynamic time |
|-----|----------|-------------------|--------------|
| 4   | 24       |                   |              |
| 4   | 48       |                   |              |
| 4   | 96       |                   |              |
| 4   | 192      |                   |              |
What do you observe about the difference between the two versions?
It is important to realize that these results might not always follow this pattern; they depend on the number of ligands you try and the length of the simulated protein string. With a large amount of overall work, the random ligand lengths tend to average out and the ‘equal chunks’ version can perform about as well as the dynamic version; the same can happen when there is very little work per process.
Also note that the ligands are being created randomly, so each time you run either program with a particular number of ligands you will likely see different times.
8.1.2.2. Observing Scalability
Scalability is an important measure of parallel programs. To observe scalability, you can try either program with a fixed number of ligands, varying the number of processes like this: 2 (1 worker), 3 (2 workers), 5 (4 workers), and 9 (8 workers). (One process always acts as the conductor, so the number of workers is one less than the value given to -np.)
| -np | #ligands | # workers | equal chunks time | dynamic time |
|-----|----------|-----------|-------------------|--------------|
| 2   | 24       | 1         |                   |              |
| 3   | 24       | 2         |                   |              |
| 5   | 24       | 4         |                   |              |
| 9   | 24       | 8         |                   |              |
What do you observe about the time as you double the number of workers?
Ideally, as you double the number of workers, the time should be cut in half. This is called strong scalability. But there is some overhead from the message passing, so we don’t see perfect strong scalability.
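For example, with made-up times, here is how you could turn your table entries into speedup and efficiency numbers; perfect strong scaling would give a speedup equal to the number of workers and an efficiency of 1.0:

```cpp
#include <iostream>

int main() {
    // Hypothetical measurements: replace these with times from your table.
    double timeOneWorker   = 40.0;  // seconds with -np 2 (1 worker)
    double timeFourWorkers = 11.0;  // seconds with -np 5 (4 workers)
    int workers = 4;

    double speedup    = timeOneWorker / timeFourWorkers;  // about 3.6
    double efficiency = speedup / workers;                // about 0.91
    std::cout << "speedup: " << speedup
              << "  efficiency: " << efficiency << std::endl;
    return 0;
}
```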
Can you explain the difference between each version, especially as the number of workers increases?
You could also try a smaller number of ligands and observe when it is no longer beneficial to use any more workers in either of these cases. Try filling in this table:
| -np | #ligands | # workers | equal chunks time | dynamic time |
|-----|----------|-----------|-------------------|--------------|
| 2   | 18       | 1         |                   |              |
| 3   | 18       | 2         |                   |              |
| 5   | 18       | 4         |                   |              |
| 9   | 18       | 8         |                   |              |
| 12  | 18       | 11        |                   |              |
What observations can you make now?
8.1.3. Improvements to try
If you wish to experiment further, you could study the code in detail and try the following changes to the code. However, be sure to try the next example first and then come back to this.
A possible improvement to each version of the code, particularly the equal chunks version, would be to have each worker send back only its highest score. This way the conductor does less work. In the dynamic version, a differently tagged message would still be needed so a worker can request more work, so this may not really save any time, because the number of messages would be the same.
Another possibility is to have each worker send all of its scores in one array of ints when all of its work is complete. The conductor could still handle the computation of the maximum value. This eliminates sending many small messages, each containing one ligand score, to the conductor.
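As a sketch of that idea, using hypothetical helper functions and variable names that are not part of the exemplar (these would be called from inside the equal chunks program):

```cpp
#include <mpi.h>
#include <vector>

// Worker side: send the scores for this worker's whole chunk in one message.
void sendAllScores(const std::vector<int>& myScores) {
    MPI_Send(myScores.data(), static_cast<int>(myScores.size()), MPI_INT,
             0 /* conductor */, 0 /* tag */, MPI_COMM_WORLD);
}

// Conductor side: receive one array per worker and keep the running maximum,
// so the conductor still computes the overall best score.
void receiveChunkScores(int fromWorker, int chunkSize, int& bestScore) {
    std::vector<int> scores(chunkSize);
    MPI_Recv(scores.data(), chunkSize, MPI_INT, fromWorker, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    for (int s : scores)
        if (s > bestScore) bestScore = s;
}
```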
In the dynamic version, it would be useful to report how many ligands each worker completes. You could then investigate more carefully when this version becomes more like the equal chunks version.