Introduction to Parallel Programming
A parallel program is one that runs simultaneously on multiple processors with some form of inter-process communication.
Parallel programming can be done in the following ways:
- Message Passing Interface (MPI): MPI-1 and MPI-2 are the standard APIs for message passing; MPI-2 extends MPI-1. Message passing is normally used by programs run on a set of computing systems (such as the nodes in a cluster), each of which has its own memory. The nodes are linked by a communication network, such as Ethernet, InfiniBand (IB), or Myrinet, in such a way that the communication speed is the same between any pair of nodes.
With MPI programs, interaction between processes is achieved through an exchange of messages.
On the Hoffman2 Cluster we use the following implementations of MPI: MPICH and MPICH2 from Argonne National Lab for Ethernet; MVAPICH1 and MVAPICH2 from Ohio State University for InfiniBand; OpenMPI (an open-source implementation developed by a consortium of academic, research, and industry partners) for InfiniBand as well as Ethernet; and the MPI from Myricom for Myrinet.
- OpenMP via Compiler Directives: Modern Fortran and C/C++ compilers can parallelize a program for shared-memory execution if you add the appropriate OpenMP compiler directives to your code.
With OpenMP programs, interaction between threads is achieved by reading from and writing to shared memory. Therefore, OpenMP parallel programs will run only on the processors of a single node.
On clusters, OpenMP is limiting because, while a cluster may have hundreds of nodes, currently each node has only a small number of processors. This is changing rapidly as the number of cores per node keeps increasing.
- Combining OpenMP Compiler Directives with MPI: Combining OpenMP compiler directives with MPI allows you to run on many more processors than OpenMP directives alone, because MPI carries the communication between nodes while OpenMP handles the processors within each node.
- Using a Thread Library and Writing Thread Code: When a program forks a thread or threads, it splits itself into two or more simultaneously running parts. Writing thread code is a commonly used method to allow one part of a program to keep executing while another part waits on an I/O operation or performs some other long-running task. When a threaded program is run on a single processor, the operating system monitors the threads and schedules a single thread at a time to run on the processor. A program with n threads can be run on a node with m processors, n >= m, and the operating system will schedule the threads to run on the processors as appropriate.
On clusters, using threads can only achieve limited parallelization because all the threads run on a single node.
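The thread pattern described above, in which one part of a program keeps working while another waits on I/O, can be sketched with Python's standard `threading` module. This is a minimal illustration and not code from the Hoffman2 documentation: the names `slow_read` and `run` are hypothetical, and the blocking I/O is simulated with `time.sleep`. In the standard CPython interpreter, threads like these mainly overlap waiting time rather than CPU-bound computation.

```python
import threading
import time


def slow_read(results):
    """Hypothetical stand-in for a blocking I/O operation."""
    # A real program would read a file or a socket here.
    time.sleep(0.2)
    results["data"] = "payload"


def run():
    results = {}
    worker = threading.Thread(target=slow_read, args=(results,))
    worker.start()

    # The main thread keeps computing while the worker thread waits.
    total = sum(i * i for i in range(100_000))

    # Wait for the worker to finish before using its result.
    worker.join()
    results["total"] = total
    return results


if __name__ == "__main__":
    print(run())
```

Both threads live in the same process and share the `results` dictionary, which is why this approach stays within a single node.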
A number of application programs have been parallelized by their vendors. You can run these applications and take advantage of the speedup provided by parallel programming without writing any parallel code yourself.
On the Hoffman2 Cluster, there are parallel versions of MATLAB, Q-Chem, Amber, Gaussian, and other programs that you can run.
Some kinds of large programming problems can best be handled by workflows rather than by parallel programs. These include embarrassingly parallel problems in which either:
- the data can be divided up into independent discrete units, each of which can be computed independently of the others without inter-process communication, or
- the problem can be broken down into a sequence of steps
Use the Job Arrays feature of the Univa Grid Engine (UGE) or the Multi-Job service of the UCLA Grid Portal to submit any number of serial jobs in which the program is the same but the data varies. A series of runs of a serial program, forming a parametric study or an embarrassingly parallel application, runs well this way.
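As a rough sketch of how one serial program can serve every task of a job array, the script below picks its input from the `SGE_TASK_ID` environment variable, which Grid Engine sets to the task index for array jobs. The parameter list and the functions `select_input` and `simulate` are hypothetical placeholders for a real parametric study, not part of any Hoffman2 tooling.

```python
import os


def select_input(inputs, env=None):
    """Return the input assigned to this array task (task IDs are 1-based)."""
    env = os.environ if env is None else env
    # Outside an array job the variable is absent or set to "undefined";
    # in that case this sketch falls back to the first input.
    raw = env.get("SGE_TASK_ID", "1")
    task_id = 1 if raw == "undefined" else int(raw)
    if not 1 <= task_id <= len(inputs):
        raise IndexError(f"task {task_id} has no input (have {len(inputs)})")
    return inputs[task_id - 1]


def simulate(param):
    """Hypothetical stand-in for the real serial computation."""
    return param ** 2


if __name__ == "__main__":
    params = [0.5, 1.0, 1.5, 2.0]  # hypothetical parameter study
    p = select_input(params)
    print(f"param={p} result={simulate(p)}")
```

Submitted as an array job with four tasks (for example with Grid Engine's `qsub -t 1-4` option), each task would run the same script on a different parameter, with no communication between tasks.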
With Service Oriented Architecture (SOA), services are packaged in discrete units that are distributed over the network. Parallelism is achieved by creating workflows that make use of many services to solve the problem at hand.
SOA workflows are currently not an efficient way to do high-performance computing. However, they can be used in conjunction with traditional MPI parallel programs for tasks such as moving output files around and running post-processing programs.
On the Hoffman2 Cluster, parallel programs can use MPI in the following ways:
- Fortran, C, and C++ programs can use the MPI-1 or MPI-2 API. We recommend MPI-2 as MPI-1 is now deprecated. Two families of implementations are available, MPICH/MPICH2 and OpenMPI; which one to use depends on the cluster.
- A program can call subroutines/functions from a parallelized library. On the Hoffman2 Cluster, Fortran programs can call routines from ScaLAPACK, a library of high-performance linear algebra routines that makes use of MPI for message passing.
- An Introduction to MPI from Argonne National Lab.
- Introduction to MPI from NERSC.
- Message Passing Interface (MPI) from LLNL.
- Practical MPI Programming from IBM.
- MPI: The Complete Reference.