Analysis and Optimisation of Communication Patterns on a Beowulf   Wi Bing Tan and Peter Strazdins

Updated: 26th August 2002

Implementation

The implementation code implements the algorithms described in the paper to run the three operations (All-Gather, All-Reduce, Reduce-Scatter). The code is written in C and has been tested on LAM/MPI. The code can be downloaded as a tar-gzipped (.tar.gz) - 200 kB file or a zipped (.zip) - 201 kB file.


Usage

The usage instructions are for LAM/MPI but it shouldn't be too much problem figuring it out for other MPI distributions.

  • Compile all the programs using hcc. Example 'hcc -Wall -o combine.exe combine.c calargs.c gather.c reduce.c redscat.c timediff.c verify.c'. Or alternatively just run the provided makefile.
  • Once compiled, run the program on the system using mpirun. The program is followed by arguments (minimum argument '-e') followed by the algorithm and the operation. Example 'mpirun -c 8 combine.exe -e1000 -t1000x10 -i10 -v ring gather'.
  • The following arguments are accepted:
    • -e#: Number of Elements: defines the number of starting elements used by the program.
    • -i#: Number of Iterations: defines the number of iterations for each element size.
    • -t#x#: Use Time Steps, Size of Increment, Number of Increments: Allows increasing number of elements to be tested. The first variable is the size of the increment and the second the number of steps to increase the original element size by.
    • -p[#]: Create a printable format, 0=mintime, 1=avgtime, 2=maxtime: Creates a printable format to pipe into a graph program. '-p0' shows minimum time, '-p1' average time and '-p2' maximum time. Default value is average time.
    • -o[#]: Print results from node # (default 0): Used mainly for debugging. Prints the final element values for a particular node.
    • -c: Intersperse with computation (obsolete): This argument is obsolete.
    • -r: Use values of 10 as input elements: Used mainly for debugging. Initialises each node's original element value to 10. Default is 2 to the power of the node's rank.
    • -b: Use blocking send/recv (obsolete): This argument is obsolete.
    • -v: Verify routine: Runs a verification program before the actual test. This program initalises random element values and tests the custom implementation with the MPI implementation. If the results are different, an error is produced.
  • Once the arguments are defined, the algorithm is called. The following algorithms are supported:
    • ring - Ring algorithm
    • bidir - Bi-Directional Exchange algorithm.
    • bing - Full Fan-in algorithm.
    • naive - Full Fan-out algorithm.
    • mpi - MPI Implementation.
    • fan - Fan-in Fan-out algorithm.
    • rec - Recursive-halving Recursive-doubling algorithm
    • pipe - Pipeline algorithm.
    • pipe2 - Repeated Binary Tree algorithm.
  • The last argument to define is the operation. The following operations are supported:
    • gather - MPI All-Gather operation.
    • reduce - MPI All-Reduce operation.
    • redscat - MPI Reduce-Scatter operation.

File Description

The files used in program are described as:

  • combine.c - The main program. combine.c reads the arguments and calls the appropriate programs.
  • calargs.c - Calculate arguments. calargs.c calculates the arguments and translates the values into a table called args.
  • timediff.c - Time difference program. Used to calculate the time difference between two struct tv values.
  • gather.c - All-Gather implementations. The functions which run the algorithms supporting All-Gather.
  • reduce.c - All-Reduce implementations. The functions which run the algorithms supporting All-Reduce.
  • redscat.c - Reduce-Scatter implementations. The functions which run the algorithms supporting Reduce-Scatter.
  • combine.h - Header file. Header file containing the definitions.

Back to Main