|
The implementation code implements the algorithms described
in the paper to run the three operations (All-Gather, All-Reduce,
Reduce-Scatter). The code is written in C and has been tested
on LAM/MPI. The code can be downloaded as a tar-gzipped
(.tar.gz) - 200 kB file or a zipped
(.zip) - 201 kB file.
|
|
|
The usage instructions are for LAM/MPI but it shouldn't be
too much problem figuring it out for other MPI distributions.
- Compile all the programs using hcc. Example 'hcc -Wall -o
combine.exe combine.c calargs.c gather.c reduce.c redscat.c
timediff.c verify.c'. Or alternatively just run the provided
makefile.
- Once compiled, run the program on the system using mpirun.
The program is followed by arguments (minimum argument '-e')
followed by the algorithm and the operation. Example 'mpirun
-c 8 combine.exe -e1000 -t1000x10 -i10 -v ring gather'.
- The following arguments are accepted:
- -e#: Number of Elements: defines the number of starting
elements used by the program.
- -i#: Number of Iterations: defines the number of iterations
for each element size.
- -t#x#: Use Time Steps, Size of Increment, Number of
Increments: Allows increasing number of elements to be
tested. The first variable is the size of the increment
and the second the number of steps to increase the original
element size by.
- -p[#]: Create a printable format, 0=mintime, 1=avgtime,
2=maxtime: Creates a printable format to pipe into a graph
program. '-p0' shows minimum time, '-p1' average time
and '-p2' maximum time. Default value is average time.
- -o[#]: Print results from node # (default 0): Used mainly
for debugging. Prints the final element values for a particular
node.
- -c: Intersperse with computation (obsolete): This argument
is obsolete.
- -r: Use values of 10 as input elements: Used mainly
for debugging. Initialises each node's original element
value to 10. Default is 2 to the power of the node's rank.
- -b: Use blocking send/recv (obsolete): This argument
is obsolete.
- -v: Verify routine: Runs a verification program before
the actual test. This program initalises random element
values and tests the custom implementation with the MPI
implementation. If the results are different, an error
is produced.
- Once the arguments are defined, the algorithm is called.
The following algorithms are supported:
- ring - Ring algorithm
- bidir - Bi-Directional Exchange algorithm.
- bing - Full Fan-in algorithm.
- naive - Full Fan-out algorithm.
- mpi - MPI Implementation.
- fan - Fan-in Fan-out algorithm.
- rec - Recursive-halving Recursive-doubling algorithm
- pipe - Pipeline algorithm.
- pipe2 - Repeated Binary Tree algorithm.
- The last argument to define is the operation. The following
operations are supported:
- gather - MPI All-Gather operation.
- reduce - MPI All-Reduce operation.
- redscat - MPI Reduce-Scatter operation.
|
|
|
The files used in program are described as:
- combine.c - The main program. combine.c reads the arguments
and calls the appropriate programs.
- calargs.c - Calculate arguments. calargs.c calculates the
arguments and translates the values into a table called args.
- timediff.c - Time difference program. Used to calculate
the time difference between two struct tv values.
- gather.c - All-Gather implementations. The functions which
run the algorithms supporting All-Gather.
- reduce.c - All-Reduce implementations. The functions which
run the algorithms supporting All-Reduce.
- redscat.c - Reduce-Scatter implementations. The functions
which run the algorithms supporting Reduce-Scatter.
- combine.h - Header file. Header file containing the definitions.
|
|
Back to Main
|
|