Research/Implementation project proposal:
Automatic Tuning of the High Performance Linpack Benchmark

Appropriate Courses: COMP3750 (CSys), COMP8740 (AI), COMP8750 (CSys), COMP8770 (eSci), COMP8790 (SE);
research version: COMP4005 (Hons), COMP4540 (SE RP), COMP8800 (MCOMP Hons)
Status: proposition
Student:
Supervisors: Dr Peter Strazdins, with Dr Alistair Rendell and Warren Armstrong (PhD student)
Research Area(s): parallel computing, automatically-tuned software, high performance computing
Technical Difficulty Level: moderate
Conceptual Difficulty Level: moderate

Description

Supercomputers today are large-scale parallel processors, and the Top 500 list is the most definitive ranking of the world's fastest. While the status of this list is contentious, it is based on the results of a single benchmark, the Linpack benchmark, which in this case solves a (very) large dense linear system. The list is dominated by Beowulf-style cluster computers: parallel computers that use a (typically Commercial-off-the-Shelf) switch-based network to communicate between the processors.

There is a publicly available open-source code for a highly optimized benchmark called High Performance Linpack (HPL). It is also highly tunable through its input file (HPL.dat), but the tuning currently must be done by hand. This is problematic because it requires knowledge both of the complex algorithms used in HPL (beyond the understanding of most people who wish to use the benchmark) and of the (cluster) supercomputer itself. Furthermore, for clusters the problem is exacerbated by the fact that the cluster may be non-homogeneous (so the benchmark results may depend on which parts of the cluster are selected), and by the fact that clusters, by their nature, are easily reconfigured. Thus, it is highly desirable that this tuning process be automated.
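To give a sense of the scale of hand tuning, the following Python sketch counts the configurations produced by just a subset of HPL's tunable settings. The option lists for panel factorization, recursive factorization, broadcast and row swapping follow the sample HPL.dat distributed with HPL (assumed here to be the 2.x input format); the NB candidates, the 16-process grids and the look-ahead depths are hypothetical choices for illustration.

```python
from itertools import product

# Option lists as given in the sample HPL.dat shipped with HPL
# (assumption: HPL 2.x input-file format).
PFACTS = ["left", "Crout", "right"]                  # panel factorization
RFACTS = ["left", "Crout", "right"]                  # recursive panel factorization
BCASTS = ["1rg", "1rM", "2rg", "2rM", "Lng", "LnM"]  # panel broadcast
SWAPS = ["bin-exch", "long", "mix"]                  # row-swapping algorithm

# Hypothetical candidate values for a 16-process run.
NBS = [32, 64, 96, 128, 160, 192, 224, 256]          # blocking factor NB
GRIDS = [(1, 16), (2, 8), (4, 4)]                    # P x Q with P*Q == 16, P <= Q
DEPTHS = [0, 1]                                      # look-ahead depth


def configurations():
    """Return an iterator over every combination of the settings above."""
    return product(NBS, GRIDS, PFACTS, RFACTS, BCASTS, DEPTHS, SWAPS)


count = sum(1 for _ in configurations())
print(count)  # 7776
```

Even at a hypothetical one minute per benchmark run, exhaustively timing these 7776 configurations would take about 5.4 days, before the problem size, NBMIN, NDIV and the remaining options are varied at all.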

The LAPACK For Clusters (LFC) project went partway towards addressing this: some work was done on determining, given p processors, the best logical P x Q processor grid, where P*Q <= p. However, automatically choosing the critical blocking parameter (NB in HPL.dat) and selecting among a plethora of algorithm variants remain to be done.
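A hypothetical helper for the grid-selection step might simply enumerate candidate grids for an empirical tuner to time. The Python sketch below lists every P x Q grid with P <= Q and P*Q <= p, ordered first by how many processes it uses and then by how close to square it is, reflecting the usual HPL tuning advice that P and Q be roughly equal with Q slightly larger. The function name and the ordering are assumptions for illustration, not the LFC project's method.

```python
def candidate_grids(p: int) -> list[tuple[int, int]]:
    """List logical P x Q process grids with P <= Q and P*Q <= p.

    Grids using more processes come first; ties are broken by preferring
    near-square grids. Hypothetical helper for illustration only.
    """
    # For each P, Q runs from P up to the largest value with P*Q <= p.
    grids = [(P, Q) for P in range(1, p + 1) for Q in range(P, p // P + 1)]
    grids.sort(key=lambda g: (-g[0] * g[1], g[1] - g[0]))
    return grids


print(candidate_grids(16)[:3])  # [(4, 4), (2, 8), (1, 16)]
print(candidate_grids(7)[:2])   # [(1, 7), (2, 3)]
```

For a prime count such as p = 7, the only grid using every process is 1 x 7, so the 2 x 3 grid on six processes is a natural second candidate; this is why allowing P*Q < p can matter.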

This project will investigate automatic tuning methods for the HPL package on cluster computers. The search can be done statically and can employ efficient empirical methods based on techniques such as the Nelder-Mead simplex search, as is done for the ATLAS package (which itself can be used as a component of HPL).
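A minimal Nelder-Mead simplex search is sketched below in Python, using the textbook reflection, expansion, contraction and shrink steps with standard coefficients. It minimizes a smooth synthetic function standing in for a measured cost; in a real tuner the objective would presumably run HPL with the candidate parameters and return, say, the negated GFLOP/s rate, which is an assumption about how such a tuner would be wired up.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def _towards(centroid: Vector, worst: Vector, t: float) -> Vector:
    """Point centroid + t * (centroid - worst), on the line through the worst vertex."""
    return [c + t * (c - w) for c, w in zip(centroid, worst)]


def nelder_mead(f: Callable[[Vector], float], x0: Vector, step: float = 1.0,
                tol: float = 1e-10, max_iter: int = 1000) -> Tuple[Vector, float]:
    """Minimize f with the Nelder-Mead simplex method (standard coefficients)."""
    alpha, gamma, rho, sigma = 1.0, 2.0, 0.5, 0.5
    n = len(x0)
    # Initial simplex: x0 plus one step along each coordinate axis.
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    values = [f(x) for x in simplex]

    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        # Centroid of every vertex except the worst.
        centroid = [sum(x[j] for x in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        xr = _towards(centroid, worst, alpha)            # reflection
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        elif fr < values[0]:
            xe = _towards(centroid, worst, gamma)        # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:
            if fr < values[-1]:
                xc = _towards(centroid, worst, rho * alpha)  # outside contraction
                fc = f(xc)
                accept = fc < fr
            else:
                xc = _towards(centroid, worst, -rho)     # inside contraction
                fc = f(xc)
                accept = fc < values[-1]
            if accept:
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex towards the best one.
                best = simplex[0]
                simplex = [best] + [[b + sigma * (v - b) for b, v in zip(best, x)]
                                    for x in simplex[1:]]
                values = [values[0]] + [f(x) for x in simplex[1:]]

    i_best = min(range(n + 1), key=values.__getitem__)
    return simplex[i_best], values[i_best]


def bowl(v: Vector) -> float:
    """Synthetic stand-in for a measured cost, minimized at (3, -1)."""
    return (v[0] - 3.0) ** 2 + (v[1] + 1.0) ** 2


x_best, f_best = nelder_mead(bowl, [0.0, 0.0])
print([round(x, 3) for x in x_best])
```

Because NB and similar HPL parameters are integers, a benchmark-driven objective would need to round its arguments before each run; the resulting plateaus may stall the simplex, so a larger initial step or restarts may be required.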

The implementation version of the project will begin by implementing and evaluating simple empirical methods. The research version would investigate a variety of more sophisticated methods and evaluate their suitability in this context. The DCS Saratoga cluster will form a suitable experimental platform. The project has the potential to produce publishable work and widely used open-source software.
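One of the simplest empirical methods is coordinate search: tune one parameter at a time over a list of candidates while holding the others fixed, and repeat the sweep until nothing improves, which is roughly what is done when tuning HPL by hand. The Python sketch below assumes a hypothetical measure() callback that would run the benchmark for a configuration and return its GFLOP/s; here a synthetic score function stands in for it, and the parameter names and candidate lists are illustrative only.

```python
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]


def coordinate_search(measure: Callable[[Config], float],
                      space: Dict[str, List[Any]],
                      start: Config,
                      max_sweeps: int = 5) -> Tuple[Config, float]:
    """Tune one parameter at a time; a higher measure() result is better."""
    best, best_score = dict(start), measure(start)
    for _ in range(max_sweeps):
        improved = False
        for name, candidates in space.items():
            for value in candidates:
                if value == best[name]:
                    continue  # already measured with this value
                trial = {**best, name: value}
                score = measure(trial)
                if score > best_score:
                    best, best_score, improved = trial, score, True
        if not improved:
            break  # a full sweep changed nothing: local optimum reached
    return best, best_score


calls = 0


def synthetic_measure(cfg: Config) -> float:
    """Hypothetical stand-in for one timed HPL run (fake, GFLOP/s-like score)."""
    global calls
    calls += 1
    return (-abs(cfg["nb"] - 128)
            - 5 * (cfg["depth"] != 1)
            - 3 * (cfg["bcast"] != "2rM"))


space = {"nb": [32, 64, 128, 192, 256], "depth": [0, 1], "bcast": ["1rg", "2rM", "Lng"]}
best, score = coordinate_search(synthetic_measure, space,
                                {"nb": 32, "depth": 0, "bcast": "1rg"})
print(best, score, calls)  # {'nb': 128, 'depth': 1, 'bcast': '2rM'} 0 15
```

On this toy problem the search reaches the best setting with 15 measurements, against 30 for exhaustive search. Because it varies one parameter at a time, coordinate search can miss gains that only appear when two parameters change together (for example, NB and the process grid), which is one motivation for the more sophisticated methods of the research version.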

References

See the links above, and also:

The IEEE Task Force on Cluster Computing home page.
The ANU Beowulf home page. There is also a paper on solving dense linear systems on this platform (exposing some problems with HPL).
Dongarra, J. et al., "Self-Adapting Numerical Software (SANS) Effort", IBM Journal of Research and Development 50(2/3), 2006.

Last Modified: Peter Strazdins, Feb 2008
