Hons/Masters research project proposal:
Characterization of Application Performance of Cluster Computers

supervisor: Dr Peter Strazdins

suitable for project courses: COMP4005 (Honours), COMP4540, COMP6720/02, COMP8800; COMP8750; COMP6703/COMP3750 - infrastructure part only

A Beowulf-style cluster computer is a parallel computer that uses a commercial off-the-shelf (COTS) switch-based network to communicate between its processors. The ANU Beowulf cluster Bunyip is one such cluster, based on Fast Ethernet (100 Mb/s) switches. Clusters have proved to be a highly cost-effective model for high-performance computing, and have largely displaced the traditional massively parallel computers built with expensive vendor-supplied networks. However, the raw communication speed of COTS networks has not kept up with the dramatic increases in processor speed, and this limits the performance of many applications on clusters.

Applications written for clusters use the Message Passing Interface (MPI) both to synchronize and to communicate data between their processes, which run on different cluster nodes. The costs of communication include both latency (the time to send a small message) and bandwidth (the rate of data transfer, which determines the per-byte cost of larger messages).
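The two costs above are often combined into a simple linear model, in which sending an n-byte message takes roughly the latency plus n divided by the bandwidth. The following is a minimal sketch of that model. The function names are hypothetical, and the latency figure is an assumed value for illustration, not a measurement of Bunyip or any other cluster; the bandwidth figure is simply the nominal Fast Ethernet rate of 100 Mb/s expressed in bytes per second.

```python
def message_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time (seconds) to send one message under a linear cost model."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def break_even_size(latency_s, bandwidth_bytes_per_s):
    """Message size (bytes) at which the latency and transfer terms are equal."""
    return latency_s * bandwidth_bytes_per_s


# Assumed, illustrative parameters (not measured values).
LATENCY_S = 100e-6           # assumed 100 microseconds per message
BANDWIDTH_BPS = 100e6 / 8    # nominal Fast Ethernet: 100 Mb/s in bytes per second

if __name__ == "__main__":
    for size in (8, 1024, 65536, 1048576):
        t = message_time(size, LATENCY_S, BANDWIDTH_BPS)
        print(f"{size:>8} bytes: {t * 1e6:10.1f} microseconds")
    print("break-even size:", break_even_size(LATENCY_S, BANDWIDTH_BPS), "bytes")
```

Under this model, messages much smaller than the break-even size are dominated by latency, and messages much larger than it are dominated by bandwidth.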

An important aspect of research into clusters is therefore the characterization of applications' performance. At the lowest level, the question is whether performance is limited by CPU, memory access, latency or bandwidth. For applications dominated by communication, finer questions can be addressed, such as whether the application can benefit from networks supporting bi-directional communication, and what its potential is for overlapping communication with computation. This information can be obtained from the literature (on the applications and benchmarks concerned), from study of the application's source code, and by experimentation. Of particular relevance is the paper on the characterization of the NAS parallel benchmarks listed below.
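As a rough illustration of the lowest-level question, the sketch below takes a per-category time breakdown for an application and reports the dominant cost. It also gives an upper bound on the speedup available from overlapping communication with computation, assuming the two can be overlapped perfectly. The function names and the example timings are hypothetical, and obtaining such a breakdown in practice is itself part of the experimental work.

```python
def dominant_cost(breakdown):
    """Return the category with the largest share of run time.

    `breakdown` maps a category name to seconds spent in it.
    """
    return max(breakdown, key=breakdown.get)


def overlap_speedup_bound(t_compute, t_comm):
    """Upper bound on speedup from perfectly overlapping communication
    with computation: the serial time divided by the longer of the two."""
    return (t_compute + t_comm) / max(t_compute, t_comm)


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    times = {"cpu": 4.0, "memory": 1.0, "latency": 0.5, "bandwidth": 2.5}
    print("dominant cost:", dominant_cost(times))

    t_comm = times["latency"] + times["bandwidth"]
    t_compute = times["cpu"] + times["memory"]
    print("overlap speedup bound:", overlap_speedup_bound(t_compute, t_comm))
```

The bound reaches its maximum of 2 only when communication and computation take equal time, which is why the potential for overlap depends so strongly on the application.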

This project will investigate these questions. It has a natural synergy with a sister project, undertaken by MIT student Tony Breeds in semester 1, 2006, and will build upon the work in Tony's thesis. The new infrastructure developed by this project will include instrumentation of the MPI library and the use of performance counter libraries (which give access to hardware event counts). Together, these projects will develop a Performance Evaluation Methodology targeted at the Jabberwocky multicluster, which was installed in DCS in early 2006.
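The idea of instrumenting a communication library can be sketched as a wrapper that records the call count, bytes transferred and elapsed time for each call. This is only an illustration of the principle and not the project's design: real instrumentation would wrap the MPI library itself (the MPI standard provides a profiling interface for this purpose), whereas here a hypothetical stand-in function, fake_send, is used so that the sketch runs without MPI. The class name CommProfile is likewise an assumption.

```python
import time
from collections import defaultdict
from functools import wraps


class CommProfile:
    """Accumulates per-function call counts, byte counts and elapsed time."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.bytes = defaultdict(int)
        self.seconds = defaultdict(float)

    def instrument(self, func):
        """Wrap `func` so every call is timed and its buffer size recorded."""
        name = func.__name__

        @wraps(func)
        def wrapper(buf, *args, **kwargs):
            start = time.perf_counter()
            try:
                return func(buf, *args, **kwargs)
            finally:
                # Record statistics even if the wrapped call raises.
                self.seconds[name] += time.perf_counter() - start
                self.calls[name] += 1
                self.bytes[name] += len(buf)

        return wrapper


profile = CommProfile()


@profile.instrument
def fake_send(buf, dest):
    """Hypothetical stand-in for a send call; performs no communication."""
    return len(buf)


fake_send(b"x" * 1024, dest=1)
fake_send(b"y" * 16, dest=2)

if __name__ == "__main__":
    for name in profile.calls:
        print(name, profile.calls[name], "calls,",
              profile.bytes[name], "bytes,",
              f"{profile.seconds[name]:.6f} s")
```

Statistics of this kind, combined with hardware event counts from a performance counter library, would indicate how much of an application's time is spent in communication and how that time divides between small and large messages.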

References

See the links above, and also:

Last Modified: Peter Strazdins, 12 Dec 2007