Hons/MIT project proposal:
Methodologies for Network-level Optimization of Cluster Computers

Supervisor: Dr Peter Strazdins

A Beowulf-style cluster computer is a parallel computer that uses a commercial off-the-shelf (COTS) switch-based network for communication between its processors. The ANU Beowulf cluster Bunyip is one such cluster, based on Fast (100 Mb/s) Ethernet switches. Clusters have proved to be a highly cost-effective model for high-performance computing, and have largely displaced traditional massively parallel computers built with expensive vendor-supplied networks. However, the raw communication speed of COTS networks has not kept up with the dramatic increases in processor speed, and it limits the performance of many applications on clusters. It is therefore important to exploit all of the available hardware capabilities of these networks in order to increase effective communication speed.

The switch-based networks used by clusters support not only normal point-to-point messaging (via TCP/IP in the case of Ethernet networks), but also data transfers by collective communications (e.g. multicasts) and remote direct memory access (RDMA). The nodes of a cluster are connected to these switches via Network Interface Cards (NICs). A node can also have multiple connections to the network, and can speed up data transfer by splitting a single message and sending parts of it over each connection simultaneously.
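The message-splitting idea can be illustrated with a minimal sketch. The function names `stripe` and `reassemble` are hypothetical and chosen only for illustration; the sketch shows only how a message might be divided into per-channel chunks and rebuilt at the receiver, not how the chunks are actually transmitted over multiple NICs.

```python
from typing import List, Tuple


def stripe(message: bytes, n_channels: int) -> List[Tuple[int, bytes]]:
    """Split a message into at most n_channels contiguous (offset, chunk) pairs.

    Chunk sizes differ by at most one byte, so each channel carries a
    near-equal share of the message.
    """
    if n_channels < 1:
        raise ValueError("n_channels must be at least 1")
    base, extra = divmod(len(message), n_channels)
    parts = []
    offset = 0
    for i in range(n_channels):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # message has fewer bytes than there are channels
        parts.append((offset, message[offset:offset + size]))
        offset += size
    return parts


def reassemble(parts: List[Tuple[int, bytes]], total_len: int) -> bytes:
    """Rebuild the message from chunks that may arrive in any order."""
    buf = bytearray(total_len)
    for offset, chunk in parts:
        buf[offset:offset + len(chunk)] = chunk
    return bytes(buf)
```

Each chunk carries its offset so that the receiver can place it correctly regardless of which connection delivers it first. In a real system the per-chunk overhead would have to be weighed against the gain from parallel transfer, which is one of the questions a project like this would measure.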

In partnership with the local company Alexander Technology, this project will evaluate the effectiveness of various hardware and software configuration options for several networks. These include Gigabit Ethernet (the PCI-X based Intel Pro/1000 MT) and InfiniBand (PathScale HTX). These networks will be used to connect a small dual-CPU Opteron cluster.

Hardware configuration options include using multiple NICs (channels). Software configuration options include the use of multiple TCP/IP stacks and network interfaces (in conjunction with multiple NICs). In the case of InfiniBand, RDMA communication can also be considered.

Performance evaluation would cover both micro-benchmarks (e.g. ping-pong message exchange) and benchmarks derived from real-world applications written in MPI, such as the NAS Parallel Benchmarks. From this experience, a systematic methodology would be proposed.
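As an illustration of a ping-pong micro-benchmark, the following sketch measures mean round-trip time over a TCP connection on the loopback interface. It is an assumption-laden stand-in: the project itself would run such tests between cluster nodes, most likely through MPI, and the names `ping_pong`, `_echo_server` and `_recv_exact` are hypothetical.

```python
import socket
import threading
import time


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf.extend(chunk)
    return bytes(buf)


def _echo_server(listener: socket.socket, size: int, reps: int) -> None:
    """Accept one connection and echo back `reps` messages of `size` bytes."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(reps):
            conn.sendall(_recv_exact(conn, size))


def ping_pong(size: int, reps: int = 100) -> float:
    """Return the mean round-trip time in seconds for `size`-byte messages."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=_echo_server, args=(listener, size, reps))
    server.start()

    payload = b"x" * size
    with socket.create_connection(("127.0.0.1", port)) as client:
        # Disable Nagle's algorithm so small messages are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(reps):
            client.sendall(payload)      # ping
            _recv_exact(client, size)    # pong
        elapsed = time.perf_counter() - start

    server.join()
    listener.close()
    return elapsed / reps
```

Calling `ping_pong` for a range of message sizes gives a rough picture of the two quantities such a benchmark is usually used to estimate: half the round-trip time for small messages approximates latency, and message size divided by half the round-trip time for large messages approximates bandwidth.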

Time permitting, other aspects of communication performance on these networks may be considered. These include improvements to collective communications, such as TCP/IP multicast, and NIC-level optimizations. This work will involve finding, installing and evaluating existing (open source) software for this purpose. Depending on time and opportunity, new optimizations will be attempted and evaluated.

These projects are of strategic interest not only to DCS but also to Alexander Technology, which is rapidly expanding its research and development in cluster technology.

Last Modified: Peter Strazdins, 14 Jul 2005