Honours project proposal:
Efficient Task Scheduling on Cluster Computers

Supervisors: Dr Chris Johnson and Dr Peter Strazdins

This project has been undertaken by the Honours student John Uhlmann, 07/01

A cluster is a distributed-memory parallel computer built from commodity components. Compared with traditional parallel computers, it offers the flexibility to run a mix of several parallel jobs (possibly over overlapping subsets of processors) alongside serial jobs, all in a time-sliced manner.

This flexibility has recently won the cluster model wide support from mainstream vendors. However, it requires that efficient and sophisticated scheduling be implemented across the cluster. The basic requirement is gang scheduling: all the processes of a parallel task must be allocated a time slice simultaneously on every processor the task requires.
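The gang scheduling requirement can be illustrated with a minimal sketch. It assumes a simple first-fit packing of jobs into a table of time slots (rows) by processors (columns); this is one common way to picture gang scheduling, not the algorithm this project will necessarily adopt. The names Job and build_gang_schedule, and the example jobs, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job description: a name and the processor ids it needs."""
    name: str
    processors: frozenset


def build_gang_schedule(jobs, num_processors):
    """Pack jobs first-fit into time slots.

    Returns a list of slots; each slot is a list indexed by processor id
    holding the name of the job that runs there in that slot, or None.
    Every process of a job lands in the same slot, so the whole job is
    given its time slice simultaneously on all processors it requires.
    """
    slots = []
    for job in jobs:
        if any(p < 0 or p >= num_processors for p in job.processors):
            raise ValueError(f"job {job.name} requests an unknown processor")
        # Look for an existing slot in which all required processors are idle.
        for slot in slots:
            if all(slot[p] is None for p in job.processors):
                break
        else:
            # No slot fits: open a new time slot for this job.
            slot = [None] * num_processors
            slots.append(slot)
        for p in job.processors:
            slot[p] = job.name
    return slots


# A mix of parallel jobs on overlapping processor subsets plus a serial job.
example_jobs = [
    Job("A", frozenset({0, 1, 2, 3})),
    Job("B", frozenset({0, 1})),
    Job("C", frozenset({2, 3})),
    Job("D", frozenset({1})),  # serial job
]
schedule = build_gang_schedule(example_jobs, num_processors=4)
for index, slot in enumerate(schedule):
    print(index, slot)
```

The scheduler would then cycle through the slots, switching every processor to the next row at the same moment. First-fit is chosen here only for brevity; it leaves processors idle in sparsely filled slots, which is the kind of inefficiency a better algorithm would aim to reduce.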

While efficient algorithms for doing this have been proposed and evaluated using simulations, implementing such a scheduling algorithm on an actual cluster introduces new problems, such as the timely gathering and distribution of the task information needed to make scheduling decisions. This project aims to design and implement a user-level parallel job submission and execution system, which can implement various scheduling algorithms and be used to evaluate them. The prun system for the Fujitsu AP+ is such a run-time system, although in that case the scheduling algorithm is built into the Linux kernel.

The primary target platform will be the ANU Beowulf cluster; however, implementation can also be performed on the CAP Project's Fujitsu AP3000. To perform this evaluation, workloads representing realistic patterns of cluster usage will need to be generated, possibly by monitoring usage of the ANU Beowulf.
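As a rough sketch of what workload generation might look like before any real usage traces are available, the following produces a synthetic job stream. Everything in it is an assumption made for illustration: the function name generate_workload, the exponentially distributed inter-arrival times and runtimes, the power-of-two parallel job sizes, and all default parameter values. None of these figures come from measurements of the ANU Beowulf.

```python
import random


def generate_workload(num_jobs, num_processors, mean_interarrival=10.0,
                      mean_runtime=60.0, serial_fraction=0.5, seed=0):
    """Generate a synthetic list of jobs ordered by arrival time.

    Each job is a dict with an id, an arrival time, a processor count,
    and a runtime (times in arbitrary units). All distributions and
    defaults are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    parallel_sizes = [s for s in (2, 4, 8, 16, 32, 64) if s <= num_processors]
    clock = 0.0
    jobs = []
    for job_id in range(num_jobs):
        # Exponential inter-arrival times give a Poisson arrival process.
        clock += rng.expovariate(1.0 / mean_interarrival)
        if not parallel_sizes or rng.random() < serial_fraction:
            size = 1  # serial job
        else:
            size = rng.choice(parallel_sizes)
        jobs.append({
            "id": job_id,
            "arrival": clock,
            "processors": size,
            "runtime": rng.expovariate(1.0 / mean_runtime),
        })
    return jobs


workload = generate_workload(num_jobs=20, num_processors=8)
for job in workload[:5]:
    print(job)
```

Parameters fitted to monitored usage could later replace the defaults, so that the same scheduling algorithms can be compared under both synthetic and observed load.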

This project could be extended by implementing the scheduling algorithm in the Linux scheduler on the ANU Beowulf cluster (the Fujitsu AP3000 could also be used for development, as it can be booted under Linux).

The topic is part of the ANU-Fujitsu CAP Project, and some support funding may be available.

References