Parallel Dense Linear Algebra Libraries: Useability and Performance Issues

Peter Strazdins, Parallel Dense Linear Algebra Libraries: Useability and Performance Issues,
presented as part of the CTAC'99 High Performance Computing Workshop, Canberra, September 1999

Abstract

This talk will give an overview of existing parallel dense linear algebra libraries for distributed memory parallel computers, namely ScaLAPACK, PLAPACK and DLAPACK (an experimental library based on the DBLAS parallel BLAS library). It is aimed at users who wish to write parallel applications using such libraries.

Firstly, the issue of matrix distribution over the processor grid must be considered; the distributions used are the block-cyclic matrix distribution (ScaLAPACK, DLAPACK) or the physically-based matrix distribution (PLAPACK). This issue is problematic for two reasons: first, the complexity of these distributions places a significant burden on the application writer; second, the optimum distribution may change between stages of the application, possibly requiring explicit redistribution of data. Furthermore, the distribution affects the I/O of the matrix data to and from disk. Thus, the distribution used and the attendant effort required to set up matrix descriptor objects have a large impact on the ease of use of such libraries.
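To make the bookkeeping behind a block-cyclic distribution concrete, the following is a minimal sketch of the index arithmetic an application writer has to keep in mind. It is an illustration only, not the API of ScaLAPACK, DLAPACK or PLAPACK: the function names (owner_and_local, local_count, owner_2d) are hypothetical, indices are 0-based, and the first block is assumed to reside on process 0 in each grid dimension.

```python
def owner_and_local(g, nb, nprocs):
    """Map a 0-based global index g to (owning process, 0-based local index)
    under a 1-D block-cyclic distribution with block size nb.
    Assumption: block 0 lives on process 0."""
    block = g // nb                      # which global block holds g
    owner = block % nprocs               # blocks are dealt out round-robin
    local = (block // nprocs) * nb + g % nb
    return owner, local


def local_count(n, nb, iproc, nprocs):
    """Number of the n global indices stored on process iproc
    (same assumptions as above)."""
    nblocks = n // nb                    # number of complete blocks
    count = (nblocks // nprocs) * nb     # every process gets this many
    extra = nblocks % nprocs             # leftover complete blocks
    if iproc < extra:
        count += nb                      # one more complete block
    elif iproc == extra:
        count += n % nb                  # the final, partial block (if any)
    return count


def owner_2d(i, j, mb, nb, prows, pcols):
    """Grid coordinates (process row, process column) owning element (i, j)
    of a matrix distributed in mb x nb blocks over a prows x pcols grid."""
    return (owner_and_local(i, mb, prows)[0],
            owner_and_local(j, nb, pcols)[0])
```

For example, with 10 rows, a block size of 2 and 3 process rows, local_count gives 4, 4 and 2 rows on the three process rows respectively. The local row count plays a role analogous to the local dimensions an application must supply when setting up a matrix descriptor, which is part of the setup effort described above.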

Secondly, the blocking methods used by the algorithms in these libraries need to be considered if good parallel performance is required. An overview of the methods used by each library will be given, together with their performance implications on contemporary parallel machines.

These issues will be illustrated using the AccuField application for simulating the electromagnetic field emissions of computational devices, recently developed by Fujitsu Laboratories in conjunction with the author. It uses both ScaLAPACK and DLAPACK routines.