Firstly, the distribution of matrices over the processor grid must be considered; the distributions used are either the block-cyclic matrix distribution (ScaLAPACK, DLAPACK) or the physically-based matrix distribution (PLAPACK). This issue is problematic for two reasons: first, the sheer complexity of these distributions places a significant burden on the application writer; second, the optimum distribution may change between stages of the application, possibly requiring explicit redistribution of data. Furthermore, the distribution affects the I/O of matrix data to and from disk. Thus, the distribution used, and the attendant effort required to set up matrix descriptor objects, have a large impact on the ease of use of such libraries.
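To make the bookkeeping behind a block-cyclic distribution concrete, the following is a minimal sketch of the standard index mapping in one dimension; a two-dimensional distribution applies the same mapping independently to rows and to columns. It assumes zero-based indices and that the first block resides on process 0 unless `src` says otherwise. The function names are hypothetical illustrations and are not part of the ScaLAPACK, DLAPACK or PLAPACK interfaces.

```python
def block_cyclic_owner(g, nb, nprocs, src=0):
    """Map global index g to (owning process, local index).

    Assumes zero-based indexing, block size nb, and that the first
    block is stored on process `src`. Hypothetical helper, not a
    library routine.
    """
    block = g // nb                    # global block containing g
    proc = (src + block) % nprocs      # blocks are dealt out cyclically
    local_block = block // nprocs      # position of that block on its owner
    return proc, local_block * nb + g % nb


def local_count(n, nb, proc, nprocs, src=0):
    """Number of the n global indices that are stored on process `proc`."""
    dist = (proc - src) % nprocs       # distance from the source process
    full_blocks = n // nb              # complete blocks in the dimension
    count = (full_blocks // nprocs) * nb
    extra = full_blocks % nprocs       # leftover complete blocks
    if dist < extra:
        count += nb                    # this process gets one more full block
    elif dist == extra:
        count += n % nb                # this process gets the partial block
    return count


def owner_2d(i, j, mb, nb, prows, pcols):
    """Apply the 1-D mapping to rows and columns of a prows x pcols grid."""
    pr, li = block_cyclic_owner(i, mb, prows)
    pc, lj = block_cyclic_owner(j, nb, pcols)
    return (pr, pc), (li, lj)
```

For example, with 10 indices, a block size of 2 and 3 processes, `block_cyclic_owner(7, 2, 3)` places global index 7 on process 0 at local index 3, and the three processes hold 4, 4 and 2 indices respectively. Even in this reduced form the application writer must track block sizes, grid shape and local extents for every distributed matrix.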
Secondly, the blocking methods used by the algorithms in these libraries need to be considered if good parallel performance is required. An overview of the methods used by each library will be given, together with their performance implications for contemporary parallel machines.
These issues will be illustrated by the AccuField application for simulating the electromagnetic field emissions of computational devices, recently developed by Fujitsu Laboratories in conjunction with the author. It uses both ScaLAPACK and DLAPACK routines.