Parallelizing Dense Symmetric Indefinite Solvers
P.E. Strazdins.
Parallelizing Dense Symmetric Indefinite Solvers,
PART'99, The 6th Annual Australasian Conference on
Parallel And Real-Time Systems, Springer-Verlag, Melbourne, Dec 1999,
pages 398-410.
Abstract
This paper describes the design, implementation and performance
of a parallel direct dense symmetric-indefinite solver. The primary
target architecture for the solver is the Fujitsu AP3000, a distributed
memory machine based on the UltraSPARC processor.
The solver uses the Bunch-Kaufman diagonal pivoting method and is
based on the LAPACK algorithm, with several modifications required for
efficient parallel implementation. The solver outperforms its equivalent
LAPACK routine zsysv() by 13% when run on a 300 MHz UltraSPARC
processor for a complex matrix of order 1601 and a single right-hand side,
reaching a speed of 436 (double precision) MFLOPS.
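The pivot choice at the heart of the Bunch-Kaufman method can be sketched as follows. This is a minimal serial illustration of the textbook test for a real symmetric matrix held as a full NumPy array (lower-triangle convention), not the parallel variant described in the paper; the function name choose_pivot and its return convention are assumptions made for this example, and a complex matrix would need a suitable magnitude function in place of abs.

```python
import numpy as np

# Standard Bunch-Kaufman threshold, (1 + sqrt(17)) / 8, roughly 0.64.
ALPHA = (1.0 + np.sqrt(17.0)) / 8.0


def choose_pivot(a, k):
    """Pick the pivot for step k of a diagonal pivoting factorization.

    Returns (size, p): size is 1 or 2 (the pivot block order), and p is
    the row/column to interchange with k (size 1) or with k + 1 (size 2).
    Hypothetical helper: 'a' is a full real symmetric array.
    """
    n = a.shape[0]
    absakk = abs(a[k, k])
    if k == n - 1:
        return 1, k  # last column: only a 1x1 pivot is possible

    # Largest off-diagonal magnitude below the diagonal in column k.
    col = np.abs(a[k + 1:, k])
    imax = k + 1 + int(np.argmax(col))
    colmax = col[imax - k - 1]

    if max(absakk, colmax) == 0.0:
        return 1, k  # column is exactly zero; nothing to eliminate
    if absakk >= ALPHA * colmax:
        return 1, k  # diagonal is large enough, no interchange

    # Largest off-diagonal magnitude in row imax of the active submatrix.
    row = np.abs(a[imax, k:]).copy()
    row[imax - k] = 0.0
    rowmax = row.max()

    if absakk * rowmax >= ALPHA * colmax * colmax:
        return 1, k  # still acceptable relative to the growth bound
    if abs(a[imax, imax]) >= ALPHA * rowmax:
        return 1, imax  # use the diagonal entry at imax as a 1x1 pivot
    return 2, imax  # fall back to a 2x2 pivot block
```

For example, choose_pivot(np.array([[0.0, 1.0], [1.0, 0.0]]), 0) selects a 2x2 pivot, the case that makes symmetric indefinite systems harder to block and distribute than LU or Cholesky factorizations.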
Using run-time settable parameters, the routine can use any logical P x Q
processor grid, any (square) storage block size r and any algorithmic
block size w. This enables performance tuning by trading off load balance
against communication penalties, the latter being relatively higher than for
LU or LL^T solvers. For a matrix of order 10000 on a 16-node AP3000,
best performance was achieved with P = Q = 4, w = 48 and r = 4, giving a
speed of 5.4 GFLOPS and a parallel speedup of 12.4.
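To make the roles of P, Q and r concrete, the following sketch shows the usual block-cyclic mapping from a global matrix index to a position on the logical processor grid. It assumes zero-based indices and a grid whose first block lands on process (0, 0); the function names owner, local_index and rows_owned are hypothetical and do not come from the solver itself.

```python
def owner(i, j, r, P, Q):
    """Grid coordinates of the process holding global element (i, j)
    under an r x r block-cyclic distribution over a P x Q grid."""
    return (i // r) % P, (j // r) % Q


def local_index(i, r, P):
    """Local row (or column) index of global index i on its owning
    process, for block size r and P processes along that dimension."""
    block = i // r  # global block number
    return (block // P) * r + i % r


def rows_owned(n, r, P, p):
    """Number of rows of an order-n matrix stored on process row p."""
    return sum(1 for i in range(n) if (i // r) % P == p)
```

With n = 10000, r = 4 and P = 4, every process row holds exactly 2500 rows, which illustrates why a small storage block size helps load balance; a larger r would spread the trailing submatrix less evenly as the factorization proceeds.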
Keywords
dense linear algebra, block cyclic decomposition,
algorithmic blocking, storage blocking,
symmetric indefinite systems, LDL^T decomposition