The ACCUFIELD software consists of a preprocessor, a solver and a postprocessor. The preprocessor is used to input or modify a model of the system to be analysed. It converts the model to a dense complex symmetric indefinite linear system, which can then be solved for various EM frequencies by the solver. These systems are often `weakly indefinite', that is, most diagonal elements are considerably larger than the off-diagonal elements in the same column. The postprocessor displays the resultant electromagnetic fields. ACCUFIELD also has a highly developed user interface, enabling many convenient model editing functions and graphical displays of the resulting analysis.
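The notion of weak indefiniteness can be made concrete with a small check. The following is a minimal sketch, not part of ACCUFIELD: the function name and the threshold `factor` are hypothetical, since the text says only that the diagonal elements are "considerably larger" than the off-diagonal elements in the same column.

```python
def dominant_column_fraction(a, factor=10.0):
    """Fraction of columns whose diagonal entry is at least `factor` times
    larger in magnitude than every off-diagonal entry in that column.
    `factor` is an illustrative threshold, not a value from the paper."""
    n = len(a)
    dominant = 0
    for j in range(n):
        off_max = max((abs(a[i][j]) for i in range(n) if i != j), default=0.0)
        if abs(a[j][j]) >= factor * off_max:
            dominant += 1
    return dominant / n


# Small complex symmetric example: columns 0 and 1 are dominated by their
# diagonal entries, column 2 is not.
A = [
    [50 + 20j, 1 + 1j, 0.5j],
    [1 + 1j, 40 - 30j, 2.0],
    [0.5j, 2.0, 1 + 1j],
]
fraction = dominant_column_fraction(A)
print(fraction)
```

A value close to 1 would indicate a matrix of the weakly indefinite kind described above.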
The size of the linear system N depends on the model to be analysed. For a notebook PC, N ~ 5000, but more complex systems require N ~ 30,000. The memory and computational demands of such a large system call for parallel processing, for example on a 24-node Fujitsu AP3000 distributed memory multicomputer, with each node being a 300 MHz UltraSPARC II with 1.5 GB of memory.
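A rough storage estimate shows why a single node is insufficient. The sketch below assumes double-precision complex entries of 16 bytes each (consistent with the "z" prefix of the LAPACK routine zsysv()) and treats 1.5 GB as decimal gigabytes. Whether the full matrix or only one triangle is stored is not stated here, so both figures are computed.

```python
BYTES_PER_COMPLEX_DOUBLE = 16  # two 8-byte reals (assumed precision)
NODE_MEMORY_BYTES = 1.5e9      # 1.5 GB per node, taken as decimal GB (assumption)


def dense_storage_bytes(n, symmetric_half=False):
    """Bytes needed for a dense n x n complex matrix, or for one triangle
    (including the diagonal) when symmetric_half is True."""
    entries = n * (n + 1) // 2 if symmetric_half else n * n
    return entries * BYTES_PER_COMPLEX_DOUBLE


for n in (5000, 30000):
    full = dense_storage_bytes(n)
    half = dense_storage_bytes(n, symmetric_half=True)
    print(n, full / 1e9, half / 1e9, full / NODE_MEMORY_BYTES)
```

Under these assumptions, N = 5000 needs about 0.4 GB for the full matrix, whereas N = 30,000 needs about 14.4 GB (about 7.2 GB for one triangle), several times the memory of one node before any workspace is counted.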
This paper describes the design, implementation, performance and validation of the ACCUFIELD application. The main computational challenges lie in the solver stage, where an O(N^3) computation is required for the direct solution of the linear system about a central frequency omega_c. Some of this cost is amortized using a frequency stepping method, in which the system is solved for nearby frequencies omega by O(N^2) iterative methods, using the solution at omega_c as a preconditioner. This reduced parallel solution time by a factor of 2 for moderate-sized matrices, with much larger improvements expected for large matrices.
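The idea behind frequency stepping can be illustrated on a tiny system. The sketch below is not the ACCUFIELD solver: the text does not say which iterative method is used, so a preconditioned Richardson iteration is substituted for brevity. The model A(omega) = K + i omega C, the matrices K and C, and all function names are hypothetical. An explicit inverse stands in for the O(N^3) direct factorization at omega_c. Each subsequent iteration needs only matrix-vector products, which cost O(N^2).

```python
def matvec(a, x):
    """Dense matrix-vector product, O(N^2)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def invert(a):
    """Gauss-Jordan inverse with partial pivoting: an O(N^3) stand-in for
    the direct factorization computed once at omega_c."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def solve_nearby(a, b, m_inv, tol=1e-12, max_iter=50):
    """Preconditioned Richardson iteration x <- x + M^{-1} (b - A x),
    where m_inv applies the inverse of the matrix at omega_c."""
    x = matvec(m_inv, b)
    b_norm = max(abs(bi) for bi in b)
    for it in range(max_iter):
        r = [bi - yi for bi, yi in zip(b, matvec(a, x))]
        if max(abs(ri) for ri in r) <= tol * b_norm:
            return x, it
        x = [xi + di for xi, di in zip(x, matvec(m_inv, r))]
    return x, max_iter


# Hypothetical frequency-dependent complex symmetric system.
K = [[4.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 6.0]]
C = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.1], [0.0, 0.1, 1.0]]


def system(omega):
    return [[K[i][j] + 1j * omega * C[i][j] for j in range(3)] for i in range(3)]


b = [1.0, 2.0, 3.0]
m_inv = invert(system(1.0))     # expensive step, done once at omega_c = 1.0
a_near = system(1.05)           # nearby frequency
x, iters = solve_nearby(a_near, b, m_inv)
residual = max(abs(bi - yi) for bi, yi in zip(b, matvec(a_near, x)))
print(iters, residual)
```

The closer omega is to omega_c, the better the preconditioner and the fewer iterations are needed, which is what makes stepping through nearby frequencies cheaper than refactorizing at each one.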
The performance of the solver stage is of key importance. For ACCUFIELD, the first known parallelization of a dense direct symmetric indefinite solver was made, which required resolving many issues in order to minimize communication costs. The algorithms used are variants of the Bunch-Kaufman diagonal pivoting method, such as those used in the LAPACK routine zsysv(), which is one of the fastest publicly available routines of its kind. However, our corresponding routine, called pzsysv(), out-performs zsysv() by ~ 15% for large matrices on the UltraSPARC family of processors. Further performance gains were achieved by developing a new variant of the Bunch-Kaufman method that reduces symmetric interchanges while maintaining the same growth bound. This algorithm's efficiency is demonstrated by its parallel speedup of 12--13 for large matrices on a 16-node AP3000.
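For orientation, the pivot selection rule of the classical Bunch-Kaufman method can be sketched as follows. This is the textbook partial-pivoting rule with alpha = (1 + sqrt(17))/8, not the new variant developed for ACCUFIELD and not the code of pzsysv(). The function name and return convention are hypothetical, and the complex modulus is used as the magnitude measure for simplicity.

```python
import math

ALPHA = (1.0 + math.sqrt(17.0)) / 8.0  # classical Bunch-Kaufman constant, ~0.6404


def choose_pivot(a, k=0):
    """Select the pivot for step k of a symmetric factorization of `a`.

    Returns (block_size, r): block_size is 1 or 2. For a 1x1 pivot, row and
    column r are interchanged symmetrically with k (r == k means no
    interchange). For a 2x2 pivot, r is interchanged with k + 1.
    """
    n = len(a)
    akk = abs(a[k][k])
    if k == n - 1:
        return 1, k
    # Largest off-diagonal magnitude in column k of the trailing submatrix.
    r = max(range(k + 1, n), key=lambda i: abs(a[i][k]))
    colmax = abs(a[r][k])
    if colmax == 0.0 or akk >= ALPHA * colmax:
        return 1, k  # diagonal entry is large enough, no interchange
    # Largest off-diagonal magnitude in row r (equal to column r by symmetry).
    rowmax = max(abs(a[r][j]) for j in range(k, n) if j != r)
    if akk * rowmax >= ALPHA * colmax * colmax:
        return 1, k  # still safe to use a[k][k] without interchange
    if abs(a[r][r]) >= ALPHA * rowmax:
        return 1, r  # use a[r][r] as a 1x1 pivot after interchange
    return 2, r      # use a 2x2 pivot block built from rows k and r


dominant = [[10.0, 1.0, 2.0], [1.0, 8.0, 1.0], [2.0, 1.0, 9.0]]
small_diag = [[0.1j, 1.0], [1.0, 0.1]]
swap_case = [[0.1, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
print(choose_pivot(dominant), choose_pivot(small_diag), choose_pivot(swap_case))
```

For a weakly indefinite matrix the first test succeeds at most steps, so few interchanges occur. Interchanges involve data movement between nodes in a distributed setting, which is one reason a variant that reduces them is attractive.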
To give ACCUFIELD high performance on the UltraSPARC family of processors, a heavily optimized implementation of the complex precision UltraSPARC BLAS was developed. The paper describes a combination of techniques that improved the solver's performance by over 40% relative to the commercially available Sun Performance Library 1.2 BLAS.