Compared with traditional methods which minimize communication costs at the expense of load balance, it has a small (in some situations negative) increase in communication volume costs. It however incurs extra communication startup costs, but only by a factor not exceeding 2. To maximize load balance and minimize the cost of panel re-distribution, storage block sizes should be kept small; furthermore, in many situations of interest, there will be no significant communication startup penalty for doing so.
Results will be given on the Fujitsu AP+ parallel computer, which will compare the performance of panel scattering with previously established methods, for LU, LLT and QR factorization. These are consistent with a detailed performance model for LU factorization for each method that is developed here.