Reordering of sparse matrices is essential for good performance in Cholesky factorization. A good reordering algorithm is not only crucial on a sequential computer, but can also lead to a much better load balance on a parallel computer, and thus to a dramatic increase in performance compared to a "naive" ordering. Though low fill has long served as an appropriate criterion for measuring the quality of sequential orderings, in both space and time, there is not yet an agreed-upon objective function for the parallel ordering problem. Most previous work on the parallel ordering problem has used elimination tree height as the criterion for comparing orderings, with shorter trees assumed to be superior to taller ones. But as parallel architectures have grown more complicated and a variety of parallel factorization algorithms have been developed, it seems hard to arrive at a single well-accepted criterion suitable for all situations. The inadequacy of elimination tree height as a universal criterion is comprehensively illustrated with some contrived examples. These examples show that the height of the elimination tree not only fails to indicate the actual parallel computing time, but also neglects the underlying communication overhead; moreover, the speedup it achieves can fall far short of expectations. In this dissertation we propose a framework of reordering criteria aimed at minimizing the completion time of various parallel sparse Cholesky factorizations while accounting for architectural issues. Within this framework we present a general reordering scheme that yields an equivalent reordering guaranteed to have minimum completion time among all equivalent reorderings of the filled graph. Empirical results on a careful selection of matrices from the Harwell-Boeing sparse matrix collection further confirm the merits of this reordering scheme.
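To make the elimination-tree criterion concrete, the following sketch (not code from the dissertation; the helper names and the tiny example patterns are illustrative assumptions) computes the elimination tree and its height from the symbolic nonzero pattern of the lower-triangular Cholesky factor, using the standard rule that the parent of column j is the smallest row index i > j with L[i, j] nonzero:

```python
# Illustrative sketch (not from the dissertation): compute the
# elimination tree and its height from the symbolic lower-triangular
# nonzero pattern of the Cholesky factor L.

def etree_parents(lower_pattern, n):
    """parent[j] = smallest row i > j with L[i, j] nonzero; -1 for a root."""
    parent = [-1] * n
    for j in range(n):
        below = [i for i in lower_pattern.get(j, ()) if i > j]
        if below:
            parent[j] = min(below)
    return parent

def etree_height(parent):
    """Number of nodes on the longest leaf-to-root path."""
    n = len(parent)
    depth = [1] * n
    for j in range(n - 1, -1, -1):   # parents have larger indices,
        if parent[j] != -1:          # so they are processed first
            depth[j] = depth[parent[j]] + 1
    return max(depth) if depth else 0

# A tridiagonal pattern yields a chain (no tree parallelism) ...
chain = etree_parents({0: [1], 1: [2], 2: [3]}, 4)
# ... while an "arrow" pattern yields a short, wide tree.
arrow = etree_parents({0: [3], 1: [3], 2: [3]}, 4)
print(etree_height(chain), etree_height(arrow))  # → 4 2
```

Under the height criterion the "arrow" ordering looks far better, since its wide tree lets columns be eliminated concurrently; as the abstract argues, however, height alone says nothing about communication overhead or the actual parallel completion time.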