Hostname: page-component-7c8c6479df-94d59 Total loading time: 0 Render date: 2024-03-27T01:02:30.231Z Has data issue: false hasContentIssue false

Parallel numerical linear algebra

Published online by Cambridge University Press:  07 November 2008

James W. Demmel
Affiliation:
Computer Science Division and Mathematics DepartmentUniversity of California at BerkeleyBerkeley, CA 94720USA E-mail: demmel@cs.berkeley.edu
Michael T. Heath
Affiliation:
Department of Computer Science and National Center for Supercomputing ApplicationsUniversity of IllinoisUrbana, IL 61801USA E-mail: heath@ncsa.uiuc.edu
Henk A. van der Vorst
Affiliation:
Mathematical InstituteUtrecht UniversityUtrecht, The Netherlands E-mail: vorst@math.ruu.nl

Abstract

We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of paralled processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.

Type
Research Article
Copyright
Copyright © Cambridge University Press 1993

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

REFERENCES

Aho, A., Hopcroft, J. and Ullman, J. (1974), The Design and Analysis of Computer Algorithms, Addison-Wesley (New York).Google Scholar
Anderson, E. and Dongarra, J. (1990), ‘Evaluating block algorithm variants in LA-PACK’, Computer Science Dept. Technical Report CS–90–103, University of Tennessee, Knoxville, TN (LAPACK Working Note 19).Google Scholar
Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Croz, J. Du, Green-baum, A., Hammarling, S., McKenney, A., Ostrouchov, S. and Sorensen, D. (1992), LAPACK Users’ Guide, Release 1.0, SIAM (Philadelphia).CrossRefGoogle Scholar
ANSI/IEEE, New York, (1985), IEEE Standard for Binary Floating Point Arithmetic, Std 754–1985 edition (New York).Google Scholar
Arnoldi, W. E. (1951), ‘The principle of minimized iteration in the solution of the matrix eigenproblem’, Quart. Appl. Math. 9, 1729.CrossRefGoogle Scholar
Ashcraft, C., Eisenstat, S.C. and Liu, J.W.-H. (1990), ‘A fan-in algorithm for distributed sparse numerical factorization’, SIAM J. Sci. Statist. Comput. 11, 593599.CrossRefGoogle Scholar
Ashcraft, C., Eisenstat, S.C., Liu, J.W.-H. and Sherman, A.H. (1990), ‘A comparison of three column-based distributed sparse factorization schemes’, Technical Report YALEU/DCS/RR-810, Dept of Computer Science, Yale University, New Haven, CT.CrossRefGoogle Scholar
Ashcraft, C. and Grimes, R. (1988), ‘On vectorizing incomplete factorizations and SSOR preconditioners’, SIAM J. Sci. Statist. Comput. 9, 122151.CrossRefGoogle Scholar
Ashcraft, C., Grimes, R., Lewis, J., Peyton, B. and Simon, H. (1987), ‘Progress in sparse matrix methods for large linear systems on vector supercomputers’, Int. J. Supercomput. Appl. 1(4), 1030.Google Scholar
Auslander, L. and Tsao, A. (1992), ‘On parallelizable eigensolvers’, Adv. Appl. Math. 13, 253261.CrossRefGoogle Scholar
Babus˘ka, I. (1972), ‘Numerical stability in problems of linear algebra’, SIAM J. Numer. Anal. 9, 5377.Google Scholar
Bai, Z. and Demmel, J. (1989), ‘On a block implementation of Hessenberg multi-shift QR iteration’, Int. J. High Speed Comput. 1(1), 97112 (also LAPACK Working Note 8).CrossRefGoogle Scholar
Bai, Z. and Demmel, J. (1992), ‘Design of a parallel nonsymmetric eigenroutine toolbox’, Computer Science Dept preprint, University of California, Berkeley, CA.Google Scholar
Bai, Z., Hu, D. and Reichel, L. (1991), ‘A Newton basis GMRES implementation’, Technical Report 91–03, University of Kentucky.Google Scholar
Bailey, D. H., Lee, K. and Simon, H. D. (1991), ‘Using Strassen's algorithm to accelerate the solution of linear systems’, J. Supercomput. 4, 97371.CrossRefGoogle Scholar
Barlow, J. (1991), ‘Error analysis of update methods for the symmetric eigenvalue problem’, to appear in SIAM J. Math. Anal. Appl. (Tech Report CS-91–21, Computer Science Department, Penn State University.)Google Scholar
Beattie, C. and Fox, D. (1989), ‘Localization criteria and containment for Rayleigh quotient iteration’, SIAM J. Math. Anal. Appl. 10 (1), 8093.CrossRefGoogle Scholar
Bell, K., Hatlestad, B., Hansteen, O. and Araldsen, P. (1973), NORSAM—A programming system for the finite element method, User's Manual, Part I, General description. Institute for Structural Analysis, NTH, N-7034 Trondheim, Norway.Google Scholar
Benner, R., Montry, G. and Weigand, G. (1987), ‘Concurrent multifrontal methods: shared memory, cache, and frontwidth issues’, Int. J. Supercomput. Appl. 1(3), 2644.Google Scholar
Berryman, H., Saltz, J., Gropp, W. and Mirchandaney, R. (1989), ‘Krylov methods preconditioned with incompletely factored matrices on the CM-2’, Technical Report 89–54, NASA Langley Research Center, ICASE, Hampton, VA.CrossRefGoogle Scholar
Bertsekas, D. and Tsitsiklis, J. (1989), Parallel and Distributed Comptutation: Numerical Methods, Prentice Hall (New York).Google Scholar
Bischof, C. (1989), ‘Computing the singular value decomposition on a distributed system of vector processors’, Parallel Comput. 11 171186.CrossRefGoogle Scholar
Bischof, C. and Sun, X. (1992), ‘A divide and conquer method for tridiagonalizing symmetric matrices with repeated eigenvalues’, MCS Report P286–0192, Argonne National Lab.Google Scholar
Bischof, C. and Tang, P. (1991a), ‘Generalized incremental condition estimation’, Computer Science Dept, Technical Report CS-91–132, University of Tennessee, Knoxville, TN (LAPACK Working Note 32).Google Scholar
Bischof, C. and Tang, P. (1991b), ‘Robust incremental condition estimation’, Computer Science Dept. Technical Report CS-91–133, University of Tennessee, Knoxville, TN (LAPACK Working Note 33).CrossRefGoogle Scholar
Brent, R. P. (1973), Algorithms for Minimization Without Derivatives, Prentice-Hall (New York).Google Scholar
Brent, R. and Luk, F. (1985), ‘The solution of singular value and symmetric eigenvalue problems on multiprocessor arrays’, SIAM J. Sci. Statist. Comput. 6, 6984.CrossRefGoogle Scholar
Brunet, J., Edelman, A. and Mesirov, J. (1990), ‘An optimal hypercube direct n-body solver on the connection machine’, in: Proceedings of Supercomputing ‘90, IEEE Computer Society Press (New York), 748752.CrossRefGoogle Scholar
Calvetti, D., Petersen, J. and Reichel, L. (1991), ‘A parallel implementation of the GMRES method’, Technical Report ICM-9110–6, Institute for Computational Mathematics, Kent, OH.Google Scholar
Cannon, L. (1969), ‘A cellular computer to implement the Kalman filter algorithm’, PhD thesis, Montana State University, Bozeman, MN.Google Scholar
Chen, S. C., Kuck, D. J. and Sameh, A. H. (1978), ‘Practical parallel band triangular system solvers’, ACM Trans. Math. Software 4, 270277.CrossRefGoogle Scholar
Chronopoulos, A. T. (1991), ‘Towards efficient parallel implementation of the CG method applied to a class of block tridiagonal linear systems’, in: Supercomputing '91, IEEE Computer Society Press (Los Alamitos, CA), 578587.CrossRefGoogle Scholar
Chronopoulos, A. T. and Gear, C. W. (1989), ‘s-step iterative methods for symmetric linear systems’, J. Comput. Appl. Math. 25, 153168.CrossRefGoogle Scholar
Chronopoulos, A. T. and Kim, S. K. (1990), ‘s-Step Orthomin and GMRES implemented on parallel computers’, Technical Report 90/43R, UMSI, Minneapolis.Google Scholar
Chu, E. (1988a), ‘Orthogonal decomposition of dense and sparse matrices on multiprocessors’, PhD thesis, University of Waterloo.Google Scholar
Chu, M. (1988b), ‘A note on the homotopy method for linear algebraic eigenvalue problems’, Lin. Alg. Appl. 105, 225236.CrossRefGoogle Scholar
Chu, M., Li, T.-Y. and Sauer, T. (1988), ‘Homotopy method for general λ-matrix problems’, SIAM J. Math. Anal. Appl. 9 (4), 528536.CrossRefGoogle Scholar
Csanky, L. (1977), ‘Fast parallel matrix inversion algorithms’, SIAM J. Comput. 5 618623.CrossRefGoogle Scholar
Cuppen, J.J.M. (1981), ‘A divide and conquer method for the symmetric tridiagonal eigenproblem’, Numer. Math. 36, 177195.CrossRefGoogle Scholar
Davis, G., Funderlic, R. and Geist, G. (1987), ‘A hypercube implementation of the implicit double shift QR algorithm’, in: Hypercube Multiprocessors 1987, SIAM (Philadelphia), 619626.Google Scholar
D'Azevedo, E. F. and Romine, C. H. (1992), ‘Reducing communication costs in the conjugate gradient algorithm on distributed memory multiprocessors’, Technical Report ORNL/TM-12192, Oak Ridge National Laboratory.CrossRefGoogle Scholar
de Groen, P. P. N. (1991), ‘Base p-cyclic reduction for tridiagonal systems of equations’, Appl. Numer. Math. 8 117126.CrossRefGoogle Scholar
de Sturler, E. (1991), ‘A parallel restructured version of GMRES(m)’, Technical Report 91–85, Delft University of Technology, Delft.Google Scholar
Deichmoller, A. (1991), ‘Über die Berechnung verallgemeinerter singulärer Werte mittles Jacobi-ähnlicher Verfahren’, PhD thesis, Fernuniversität—Hagen, Hagen, Germany.Google Scholar
Dekel, E., Nassimi, D. and Sahni, S. (1981), ‘Parallel matrix and graph algorithms’, SIAM J. Comput. 10 (4), 657675.CrossRefGoogle Scholar
Dekker, T. (1971), ‘A floating point technique for extending the available precision’, Numer. Math. 18 224242.CrossRefGoogle Scholar
Demmel, J. (1987), ‘Three methods for refining estimates of invariant subspaces’, Computing 38, 4357.CrossRefGoogle Scholar
Demmel, J. (1992a), ‘Specifications for robust parallel prefix operations’, Technical report, Thinking Machines Corp., Cambridge, MA.Google Scholar
Demmel, J. (1992b), ‘Trading off parallelism and numerical stability’, Computer Science Division Technical Report UCB//CSD-92–702, University of California, Berkeley, CA.Google Scholar
Demmel, J. and Higham, N. J. (1992), ‘Stability of block algorithms with fast Level 3 BLAS’, ACM Trans. Math. Soft. 18 (3), 274291.CrossRefGoogle Scholar
Demmel, J. and Kahan, W. (1990), ‘Accurate singular values of bidiagonal matrices’, SIAM J. Sci. Statist. Comput. 11 (5), 873912.CrossRefGoogle Scholar
Demmel, J. and Veselić, K. (1992), ‘Jacobi's method is more accurate than QR’, SIAM J. Mat. Anal Appl. 13 (4), 12041246.CrossRefGoogle Scholar
Doi, S. (1991), ‘On parallelism and convergence of incomplete LU factorizations’, Appl. Numer. Math. 7, 417436.CrossRefGoogle Scholar
Dongarra, J. and Grosse, E. (1987), ‘Distribution of mathematical software via electronic mail’, Commun. ACM 30 (5), 403407.CrossRefGoogle Scholar
Dongarra, J. and Ostrouchov, S. (1990), ‘LAPACK block factorization algorithms on the Intel iPSC/860’, Computer Science Dept Technical Report CS-90–115, University of Tennessee, Knoxville, TN (LAPACK Working Note 24).Google Scholar
Dongarra, J. and Sidani, M. (1991), ‘A parallel algorithm for the non-symmetric eigenvalue problem’, Computer Science Dept Technical Report CS-91–137, University of Tennessee, Knoxville, TN.CrossRefGoogle Scholar
Dongarra, J. and Sorensen, D. (1987), ‘A fully parallel algorithm for the symmetric eigenproblem’, SIAM J. Sci. Statist. Comput. 8 (2), 139154.CrossRefGoogle Scholar
Dongarra, J. and van de Geijn, R. (1991), ‘Reduction to condensed form for the eigenvalue problem on distributed memory computers’, Computer Science Dept Technical Report CS-91–130, University of Tennessee, Knoxville, TN (LAPACK Working Note 30), to appear in Parallel Comput.CrossRefGoogle Scholar
Dongarra, J. and van de Geijn, R. (1991), ‘Two dimensional basic linear algebra communication subprograms’, Computer Science Dept Technical Report CS-91–138, University of Tennessee, Knoxville, TN (LAPACK Working Note 37).Google Scholar
Dongarra, J., Bunch, J., Moler, C. and Stewart, G. W. (1979), LINPACK User's Guide, SIAM (Philadelphia).CrossRefGoogle Scholar
Dongarra, J., Du Croz, J., Duff, I. and Hammarling, S. (1990), ‘A set of Level 3 Basic Linear Algebra Subprograms’, ACM Trans. Math. Soft. 16 (1), 117.CrossRefGoogle Scholar
Dongarra, J., Du Croz, J., Hammarling, S. and Hanson, Richard J. (1988), ‘An extended set of fortran basic linear algebra subroutines’, ACM Trans. Math. Soft. 14 (1), 117.CrossRefGoogle Scholar
Dongarra, J., Duff, I., Sorensen, D. and van der Vorst, H. (1991), Solving Linear Systems on Vector and Shared Memory Computers, SIAM (Philadelphia).Google Scholar
Dongarra, J., Geist, G. A., and Romine, C. (1990b), ‘Computing the eigenvalues and eigenvectors of a general matrix by reduction to tridiagonal form’, Technical Report ORNL/TM-11669, Oak Ridge National Laboratory (to appear in ACM TOMS).CrossRefGoogle Scholar
Dongarra, J., Hammarling, S. and Sorensen, D. (1989), ‘Block reduction of matrices to condensed forms for eigenvalue computations’, J. Comput. Appl. Math. 27, 215227 (LAPACK Working Note 2).CrossRefGoogle Scholar
Du Croz, J., Mayes, P. J. D. and Brozolo, G. Radicati di (1990), ‘Factorizations of band matrices using Level 3 BLAS’, Computer Science Dept Technical Report CS-90–109, University of Tennessee, Knoxville, TN (LAPACK Working Note 21).CrossRefGoogle Scholar
Dubois, P. and Rodrigue, G. (1977), ‘An analysis of the recursive doubling algorithm’, High Speed Computer and Algorithm Organization, (Kuck, D. J. and Sameh, A. H., eds), Academic Press (New York).Google Scholar
Dubrulle, A. (1991), ‘The multishift QR algorithm: is it worth the trouble?’, Palo Alto Scientific Center Report G320–3558x, IBM Corp., 1530 Page Mill Road, Palo Alto, CA 94304.Google Scholar
Duff, I.S. (1986), ‘Parallel implementation of multifrontal schemes’, Parallel Comput. 2, 193204.CrossRefGoogle Scholar
Duff, I.S. and Meurant, G.A. (1989), ‘The effect of ordering on preconditioned conjugate gradient’, BIT 29, 635657.CrossRefGoogle Scholar
Duff, I.S., Erisman, A.M. and Reid, J.K. (1986), Direct Methods for Sparse Matrices, Oxford University Press (Oxford, UK).Google Scholar
Eberlein, P. (1962), ‘A Jacobi method for the automatic computation of eigenvalues and eigenvectors of an arbitrary matrix’, J. SIAM 10, 7488.Google Scholar
Eberlein, P. (1987), ‘On the Schur decomposition of a matrix for parallel computation’, IEEE Trans. Comput. 36, 167174.CrossRefGoogle Scholar
Eijkhout, V. (1992), ‘Qualitative properties of the conjugate gradient and Lanczos methods in a matrix framework’, Computer Science Dept. Technical Report CS-92–170, University of Tennessee, Knoxville, TN (LAPACK Working Note 51).Google Scholar
Eisenstat, S., Heath, M., Henkel, C. and Romine, C. (1988), ‘Modified cyclic algorithms for solving triangular systems on distributed memory multiprocessors’, SIAM J. Sci. Statist. Comput. 9, 589600.CrossRefGoogle Scholar
Fox, G., Hiranandani, S., Kennedy, K., Koelbel, C., Kremer, U., Tseng, C.-W. and Wu, M.-Y. (1990), ‘Fortran D language specification’, Computer Science Department Report CRPC-TR90079, Rice University, Houston, TX.Google Scholar
Fox, G., Johnson, M., Lyzenga, G., Otto, S., Salmon, J. and Walker, D. (1988), Solving Problems on Concurrent Processors, vol. I, Prentice Hall (New York).Google Scholar
Freund, R. W., Golub, G. H. and Nachtigal, N. M. (1992), ‘Iterative solution of linear systems’, Acda Numerica 1, 57100.CrossRefGoogle Scholar
Gallivan, K., Heath, M., Ng, E., Ortega, J., Peyton, B., Plemmons, R., Romine, C., Sameh, A. and Voigt, R. (1990), Parallel Algorithms for Matrix Computations, SIAM (Philadelphia).Google Scholar
Gallivan, K., Jalby, W., Meier, U. and Sameh, A. (1988), ‘Impact of hierarchical memory systems on linear algebra algorithm design’, Int. J. Supercomput. Appl. 2, 1248.Google Scholar
Gallivan, K. A., Heath, M. T., Ng, E., et al. (1990), Parallel Algorithms for Matrix Computations, SIAM (Philadelphia).Google Scholar
Gallivan, K. A., Plemmons, R. J. and Sameh, A. H. (1990), ‘Parallel algorithms for dense linear algebra computations’, SIAM Rev. 32, 54135.CrossRefGoogle Scholar
Garbow, B. S., Boyle, J. M., Dongarra, J. J. and Moler, C. B. (1977), Matrix eigensystem routines – EISPACK guide extension, Lecture Notes in Computer Science, vol. 51, Springer-Verlag (Berlin).Google Scholar
Geist, G. A. (1990), ‘Parallel tridiagonalization of a general matrix using distributed memory multiprocessors’, in Proc. Fourth SIAM Conf. on Parallel Processing for Scientific Computing, SIAM (Philadelphia), 2935.Google Scholar
Geist, G. A. (1991), ‘Reduction of a general matrix to tridiagonal form’, SIAM J. Math. Anal. Appl. 12(2), 362373.CrossRefGoogle Scholar
Geist, G. A. and Davis, G. J. (1990), ‘Finding eigenvalues and eigenvectors of unsymmetric matrices using a distributed memory multiprocessor’, Parallel Comput. 13(2), 199209.CrossRefGoogle Scholar
Geist, G. A., Lu, A. and Wachspress, E. (1989), ‘Stabilized reduction of an arbitrary matrix to tridiagonal form’, Technical Report ORNL/TM-11089, Oak Ridge National Laboratory.CrossRefGoogle Scholar
Gentleman, M. and Kung, H. T. (1981), ‘Matrix triangularization by systolic arrays’, in Proc. SPIE 298, Real Time Signal Processing, San Diego, CA, 1926.Google Scholar
George, J.A. (1973), ‘Nested dissection of a regular finite element mesh’, SIAM J. Numer. Anal. 10, 345363.CrossRefGoogle Scholar
George, J.A. and Liu, J.W.-H. (1981), Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall (Englewood Cliffs, NJ).Google Scholar
George, J.A. and Liu, J.W.-H. (1989), ‘The evolution of the minimum degree ordering algorithm’, SIAM Rev. 31, 119.CrossRefGoogle Scholar
George, J.A., Heath, M.T., Liu, J.W.-H. and Ng, E. (1986), ‘Solution of sparse positive definite systems on a shared memory multiprocessor’, Int. J. Parallel Progr. 15, 309325.CrossRefGoogle Scholar
George, J.A., Heath, M.T., Liu, J.W.-H. and Ng, E. (1988), ‘Sparse Cholesky factorization on a local-memory multiprocessor’, SIAM J. Sci. Statist. Comput. 9, 327340.CrossRefGoogle Scholar
George, J.A., Heath, M.T., Liu, J.W.-H. and Ng, E. (1989), ‘Solution of sparse positive definite systems on a hypercube’, J. Comput. Appl. Math. 27, 129156.CrossRefGoogle Scholar
George, J.A., Liu, J.W.-H. and Ng, E. (1989), ‘Communication results for parallel sparse Cholesky factorization on a hypercube’, Parallel Comput. 10, 287298.CrossRefGoogle Scholar
Gilbert, J. and Schreiber, R. (1992), ‘Highly parallel sparse cholesky factorization’, SIAM J. Sci. Statist. Comput. 13, 11511172.CrossRefGoogle Scholar
Golub, G. and Van Loan, C. (1989), Matrix Computations, 2nd ed, Johns Hopkins University Press (Baltimore, MD).Google Scholar
Gottlieb, A. and Almasi, G. (1989), Highly Parallel Computing, Benjamin Cummings (Redwood City, CA).Google Scholar
Gu, M. and Eisenstat, S. (1992), ‘A stable and efficient algorithm for the rank-1 modification of the symmetric eigenproblem’, Computer Science Dept Report YALEU/DCS/RR-916, Yale University.Google Scholar
Hari, V. and Veselić, K. (1987), ‘On Jacobi methods for singular value decompositions’, SIAM J. Sci. Statist. Comput. 8, 741754.CrossRefGoogle Scholar
Heath, M.T. and Romine, C. (1988), ‘Parallel solution of triangular systems on distributed memory multiprocessors’, SIAM J. Sci. Statist. Comput. 9, 558588.CrossRefGoogle Scholar
Heath, M.T., Ng, E. and Peyton, B. (1991), ‘Parallel algorithms for sparse linear systems’, SIAM Rev. 33, 420460.CrossRefGoogle Scholar
Heller, D. (1978), ‘Some aspects of the cyclic reduction algorithm for block tridiagonal linear systems’, SIAM J. Numer. Anal. 13, 484496.CrossRefGoogle Scholar
Hestenes, M. R. and Stiefel, E. (1954), ‘Methods of conjugate gradients for solving linear systems’, J. Res. Natl Bur. Stand. 49, 409436.CrossRefGoogle Scholar
High Peformance Fortran (1991), documentation available via anonymous ftp from titan.cs.rice.edu in directory public/HPFF.Google Scholar
Higham, N. J. (1990), ‘Exploiting fast matrix multiplication within the Level 3 BLAS’, ACM Trans. Math. Soft. 16, 352368.CrossRefGoogle Scholar
Ho, C. T. (1990), Optimal communication primitives and graph embeddings on hypercubes, PhD thesis, Yale University.Google Scholar
Ho, C. T., Johnsson, S. L. and Edelman, A. (1991), ‘Matrix multiplication on hypercubes using full bandwidth and constant storage’, in The Sixth Distributed Memory Computing Conf. Proc., IEEE Computer Society Press (New York),447451.Google Scholar
Hong, X. and Kung, H. T. (1981), ‘I/O complexity: the red blue pebble game’, in Proc. 13th Symposium on the Theory of Computing, ACM (New York), 326334.Google Scholar
Howland, J. (1983), ‘The sign matrix and the separation of matrix eigenvalues’, Lin. Alg. Appl. 49, 221232.CrossRefGoogle Scholar
Ipsen, I. and Jessup, E. (1990), ‘Solving the symmetric tridiagonal eigenvalue problem on the hypercube’, SIAM J. Sci. Statist. Comput. 11(2), 203230.CrossRefGoogle Scholar
Irons, B.M. (1970), ‘A frontal solution program for finite element analysis’, Int. J. Numer. Math. Engrg 2, 532.CrossRefGoogle Scholar
Jess, J.A.G. and Kees, H.G.M. (1982), ‘A data structure for parallel L/U decomposition’, IEEE Trans. Comput. C-31, 231239.CrossRefGoogle Scholar
Jessup, E. (1991), ‘A case against a divide and conquer approach to the nonsymmetric eigenproblem’, Technical Report ORNL/TM-11903, Oak Ridge National Laboratory.CrossRefGoogle Scholar
Jessup, E. and Ipsen, I. (1992), ‘Improving the accuracy of inverse iteration’, SIAM J. Sci. Statist. Comput. 13(2), 550572.CrossRefGoogle Scholar
Jessup, E. and Sorensen, D. (1989), ‘A divide and conquer algorithm for computing the singular value decomposition of a matrix’, in Proc. Third SIAM Conf. on Parallel Processing for Scientific Computing SIAM (Philadelphia), 6166. SIAM.Google Scholar
Johnsson, S. L. (1987), ‘Communication efficient basic linear algebra computations on hypercube architecture’, J. Parallel Distr. Comput. 4, 133172.CrossRefGoogle Scholar
Johnsson, S. L. (1990), private communication.Google Scholar
Johnsson, S. L. and Ho, C. T. (1989), ‘Matrix multiplication on Boolean cubes using generic communication primitives’, in Parallel Processing and Medium Scale Multiprocessors, (Wouk, A., ed.), SIAM (Philadelphia), 108156.Google Scholar
Kahan, W. (1968), ‘Accurate eigenvalues of a symmetric tridiagonal matrix’, Computer Science Dept Technical Report CS41, Stanford University, Stanford, CA, July 1966 (revised June 1968).Google Scholar
Karp, R. and Ramachandran, V. (1990), ‘Parallel algorithms for shared memory machines’, in Handbook of Theoretical Computer Science, vol. A: Algorithms and Complexity, (van Leeuwen, J., ed.), Elsevier and MIT Press (New York), 869941.Google Scholar
Kenney, C. and Laub, A. (1991), ‘Rational iteration methods for the matrix sign function’, SIAM J. Math. Anal. Appl. 21, 487494.Google Scholar
Krishnakumar, A. S. and Morf, (1986), ‘Eigenvalues of a symmetric tridiagonal matrix: a divide and conquer approach’, Numer. Math. 48, 349368.CrossRefGoogle Scholar
Kuck, D. and Sameh, A. (1977), ‘A parallel QR algorithm for symmetric tridiagonal matrices’, IEEE Trans. Comput. C-26(2).Google Scholar
Kung, H. T. (1974), ‘New algorithms and lower bounds for the parallel evaluation of certain rational expressions’, Technical report, Carnegie Mellon University.CrossRefGoogle Scholar
Lambiotte, J. J. and Voigt, R. G. (1974), ‘The solution of tridiagonal linear systems on the CDC-STAR-100 computer’, Technical report, ICASE-NASA Langley Research Center, Hampton, VA.CrossRefGoogle Scholar
Lang, B. (1992), ‘Reducing symmetric band matrices to tridiagonal form – a comparison of a new parallel algorithm with two serial algorithms on the iPSC/860’, Institut für angewandte mathematik report, Universität Karlsruhe.CrossRefGoogle Scholar
Lawson, C., Hanson, R., Kincaid, D. and Krogh, F. (1979), ‘Basic linear algebra subprograms for fortran usage’, ACM Trans. Math. Soft. 5, 308323.CrossRefGoogle Scholar
Lederman, S., Tsao, A. and Turnbull, T. (1992), ‘A parallelizable eigensolver for real diagonalizable matrices with real eigenvalues’, Report TR-01–042, Supercomputing Research Center, Bowie, MD.Google Scholar
Lewis, J.G., Peyton, B.W. and Pothen, A. (1989), ‘A fast algorithm for reordering sparse matrices for parallel factorization’, SIAM J. Sci. Statist. Comput. 10, 11561173.CrossRefGoogle Scholar
Li, G. and Coleman, T. (1988), ‘A parallel triangular solver on a distributed memory multiprocessor’, SIAM J. Sci. Statist. Comput. 9, 485502.CrossRefGoogle Scholar
Li, R. (1992), ‘Solving the secular equation stably’, UC Berkeley Math. Dept Report, in preparation.CrossRefGoogle Scholar
Li, T.-Y. and Zeng, Z. (1992), ‘Homotopy-determinant algorithm for solving nonsymmetric eigenvalue problems’, Math. Comput. to appear.CrossRefGoogle Scholar
Li, T.-Y., Zeng, Z. and Cong, L. (1992), ‘Solving eigenvalue problems of nonsymmetric matrices with real homotopies’, SIAM J. Numer. Anal. 29(1), 229248.CrossRefGoogle Scholar
Li, T.-Y., Zhang, H. and Sun, X.-H. (1991), ‘Parallel homotopy algorithm for symmetric tridiagonal eigenvalue problem’, SIAM J. Sci. Statist. Comput. 12(3), 469487.CrossRefGoogle Scholar
Lin, C-C. and Zmijewski, E. (1991), ‘A parallel algorithm for computing the eigenvalues of an unsymmetric matrix on an SIMD mesh of processors’, Department of Computer Science TRCS 91–15, University of California, Santa Barbara, CA.Google Scholar
Liu, J.W.-H. (1986), ‘Computational models and task scheduling for parallel sparse Cholesky factorization’, Parallel Comput. 2, 327342.CrossRefGoogle Scholar
Liu, J.W.-H. (1989), ‘Reordering sparse matrices for parallel elimination’, Parallel Comput. 11, 7391.CrossRefGoogle Scholar
Liu, J.W.-H. (1990), ‘The role of elimination trees in sparse factorization’, SIAM J. Matrix Anal. Appl. 11, 134172.CrossRefGoogle Scholar
Lo, S.-S., Phillipe, B. and Sameh, A. (1987), ‘A multiprocessor algorithm for the symmetric eigenproblem’, SIAM J. Sci. Statist. Comput. 8(2), 155165.CrossRefGoogle Scholar
Lucas, R., Blank, W. and Tieman, J. (1987), ‘A parallel solution method for large sparse systems of equations’, IEEE Trans. Comput. Aided Des. CAD-6, 981991.CrossRefGoogle Scholar
Luk, F. and Park, H. (1989), ‘On parallel Jacobi orderings’, SIAM J. Sci. Statist. Comput. 10(1), 1826.CrossRefGoogle Scholar
Ma, S. C., Patrick, M. and Szyld, D. (1989), ‘A parallel, hybrid algorithm for the generalized eigenproblem’, in Parallel Processing for Scientific Computing, (Rodrigue, Garry, ed.), SIAM (Philadelphia), ch 16, 8286.Google Scholar
Madsen, N. K., Rodrigue, G. H. and Karush, J. I. (1976), ‘Matrix multiplication by diagonals on a vector/parallel processor’, Inform. Process. Lett. 5, 4145.CrossRefGoogle Scholar
Malyshev, A. N. (1991), ‘Parallel aspects of some spectral problems in linear algebra’, Dept of Numerical Mathematics Report NM-R9113, Centre for Mathematics and Computer Science, Amsterdam.Google Scholar
Mehrmann, V. (1991), ‘Divide and conquer methods for block tridiagonal systems’, Technical Report Bericht Nr. 68, Inst. fuer Geometrie und Prakt. Math., Aachen.Google Scholar
Meier, U. (1985), ‘A parallel partition method for solving banded systems of linear equations’, Parallel Comput. 2, 3343.CrossRefGoogle Scholar
Meijerink, J. A. and van der Vorst, H. A. (1977), ‘An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix’, Math. Comput. 31, 148162.Google Scholar
Meijerink, J. A. and van der Vorst, H. A. (1981), ‘Guidelines for the usage of incomplete decompositions in solving sets of linear equations as they occur in practical problems’, J. Comput. Phys. 44, 134155.CrossRefGoogle Scholar
Meurant, G. (1984a), ‘The block preconditioned conjugate gradient method on vector computers’, BIT 24, 623633.CrossRefGoogle Scholar
Meurant, G. (1984b), ‘Numerical experiments for the preconditioned conjugate gradient method on the CRAY X-MP/2’, Technical Report LBL-18023, University of California, Berkeley, CA.Google Scholar
Michielse, P. H. and van der Vorst, H. A. (1988), ‘Data transport in Wang's partition method’, Parallel Comput. 7, 8795.CrossRefGoogle Scholar
Mu, M. and Rice, J.R. (1992), ‘A grid based subtree-subcube assignment strategy for solving partial differential equations on hypercubes’, SIAM J. Sci. Statist. Comput. 13, 826839.CrossRefGoogle Scholar
Ortega, J.M. (1988), Introduction to Parallel and Vector Solution of Linear Systems, Plenum Press (New York and London).CrossRefGoogle Scholar
Paardekooper, M.H.C. (1989), ‘A quadratically convergent parallel Jacobi process for diagonally dominant matrices with distinct eigenvalues’, J. Comput. Appl. Math. 27, 316.CrossRefGoogle Scholar
Paige, C. C. (1976), ‘Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix’, J. Inst. Math. Appl. 18, 341349.CrossRefGoogle Scholar
Pandey, P., Kenney, C. and Laub, A. (1990), ‘A parallel algorithm for the matrix sign function’, Int. J. High Speed Comput. 2(2), 181191.CrossRefGoogle Scholar
Parlett, B. N. (1980), The Symmetric Eigenvalue Problem, Prentice-Hall (Englewood Cliffs, NJ).Google Scholar
Parlett, B. (1992a), private communication.Google Scholar
Parlett, B. (1992b), ‘Reduction to tridiagonal form and minimal realizations’, SIAM J. Math. Anal. Appl. 13(2), 567593.CrossRefGoogle Scholar
Parlett, B. and Fernando, V. (1992), ‘Accurate singular values and differential QD algorithms’, Math. Department PAM-554, University of California, Berkeley, CA.Google Scholar
Parlett, B. N. and Reid, J. K. (1981), ‘Tracking the progress of the Lanczos algorithm for large symmetric eigenproblems’, IMA J. Numer. Anal. 1, 135155.CrossRefGoogle Scholar
Petiton, S. G. (1992), ‘Parallel subspace method for non-Hermitian eigenproblems on the Connection Machine (CM2)’, Appl. Numer. Math. 10.CrossRefGoogle Scholar
Pommerell, C. (1992), Solution of large unsymmetric systems of linear equations, PhD thesis, Swiss Federal Institute of Technology, Zürich.Google Scholar
Pothen, A., Jha, S. and Vemulapati, U. (1987), ‘Orthogonal factorization on a distributed memory multiprocessor’, in Hypercube Multiprocessors 1987, Knoxville, TN, SIAM (Philadelphia), 587598.Google Scholar
Priest, D. (1991), ‘Algorithms for arbitrary precision floating point arithmetic’, in Proc. 10th Symp. Computer Arithmetic, Grenoble, France, (Kornerup, P. and Matula, D., eds), IEEE Computer Society Press (New York), 132145.Google Scholar
di Brozolo, G. Radicati and Robert, Y. (1989), ‘Parallel conjugate gradient-like algorithms for solving sparse nonsymmetric linear systems on a vector multiprocessor’, Parallel Comput. 11, 223239.CrossRefGoogle Scholar
Robert, Y. (1990), The Impact of Vector and Parallel Architectures on the Gaussian Elimination Algorithm, Wiley (New York).Google Scholar
Roberts, J. (1980), ‘Linear model reduction and solution of the algebraic Riccati equation’, Int. J. Control 32, 677687.CrossRefGoogle Scholar
Romine, C. and Ortega, J. (1988), ‘Parallel solution of triangular systems of equations’, Parallel Comput. 6, 109114.CrossRefGoogle Scholar
Rothberg, E. and Gupta, A. (1989), ‘Fast sparse matrix factorization on modern workstations’, Technical Report STAN-CS-89–1286, Stanford University, Stanford, California.CrossRefGoogle Scholar
Saad, Y. (1980), ‘Variations on Arnoldi's method for computing eigenelements of large unsymmetric matrices’, Lin. Alg. Appl. 34, 269295.CrossRefGoogle Scholar
Saad, Y. (1985a), ‘Partial eigensolutions of large nonsymmetric matrices’, Technical Report, Dept. of Comp. Science, New Haven, CN.Google Scholar
Saad, Y. (1985b), ‘Practical use of polynomial preconditionings for the conjugate gradient method’, SIAM J. Sci. Statist. Comput. 6, 865881.CrossRefGoogle Scholar
Saad, Y. (1988), ‘Krylov subspace methods on supercomputers’, Technical report, RIACS, Moffett Field, CA.Google Scholar
Saad, Y. and Schultz, M. H. (1985), ‘Conjugate Gradient-like algorithms for solving nonsymmetric linear systems’, Math. Comput. 44, 417424.CrossRefGoogle Scholar
Sameh, A. (1971), ‘1On Jacobi and Jacobi-like algorithms for a parallel computer’, Math. Comput. 25, 579590.CrossRefGoogle Scholar
Sameh, A. (1985), ‘On some parallel algorithms on a ring of processors’, Comput. Phys. Commun. 37, 159166.CrossRefGoogle Scholar
Sameh, A. and Brent, R. (1977), ‘Solving triangular systems on a parallel computer’, SIAM J. Numer. Anal. 14, 11011113.CrossRefGoogle Scholar
Schlichting, J. J. F. M. and van der Vorst, H. A. (1987), ‘Solving bidiagonal systems of linear equations on the CDC CYBER 205’, Technical Report NM-R8725, CWI, Amsterdam, the Netherlands.Google Scholar
Schlichting, J. J. F. M. and van der Vorst, H. A. (1989), ‘Solving 3D block bidiagonal linear systems on vector computers’, J. Comput. Appl. Math. 27, 323330.CrossRefGoogle Scholar
Schreiber, R. and Van Loan, C. (1989), ‘A storage efficient WY representation for products of Householder transformations’, SIAM J. Sci. Statist. Comput. 10, 5357.CrossRefGoogle Scholar
Seager, M. K. (1986), ‘Parallelizing conjugate gradient for the CRAY X-MP’, Parallel Comput. 2, 35–7.CrossRefGoogle Scholar
Shroff, G. (1991), ‘A parallel algorithm for the eigenvalues and eigenvectors of a general complex matrix’, Numer. Math. 58, 779805.CrossRefGoogle Scholar
Shroff, G. and Schreiber, R. (1989), ‘On the convergence of the cyclic Jacobi method for parallel block orderings’, SIAM J. Math. Anal. Appl. 10, 326346.CrossRefGoogle Scholar
Simon, H. (1989), ‘Bisection is not optimal on vector processors’, SIAM J. Sci. Statist. Comput. 10(1), 205209.CrossRefGoogle Scholar
Slapničar, I. (1992), Accurate symmetric eigenreduction by a Jacobi method, PhD thesis, Fernuniversität – Hagen, Hagen, Germany.CrossRefGoogle Scholar
Smith, B. T., Boyle, J. M., Dongarra, J. J., Garbow, B. S., Ikebe, Y., Klema, V. C. and Moler, C. B. (1976), Matrix Eigensystem Routines – EISPACK Guide: Springer Lecture Notes in Computer Science 6, Springer-Verlag (Berlin).CrossRefGoogle Scholar
Sorensen, D. (1992), ‘Implicit application of polynomial filters in a k-step Arnoldi method’, SIAM J. Math. Anal. Appl. 13(1), 357385.CrossRefGoogle Scholar
Sorensen, D. and Tang, P. (1991), ‘On the orthogonality of eigenvectors computed by divide-and-conquer techniques’, SIAM J. Numer. Anal. 28(6), 17521775.CrossRefGoogle Scholar
Stewart, G. W. (1985), ‘A Jacobi-like algorithm for computing the Schur decomposition of a non-Hermitian matrix’, SIAM J. Sci. Statist. Comput. 6, 853864.CrossRefGoogle Scholar
Stewart, G. W. (1987), ‘A parallel implementation of the QR algorithm’, Parallel Comput. 5, 187196.CrossRefGoogle Scholar
Stickel, E. (1991), ‘Separating eigenvalues using the matrix sign function’, Lin. Alg. Appl. 148, 7588.CrossRefGoogle Scholar
Stone, H. S. (1973), ‘An efficient parallel algorithm for the solution of a tridiagonal linear system of equations’, J. Assoc. Comput. Mach. 20, 2738.CrossRefGoogle Scholar
Swarztrauber, P. (1992), ‘A parallel algorithm for computing the eigenvalues of a symmetric tridiagonal matrix’, Math. Comput. to appear.Google Scholar
Thinking Machines Corporation (1987), Connection Machine Model CM-2 Technical Summary, IBM.Google Scholar
van de Geijn, R. (1987), Implementing the QR algorithm on an array of processors, PhD thesis, University of Maryland, College Park, Computer Science Department Report TR-1897.Google Scholar
van de Geijn, R. and Hudson, D. (1989), ‘Efficient parallel implementation of the nonsymmetric QR algorithm’, in Hypercube Concurrent Computers and Applications, (Gustafeon, J., ed.), ACM (New York).Google Scholar
Van de Velde, E. (1992), Introduction to Concurrent Scientific Computing, Caltech (Pasadena).Google Scholar
van der Vorst, H. A. (1982), ‘A vectorizable variant of some ICCG methods’, SIAM J. Sci. Statist. Comput. 2, 8692.Google Scholar
van der Vorst, H. A. (1986), ‘The performance of Fortran implementations for preconditioned conjugate gradients on vector computers’, Parallel Comput. 2, 4958.CrossRefGoogle Scholar
van der Vorst, H. A. (1987a), ‘Analysis of a parallel solution method for tridiagonal linear systems’, Parallel Comput. 5, 303311.CrossRefGoogle Scholar
van der Vorst, H. A. (1987b), ‘Large tridiagonal and block tridiagonal linear systems on vector and parallel computers’, Parallel Comput. 5, 4554.CrossRefGoogle Scholar
van der Vorst, H. A. (1989a), ‘High performance preconditioning’, SIAM J. Sci. Statist. Comput. 10, 11741185.CrossRefGoogle Scholar
van der Vorst, H. A. (1989b), ‘ICCG and related methods for 3D problems on vector computers’, Comput. Phys. Commun. 53, 223235.CrossRefGoogle Scholar
van der Vorst, H. A. (1989c), ‘Practical aspects of parallel scientific computing’, Future Gener. Comput. Sys. 4, 285291.CrossRefGoogle Scholar
van der Vorst, H. A. and Dekker, K. (1989), ‘Vectorization of linear recurrence relations’, SIAM J. Sci. Statist. Comput. 10, 2735.CrossRefGoogle Scholar
van der Vorst, H. A. and Vuik, C. (1991), ‘GMRESR: A family of nested GMRES methods’, Technical Report 91–80, Delft University of Technology, Faculty of Technical Mathematics.Google Scholar
Veselić, K. (1979), ‘A quadratically convergent Jacobi-like method for real matrices with complex conjugate eigenvalues’, Numer. Math. 33, 425435.CrossRefGoogle Scholar
Walker, H. F. (1988), ‘Implementation of the GMRES method using Householder transformations’, SIAM J. Sci. Statist. Comput. 9, 152163.CrossRefGoogle Scholar
Wang, H. H. (1981), ‘A parallel method for tridiagonal equations’, ACM Trans. Math. Soft. 7, 170183.CrossRefGoogle Scholar
Watkins, D. (1992), ‘Shifting strategies for the parallel QR algorithm’, Dept of Pure and Applied Mathematics report, Washington State Univ., Pullman, WA.Google Scholar
Watkins, D. and Elsner, L. (1991), ‘Convergence of algorithms of decomposition type for the eigenvalue problem’, Lin. Alg. Appl. 143, 1947.CrossRefGoogle Scholar
Wen, C.-P. and Yelick, K. (1992), ‘A survey of message passing systems’, Computer science division, University of California, Berkeley, CA.Google Scholar
Wilkinson, J. H. (1965), The Algebraic Eigenvalue Problem, Oxford University Press (Oxford).Google Scholar
Zeng, Z. (1991), Homotopy-determinant algorithm for solving matrix eigenvalue problems and its parallelizations, PhD thesis, Michigan State University.Google Scholar
Zima, H. and Chapman, B. (1991), Supercompilers for Parallel and Vectors Computers, ACM (New York).Google Scholar
Zmijewski, E. (1989), ‘Limiting communication in parallel sparse Cholesky factorization’, Technical Report TRCS89–18, Department of Computer Science, University of California, Santa Barbara, CA.Google Scholar