The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers

Bai, Zhaojun; Demmel, James; Dongarra, Jack; Petitet, Antoine; Robinson, Haakon; Stanley, K.

doi:10.1137/s1064827595281368

Cited by 33 publications

(40 citation statements)

References 25 publications

(32 reference statements)

“…The performance evaluation of the new algorithms on massively parallel machines, such as the Intel Delta and Thinking Machines CM-5, will appear in [9].…”

Section: Discussion (mentioning)

confidence: 99%

An inverse free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems

Bai

Demmel

1997

Numerische Mathematik

115

164

Summary. We discuss an inverse-free, highly parallel, spectral divide and conquer algorithm. It can compute either an invariant subspace of a nonsymmetric matrix A, or a pair of left and right deflating subspaces of a regular matrix pencil A − λB. This algorithm is based on earlier ones of Bulgakov, Godunov and Malyshev, but improves on them in several ways. This algorithm only uses easily parallelizable linear algebra building blocks: matrix multiplication and QR decomposition, but not matrix inversion. Similar parallel algorithms for the nonsymmetric eigenproblem use the matrix sign function, which requires matrix inversion and is faster but can be less stable than the new algorithm. Mathematics Subject Classification (1991): 65F15
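To make the contrast in the summary concrete, here is a minimal sketch of the matrix sign function approach it mentions, the one that needs a matrix inverse at every step. It uses the textbook Newton iteration X ← (X + X⁻¹)/2 with no scaling, which is an assumption for illustration and not necessarily what either paper implements. The helper names (`mat_inv`, `matrix_sign`, `right_half_plane_projector`) and the test matrix are hypothetical. The projector P = (I + sign(A))/2 has trace equal to the number of eigenvalues in the open right half-plane, which is where a spectral divide and conquer step would split the problem.

```python
def mat_mul(a, b):
    """Dense matrix product for small list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_inv(a):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matrix_sign(a, tol=1e-12, max_iter=100):
    """Unscaled Newton iteration X <- (X + X^{-1}) / 2, started from A.

    Assumes A has no eigenvalues on the imaginary axis.
    """
    x = [[float(v) for v in row] for row in a]
    n = len(x)
    for _ in range(max_iter):
        x_inv = mat_inv(x)  # the inversion that the inverse-free algorithm avoids
        x_new = [[0.5 * (x[i][j] + x_inv[i][j]) for j in range(n)] for i in range(n)]
        diff = max(abs(x_new[i][j] - x[i][j]) for i in range(n) for j in range(n))
        x = x_new
        if diff < tol:
            break
    return x


def right_half_plane_projector(a):
    """Spectral projector (I + sign(A)) / 2 onto the right-half-plane invariant subspace."""
    s = matrix_sign(a)
    n = len(s)
    return [[0.5 * (s[i][j] + (1.0 if i == j else 0.0)) for j in range(n)]
            for i in range(n)]


# Hypothetical test matrix with eigenvalues 2, -1 and 3 on its diagonal.
A = [[2.0, 1.0, 0.5],
     [0.0, -1.0, 4.0],
     [0.0, 0.0, 3.0]]
P = right_half_plane_projector(A)
rank = round(sum(P[i][i] for i in range(3)))  # eigenvalues with positive real part
```

In a full divide and conquer step, a rank-revealing QR decomposition of P would then supply an orthogonal basis that block-triangularizes A. That step is omitted here.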

“…We illustrate the main idea with $s = 8$. Figure 4 gives the flowchart $(t_1, t_2]$, we compute all the R factors $\{R^{(0)}_{n(1)}\}$ of $X_{n(1)}$ (marked by X−R), and the R factors $\{R^{(1)}$ …”

Section: Load Balancing (mentioning)

confidence: 99%

“…(Recall that $R_{n(0)} = R_{g(0)}$ and $S_0 = S_0$.) Also $R_{n(1)}$ and $R_{n(2)}$ will be formed at Processors 4 and 6 respectively (see the marked circles). Once $R_{n(k)}$ are formed, they can be combined with previously obtained $R_{g(k-1)}$ to form $R_{g(k)}$ by using (22), provided that $R_{g(k-1)}$ are sent there from the previous time-step (marked by curve arrows in the figure).…”

Section: Load Balancing (mentioning)

confidence: 99%

“…There is a growing interest in this topic of distributed data sets and here are some relevant works in the literature: the interaction of huge data sets and the limits of computational feasibility in Wegman [12], parallel methods for spectral decomposition of nonsymmetric matrix on distributed memory processors in Bai et al [1], an efficient out-of-core SVD algorithm in Rabani et al [11], an algorithm for data distributed by blocks of columns in Kargupta et al [7], and a method for massive data sets distributed by blocks of rows in Qu et al [10].…”

Section: Introduction (mentioning)

confidence: 99%

Principal Component Analysis for Distributed Data Sets with Updating

Bai

Chan

Luk

2005

Lecture Notes in Computer Science

Abstract. Identifying the patterns of large data sets is a key requirement in data mining. A powerful technique for this purpose is the principal component analysis (PCA). PCA-based clustering algorithms are effective when the data sets are found in the same location. In applications where the large data sets are physically far apart, moving huge amounts of data to a single location can become an impractical, or even impossible, task. A way around this problem was proposed in [10], where truncated singular value decompositions (SVDs) are computed locally and used to reduce the communication costs. Unfortunately, truncated SVDs introduce local approximation errors that could add up and would adversely affect the accuracy of the final PCA. In this paper, we introduce a new method to compute the PCA without incurring local approximation errors. In addition, we consider the situation of updating the PCA when new data arrive at the various locations.
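The idea of combining local results without truncation can be sketched using the R factors that the citation statements above refer to. The sketch below assumes each site computes the triangular R factor of its own rows and ships only that small triangle. Two triangles are merged by stacking them and taking the R factor again, which preserves the Gram matrix XᵀX of the full data exactly in exact arithmetic. This merge is a standard QR identity, not necessarily the paper's equation (22). Centering the data, which PCA requires, is omitted, and all names and sample values are hypothetical.

```python
def r_factor(rows):
    """Upper-triangular R of a thin QR factorization, via modified Gram-Schmidt."""
    m, n = len(rows), len(rows[0])
    cols = [[float(rows[i][j]) for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j):
            # cols[k] already holds the normalized direction q_k.
            r[k][j] = sum(qk * v for qk, v in zip(cols[k], cols[j]))
            cols[j] = [v - r[k][j] * qk for qk, v in zip(cols[k], cols[j])]
        norm = sum(v * v for v in cols[j]) ** 0.5
        r[j][j] = norm
        if norm > 0.0:
            cols[j] = [v / norm for v in cols[j]]
    return r


def combine_r(r_a, r_b):
    """Merge two R factors: the R factor of the two triangles stacked on top of each other."""
    return r_factor(r_a + r_b)


def gram(rows):
    """Gram matrix X^T X of a list-of-rows matrix."""
    n = len(rows[0])
    return [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]


# Hypothetical data held at two separate sites (rows are observations).
site_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.5]]
site_b = [[2.0, -1.0], [0.5, 1.5], [4.0, 0.0], [-1.0, 2.0]]

# Each site sends a 2-by-2 triangle instead of its raw rows.
r_global = combine_r(r_factor(site_a), r_factor(site_b))
```

Because `r_global` has the same Gram matrix as the stacked data, its singular values and right singular vectors match those of the full data set, so no local approximation error is introduced by the merge itself.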

“…In dynamic splitting, a task list is used to keep track of the various parts of the matrix during the decomposition process and to make use of data and task parallelism. This approach has been investigated for the parallel implementation of the spectral divide and conquer algorithm for the unsymmetric eigenvalue problem using the matrix sign function [2]. However, we did not choose this approach, because in the symmetric case the partitioning of the matrix can be done arbitrarily and we prefer to take advantage of this opportunity.…”

Section: Initial Splitting (mentioning)

confidence: 99%

A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures

Tisseur

Dongarra

1999

SIAM J. Sci. Comput.

Abstract. We present a new parallel implementation of a divide and conquer algorithm for computing the spectral decomposition of a symmetric tridiagonal matrix on distributed memory architectures. The implementation we develop differs from other implementations in that we use a two-dimensional block cyclic distribution of the data, we use the Löwner theorem approach to compute orthogonal eigenvectors, and we introduce permutations before the back transformation of each rank-one update in order to make good use of deflation. This algorithm yields the first scalable, portable, and numerically stable parallel divide and conquer eigensolver. Numerical results confirm the effectiveness of our algorithm. We compare performance of the algorithm with that of the QR algorithm and of bisection followed by inverse iteration on an IBM SP2 and a cluster of Pentium PIIs.

Key words. divide and conquer, symmetric eigenvalue problem, tridiagonal matrix, rank-one modification, parallel algorithm, ScaLAPACK, LAPACK, distributed memory architecture

AMS subject classifications. 65F15, 68C25

PII. S1064827598336951

1. Introduction. The divide and conquer algorithm for the symmetric tridiagonal eigenvalue problem was first developed by Cuppen [8], based on previous ideas of Golub [16] and Bunch, Nielsen, and Sorensen [5] for the solution of the secular equation. The algorithm was popularized as a practical parallel method by Dongarra and Sorensen [14], who implemented it on a shared memory machine. They concluded that divide and conquer algorithms, when properly implemented, can be many times faster than traditional ones, such as bisection followed by inverse iteration or the QR algorithm, even on serial computers. Later parallel implementations had mixed success. Using an Intel iPSC-1 hypercube, Ipsen and Jessup [22] found that their bisection implementation was more efficient than their divide and conquer implementation because of the excessive amount of data transferred between processors and unbalanced work load after the deflation process. More recently, Gates and Arbenz [15] showed that good speed-up can be achieved from distributed memory parallel implementations. However, they did not use techniques described in [18] that guarantee the orthogonality of the eigenvectors and that make good use of the deflation to speed the computation.

In this paper, we describe an efficient, scalable, and portable parallel implementation for distributed memory machines of a divide and conquer algorithm for the
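The rank-one modification and secular equation mentioned in this excerpt can be illustrated with a small sketch. It assumes the merge step has been reduced to finding the eigenvalues of diag(d) + ρ z zᵀ with d strictly increasing, ρ > 0, and every z_i nonzero (that is, after deflation). Those eigenvalues are the roots of f(λ) = 1 + ρ Σ z_i² / (d_i − λ), one in each interval between consecutive d_i and one to the right of the largest. Plain bisection stands in here for the faster rational interpolation that production solvers use, and the function names and sample values are hypothetical.

```python
def secular(lam, d, z, rho):
    """Secular function f(lam) = 1 + rho * sum(z_i^2 / (d_i - lam))."""
    return 1.0 + rho * sum(zi * zi / (di - lam) for di, zi in zip(d, z))


def rank_one_eigenvalues(d, z, rho, iters=200):
    """Eigenvalues of diag(d) + rho * z z^T by bisection on the secular equation.

    Assumes d is strictly increasing, rho > 0 and all z_i are nonzero.
    """
    n = len(d)
    # The largest eigenvalue is at most d[-1] + rho * ||z||^2.
    upper = d[-1] + rho * sum(zi * zi for zi in z)
    roots = []
    for i in range(n):
        lo = d[i]
        hi = d[i + 1] if i + 1 < n else upper
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if mid == lo or mid == hi:
                break  # interval can no longer be split in floating point
            # f increases from -inf to +inf across each interval.
            if secular(mid, d, z, rho) > 0.0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots


# Hypothetical example: D = diag(1, 2, 4), z = (1, 1, 1), rho = 0.5.
eigs = rank_one_eigenvalues([1.0, 2.0, 4.0], [1.0, 1.0, 1.0], 0.5)
```

Each root is sought independently of the others, which is one reason this step parallelizes well. Computing numerically orthogonal eigenvectors (the Löwner theorem approach in the abstract) is a separate step not shown here.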
