Abstract. We present a new parallel implementation of a divide and conquer algorithm for computing the spectral decomposition of a symmetric tridiagonal matrix on distributed memory architectures. Our implementation differs from previous ones in three ways: we use a two-dimensional block cyclic distribution of the data, we use the Löwner theorem approach to compute orthogonal eigenvectors, and we introduce permutations before the back transformation of each rank-one update in order to make good use of deflation. This algorithm yields the first scalable, portable, and numerically stable parallel divide and conquer eigensolver. Numerical results confirm the effectiveness of our algorithm. We compare the performance of the algorithm with that of the QR algorithm and of bisection followed by inverse iteration on an IBM SP2 and on a cluster of Pentium IIs.

Key words. divide and conquer, symmetric eigenvalue problem, tridiagonal matrix, rank-one modification, parallel algorithm, ScaLAPACK, LAPACK, distributed memory architecture

AMS subject classifications. 65F15, 68C25

PII. S1064827598336951

1. Introduction. The divide and conquer algorithm for the symmetric tridiagonal eigenvalue problem was first developed by Cuppen [8], based on earlier ideas of Golub [16] and of Bunch, Nielsen, and Sorensen [5] for the solution of the secular equation. The algorithm was popularized as a practical parallel method by Dongarra and Sorensen [14], who implemented it on a shared memory machine. They concluded that divide and conquer algorithms, when properly implemented, can be many times faster than traditional methods such as bisection followed by inverse iteration or the QR algorithm, even on serial computers. Later parallel implementations had mixed success.
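To make the ingredients named above concrete (a rank-one modification, the secular equation, and the Löwner-theorem recomputation of the updating vector), the following is a minimal one-level sketch in Python with NumPy. It rests on several assumptions that are ours, not the paper's: the matrix is split only once and the two halves are solved with `numpy.linalg.eigh` instead of recursively; no deflation occurs (the merged diagonal entries are distinct and every entry of z is nonzero); the secular equation is solved by plain bisection rather than the rational-interpolation iteration used in production codes; and everything runs serially, with no two-dimensional block cyclic distribution. The helper names `secular_roots`, `loewner_z`, `rank_one_eig`, and `dc_tridiag_eig` are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a serial, one-level sketch of a divide and conquer
# merge step, not the ScaLAPACK implementation described in the paper.


def secular_roots(d, z, rho, iters=200):
    """Roots of f(lam) = 1 + rho * sum_j z_j**2 / (d_j - lam), by bisection.

    Assumes d strictly increasing, all z_j != 0 and rho > 0 (no deflation).
    Then f increases from -inf to +inf on each (d_i, d_{i+1}), giving one
    root there, and the largest root lies in (d_n, d_n + rho * ||z||^2).
    """
    def f(lam):
        return 1.0 + rho * np.sum(z**2 / (d - lam))

    uppers = np.append(d[1:], d[-1] + rho * (z @ z))
    lam = np.empty_like(d)
    for i, (lo, hi) in enumerate(zip(d, uppers)):
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:  # bracket is as tight as floats allow
                break
            if f(mid) < 0.0:  # root lies to the right of mid
                lo = mid
            else:
                hi = mid
        lam[i] = 0.5 * (lo + hi)
    return lam


def loewner_z(d, lam, rho):
    """|zhat_i| from the Löwner formula, using the computed eigenvalues:
    zhat_i**2 = prod_j (lam_j - d_i) / (rho * prod_{j != i} (d_j - d_i)).
    """
    zhat = np.empty_like(d)
    for i in range(d.size):
        num = np.prod(lam - d[i])
        den = rho * np.prod(np.delete(d, i) - d[i])
        zhat[i] = np.sqrt(num / den)  # ratio is positive by interlacing
    return zhat


def rank_one_eig(d, z, rho):
    """Eigenpairs of diag(d) + rho * z z^T, assuming no deflation."""
    lam = secular_roots(d, z, rho)
    zhat = np.copysign(loewner_z(d, lam, rho), z)  # keep the signs of z
    # Column i is (D - lam_i I)^{-1} zhat, normalized to unit length.
    U = zhat[:, None] / (d[:, None] - lam[None, :])
    U /= np.linalg.norm(U, axis=0)
    return lam, U


def dc_tridiag_eig(T):
    """One split: T = blkdiag(T1', T2') + rho v v^T, v = e_{m-1} + s e_m."""
    n = T.shape[0]
    m = n // 2
    beta = T[m - 1, m]
    rho, s = abs(beta), np.sign(beta)  # assumes beta != 0
    T1 = T[:m, :m].copy()
    T2 = T[m:, m:].copy()
    T1[-1, -1] -= rho
    T2[0, 0] -= rho
    d1, Q1 = np.linalg.eigh(T1)
    d2, Q2 = np.linalg.eigh(T2)
    Q = np.zeros_like(T)
    Q[:m, :m], Q[m:, m:] = Q1, Q2
    d = np.concatenate([d1, d2])
    z = np.concatenate([Q1[-1, :], s * Q2[0, :]])  # z = Q^T v
    perm = np.argsort(d)  # sort the poles of the secular equation
    d, z, Q = d[perm], z[perm], Q[:, perm]
    lam, U = rank_one_eig(d, z, rho)
    return lam, Q @ U  # back transformation


rng = np.random.default_rng(0)
n = 8
a = rng.standard_normal(n)
b = rng.uniform(0.5, 1.5, n - 1) * rng.choice([-1.0, 1.0], n - 1)
T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
lam, V = dc_tridiag_eig(T)
print("max eigenvalue error:", np.max(np.abs(lam - np.linalg.eigvalsh(T))))
print("orthogonality error:", np.max(np.abs(V.T @ V - np.eye(n))))
```

Building the eigenvectors from the recomputed vector zhat rather than from z is what keeps them numerically orthogonal: they are the exact eigenvectors of a nearby matrix diag(d) + rho zhat zhat^T. Production codes additionally store each eigenvalue as an offset from its nearest pole so that the differences d_j - lam_i are computed without cancellation; this sketch omits that refinement.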
Using an Intel iPSC-1 hypercube, Ipsen and Jessup [22] found that their bisection implementation was more efficient than their divide and conquer implementation because of the excessive amount of data transferred between processors and the unbalanced workload after deflation. More recently, Gates and Arbenz [15] showed that good speedup can be achieved with distributed memory parallel implementations. However, they did not use the techniques described in [18], which guarantee the orthogonality of the eigenvectors and make good use of deflation to speed up the computation.

In this paper, we describe an efficient, scalable, and portable parallel implementation for distributed memory machines of a divide and conquer algorithm for the