SUMMARY

In this paper we present two versions of a parallel algorithm for solving the block-Toeplitz least-squares problem on distributed-memory architectures. The algorithm is based on the seminormal equations arising from the triangular decomposition of the product $T^{T}T$. It exploits the displacement structure of Toeplitz-like matrices by means of the Generalized Schur Algorithm, obtaining the solution in $O(mn)$ flops instead of the $O(mn^{2})$ flops required by algorithms for unstructured matrices. The strong regularity of this matrix product, together with an appropriate computation of the hyperbolic rotations, improves the stability of the algorithms. We have reduced the communication cost with respect to previous versions, and have also reduced the memory access cost by appropriately arranging the elements of the matrices. Furthermore, the second version of the algorithm has a very low memory cost, because it does not store the triangular factor of the decomposition. The experimental results show good scalability of the parallel algorithm on two different clusters of personal computers.
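
As an illustration of the seminormal equations mentioned above, the following minimal sketch solves a small Toeplitz least-squares problem by factoring T^T T = R^T R and then solving R^T R x = T^T b with two triangular systems. It is not the algorithm of this paper: it is sequential, it uses a scalar (not block) Toeplitz matrix, and it obtains R with a dense Cholesky factorization at unstructured cost rather than with the Generalized Schur Algorithm. The helper names `toeplitz` and `seminormal_solve` and the problem sizes are assumptions made for the example.

```python
import numpy as np

def toeplitz(col, row):
    """Dense m-by-n Toeplitz matrix from its first column and first row.

    The diagonal entry is taken from col[0]; row[0] is ignored.
    """
    m, n = len(col), len(row)
    # vals[k] holds the entry on the diagonal i - j = k - (n - 1).
    vals = np.concatenate((row[:0:-1], col))
    i, j = np.indices((m, n))
    return vals[i - j + n - 1]

def seminormal_solve(T, b):
    """Solve min ||T x - b||_2 through the seminormal equations."""
    # Upper triangular R with T^T T = R^T R (dense Cholesky, for illustration).
    R = np.linalg.cholesky(T.T @ T).T
    # np.linalg.solve is a general solver; a triangular solver would be
    # the natural choice in a real implementation.
    y = np.linalg.solve(R.T, T.T @ b)   # R^T y = T^T b
    return np.linalg.solve(R, y)        # R x = y

# Hypothetical test problem: a random 12-by-4 Toeplitz matrix.
rng = np.random.default_rng(0)
m, n = 12, 4
T = toeplitz(rng.standard_normal(m), rng.standard_normal(n))
b = rng.standard_normal(m)

x = seminormal_solve(T, b)
x_ref = np.linalg.lstsq(T, b, rcond=None)[0]  # reference least-squares solution
print(np.allclose(x, x_ref))
```

Only R and the product T^T b are needed to recover x, which is why an orthogonal factor never has to be formed or stored.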