Optimal Software Multicast in Wormhole-Routed Multistage Networks*

Hong Xu
Information Sciences Institute
University of Southern California
Marina del Rey, CA 90292-6695

Abstract

Multistage interconnection networks are a popular class of interconnection architecture for constructing scalable parallel computers (SPCs). The focus of this paper is on wormhole-routed multistage networks supporting turnaround routing. Existing machines characterized by such a system model include the IBM SP-1, TMC CM-5, and Meiko CS-2.

Efficient collective communication among processor nodes is critical to the performance of SPCs. A system-level multicast service, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is fundamental in supporting collective communication primitives including the application-level broadcast, reduction, and barrier synchronization. This paper addresses how to efficiently implement multicast services in wormhole-routed multistage networks, in the absence of hardware multicast support, by exploiting the properties of the switching technology. An optimal multicast algorithm is proposed. The results of implementations on a 64-node SP-1 show that the proposed algorithm significantly outperforms the application-level broadcast primitives provided by currently existing collective communication libraries including the public-domain MPI.
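The abstract does not spell out the algorithm itself, so the following is only a generic sketch of how a multicast can be built from unicast messages when no hardware multicast is available. It assumes a one-port model (each node sends at most one message per step) and uses recursive halving over an address-ordered list of nodes, which reaches d destinations in ceil(log2(d + 1)) steps. The function name `multicast_schedule` and the choice of ordering are hypothetical, and the sketch does not model channel contention inside the network, which is the aspect the paper's algorithm is concerned with.

```python
def multicast_schedule(source, destinations):
    """Return a list of steps; each step is a list of (sender, receiver) unicasts.

    Illustrative only: a generic recursive-halving schedule, not necessarily
    the algorithm proposed in the paper.
    """
    # Assumed ordering: the source first, then the destinations by address.
    chain = [source] + sorted(d for d in set(destinations) if d != source)

    # A segment (lo, hi) means chain[lo] already holds the message and is
    # responsible for delivering it to chain[lo + 1 .. hi].
    segments = [(0, len(chain) - 1)] if len(chain) > 1 else []
    steps = []
    while segments:
        sends, next_segments = [], []
        for lo, hi in segments:
            mid = (lo + hi + 1) // 2  # first node of the upper half
            sends.append((chain[lo], chain[mid]))
            # Both halves now have a holder; keep only halves with work left.
            for seg in ((lo, mid - 1), (mid, hi)):
                if seg[0] < seg[1]:
                    next_segments.append(seg)
        steps.append(sends)
        segments = next_segments
    return steps


if __name__ == "__main__":
    for i, step in enumerate(multicast_schedule(0, range(1, 8)), start=1):
        print(f"step {i}: {step}")
```

With a source and seven destinations, the schedule has three steps, and the number of nodes holding the message doubles in each step. Because every holder can send only one message per step, no unicast-based schedule under this model can finish in fewer steps.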