Routers are ubiquitous in modern computing, appearing in wide-area networks, multiprocessor servers, and data storage systems. Modern routers achieve high performance by solving computationally intensive tasks using custom hardware. One of the most challenging problems in designing a high-end router is scheduling the transfer of packets from inputs to outputs.

We present a simple and near-optimal randomized parallel algorithm for scheduling packets in routers based on the Switch-Memory-Switch (SMS) architecture, which emulates 'output queuing' by using a collection of small memories within the switch to buffer packets, and which forms the basis of the fastest routers in use today. Specifically, for a router with [...] inputs and [...] outputs, our algorithm computes the schedule in [...] rounds, where a round is a communication of a few bits between input ports and memory together with simple local computation at the inputs and memory. Furthermore, by using an [...]-deep pipeline at each input, our algorithm computes the schedule in a constant number of rounds. Our pipelined algorithm is quite simple and achieves optimal (i.e., constant) throughput with a tiny [...] delay. We show that the total amount of buffer memory required by our algorithm is close to the minimum required. We also show that the number of buffer memories is within an [...] of [...], the minimum number of memories needed under adversarial placement of packets. Furthermore, we show that the number of extra memories that we use over the minimum of [...] required in the offline version is within a constant factor of the minimum required by any on-line scheduler, even if that scheduler is allowed to fail occasionally.

Our scheduling algorithm is randomized and works with high probability in [...]. We also prove that it has the 'self-stabilizing' property, i.e., it resumes its normal behavior if occasional lapses occur due to the probabilistic nature of the algorithm.
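To make the notion of a "round" concrete, the following is a minimal illustrative sketch, not the algorithm analyzed in this paper. It assumes a simplified model in which each input holds one packet, each unassigned input proposes to one memory chosen uniformly at random, and each memory accepts at most one packet in total. The function name `schedule_round_based` and its parameters are hypothetical.

```python
import random


def schedule_round_based(num_inputs, num_memories, max_rounds=64, seed=0):
    """Toy round-based randomized assignment of inputs to memories.

    Assumed model (not taken from the paper): in each round, every
    still-unassigned input proposes to one memory chosen uniformly at
    random. A memory that is already taken rejects all proposals; a free
    memory accepts exactly one proposer, chosen at random.
    Returns (assignment, rounds_used), where assignment maps input -> memory.
    """
    rng = random.Random(seed)
    assignment = {}            # input index -> memory index
    taken = set()              # memories that already hold a packet
    pending = list(range(num_inputs))
    rounds = 0

    while pending and len(taken) < num_memories and rounds < max_rounds:
        rounds += 1
        # Communication step: each pending input sends one proposal.
        proposals = {}
        for i in pending:
            m = rng.randrange(num_memories)
            if m not in taken:
                proposals.setdefault(m, []).append(i)
        # Local computation at each memory: grant one proposer.
        for m, proposers in proposals.items():
            winner = rng.choice(proposers)
            assignment[winner] = m
            taken.add(m)
        pending = [i for i in pending if i not in assignment]

    return assignment, rounds


if __name__ == "__main__":
    assignment, rounds = schedule_round_based(num_inputs=8, num_memories=16)
    print(f"assigned {len(assignment)} inputs in {rounds} rounds")
```

In this toy model, providing more memories than inputs makes collisions rarer and so tends to reduce the number of rounds, which loosely mirrors the trade-off between extra memories and scheduling time discussed above.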