The Switch-Memory-Switch (SMS) architecture performs very well because it emulates an output-queued switch. However, to achieve a maximal matching, its first-stage scheduling incurs a very high computational complexity, which has kept SMS from practical implementation. To make SMS viable for industrial applications, especially very large switches and routers in multi-service environments, this paper proposes two parallel iterative scheduling algorithms, IS-RRM and AIS-RRM. The algorithms dispense entirely with the traditional departure-time-compatible (DTC) graph and, by using an iteration-sharing technique, greatly reduce the number of iterations required in each time slot. Modeling the AIS-RRM algorithm as a discrete-time Markov chain, we derive an upper bound on its cell loss rate. Experimental and theoretical results show that, as long as the number of shared memories is twice the switch size, AIS-RRM achieves a cell loss rate of 10^-8 with an input buffer size of 15 and 6 iterations per time slot, regardless of the arrival traffic pattern and the switch size. The number of iterations required per time slot can be reduced further by increasing the input buffer size.