The MAC layers of today's wireless systems have evolved into complex state machines characterized by strict timeliness requirements and high computational demands. MAC protocols are generally viewed as sequential extended state machines, and MAC parallelization has therefore not yet been widely considered. In this paper, we show that MAC execution efficiency can be substantially increased by exploiting parallelism and by providing the necessary hardware-software support. In particular, we enable a dual-processor, interrupt-driven hardware architecture and support a customized real-time operating system kernel on the widely used WARP SDR platform. Moreover, we integrate this architecture with our framework for composing MAC protocols from their elementary functionalities. We describe the architectural details of the system and discuss strategies for efficiently scheduling the different MAC processes. We evaluate our system in realistic application test cases. Our empirical results show that, by exploiting parallelism, our system achieves significant improvements in MAC execution speed compared to the contemporary sequential implementation approach.