This paper presents a high-speed, low-power design for a turbo decoder with a parallel architecture. To resolve the memory conflicts that arise when extrinsic information is accessed in such parallel architectures, a two-level mapping approach is proposed for designing a collision-free parallel interleaver. Because the warm-up process in a parallel architecture increases the decoding delay, a new parallel architecture without warm-up is proposed for high-speed applications; it increases decoding speed by 6-50% for a 16-parallel decoder. To reduce the power consumption of the parallel decoder, a simple truncation approach is proposed that lowers the storage requirement for the extrinsic information and path metrics without any extra hardware cost. The truncation approach reduces power consumption with little performance degradation.

0-7803-8834-8/05/$20.00 ©2005 IEEE.
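To make the memory conflict problem concrete, the following is a minimal sketch of how the collision-free property of a parallel interleaver can be checked. It is not the two-level mapping proposed in this paper, which the abstract does not specify. As assumptions for illustration only, the block is split into equal windows, one per sub-decoder; memory bank b holds addresses b*W to (b+1)*W - 1 for window length W; and the example permutation is a quadratic permutation polynomial (QPP) interleaver with length 40 and coefficients 3 and 10. The function names are hypothetical.

```python
def qpp_interleaver(n, f1, f2):
    """Return pi as a list with pi[i] = (f1*i + f2*i*i) mod n.

    Assumption: a QPP interleaver is used only as a convenient example;
    it is not the interleaver design proposed in the paper.
    """
    return [(f1 * i + f2 * i * i) % n for i in range(n)]


def find_collisions(pi, parallelism):
    """List bank conflicts for a windowed parallel decoder.

    Sub-decoder p processes positions p*W .. p*W + W - 1 (W = N / P).
    At step t, it accesses address pi[p*W + t], stored in bank addr // W.
    Returns tuples (step, first_decoder, second_decoder, bank).
    """
    n = len(pi)
    if n % parallelism != 0:
        raise ValueError("block length must be divisible by the parallelism")
    window = n // parallelism
    collisions = []
    for t in range(window):
        owner = {}  # bank index -> sub-decoder that already claimed it
        for p in range(parallelism):
            bank = pi[p * window + t] // window
            if bank in owner:
                collisions.append((t, owner[bank], p, bank))
            else:
                owner[bank] = p
    return collisions


def is_collision_free(pi, parallelism):
    """True if no two sub-decoders ever hit the same bank in one step."""
    return not find_collisions(pi, parallelism)


if __name__ == "__main__":
    pi = qpp_interleaver(40, 3, 10)
    print(is_collision_free(pi, 4))

    # Swapping two entries of the identity permutation creates conflicts.
    bad = list(range(40))
    bad[0], bad[11] = bad[11], bad[0]
    print(find_collisions(bad, 4))
```

Under these assumptions, an interleaver passes the check when, at every step, the P simultaneous accesses fall into P distinct banks, so each bank needs only a single port. A permutation that fails the check would require extra buffering or multi-port memory, which is the cost a collision-free design avoids.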