The impressive development of high-performance digital CMOS technologies in recent years has made it possible to implement very complex algorithms in low-cost chips. Digital signal processors, programmable devices such as FPGAs, and VLSI technologies have come to the point where the computing power and memory required to execute several real-time applications can be incorporated even in cheap portable devices. Among the many application fields that have benefited strongly from this technological progress, channel decoding is one of the most significant and interesting: as predicted by Shannon, achieving performance close to the theoretical limit implies the adoption of high-complexity algorithms, which makes the search for good decoding architectures a challenging and fascinating problem for many digital architects and VLSI designers. Several aspects of turbo codes make them especially interesting from an implementation point of view. First of all, the algorithms used to implement MAP decoding (such as the so-called BCJR algorithm) are of great complexity; this complexity, coupled with the iterative nature of the whole decoding process, makes it very difficult to meet the throughput, latency, cost and energy dissipation constraints imposed by several modern multimedia and interactive applications. Moreover, turbo decoders include large RAMs that need to be organized and managed properly. Finally, the decoding algorithm can be parallelized in several ways, acting at different levels, and the best solution for each application can only be selected by carefully exploring the space of design alternatives.

The chapter reviews the main achievements of the last 10 years in the hardware implementation of turbo codes, with particular attention to the critical aspects that designers have to deal with. The main problems addressed are the following. •