Quadratic permutation polynomial (QPP) interleaver is more suitable for parallel turbo decoding due to it is contention-free. However, the parallel address generation of QPP is area-consuming when the parallel degree P is large, and the data shuffle between memory banks and processing elements (PE) introduces large interconnect cost. This paper first evaluates the area and power cost of three typical Parallel Address Generators (PAG) and four typical Data Shuffle Networks (DSN) from academic and industrial area, and then proposes a novel general QPP interleaver with a highly area-efficient PAG and an associated DSN. Our QPP interleaver can support general parallel turbo decoder design. Experimental results show that, for P =64, the area and power cost of the PAG are on average 9.2% and 9.8% of that of the evaluated respectively. Meanwhile, the DSN can also achieve a slight hardware cost reduction, compared with the evaluated works.