This paper addresses the design of parallel architectures for computing the 8x8 Discrete Cosine Transform (DCT). It concentrates on direct methods, which avoid a row-column decomposition. Two novel multiplier-free parallel architectures for high-speed 8x8 DCT calculation are proposed. The first architecture, which uses polynomial transforms, is compared with a second architecture, which computes the DCT via the Walsh-Hadamard Transform (WHT). Both architectures achieve a high degree of parallelism and regularity. The proposed architectures are designed for HDTV sampling rates and can be efficiently realized in CMOS technology.
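
To make the distinction between row-column and direct computation concrete, the following minimal sketch evaluates the 8x8 DCT both ways from its textbook definition and can be used to confirm that the results agree. It assumes the orthonormal DCT-II scaling, and the function names (`dct_matrix`, `dct2_row_column`, `dct2_direct`) are illustrative only. It is a plain floating-point reference, not the polynomial-transform or WHT-based architecture proposed here, and it is not multiplier-free.

```python
import math

N = 8  # block size assumed throughout this sketch


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C with C[k][i] = a(k) * cos((2i+1) k pi / (2n))."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def dct2_row_column(block):
    """Row-column method: 1-D DCT along every row, then along every column."""
    c = dct_matrix(len(block))
    n = len(block)
    # Row pass: tmp[r][k] is the k-th DCT coefficient of row r.
    tmp = [[sum(block[r][i] * c[k][i] for i in range(n)) for k in range(n)]
           for r in range(n)]
    # Column pass: transform each column of the intermediate result.
    return [[sum(c[u][r] * tmp[r][v] for r in range(n)) for v in range(n)]
            for u in range(n)]


def dct2_direct(block):
    """Direct 2-D evaluation: each coefficient is one double sum over the
    block using the 2-D kernel, with no intermediate row/column stage."""
    c = dct_matrix(len(block))
    n = len(block)
    return [[sum(c[u][r] * c[v][s] * block[r][s]
                 for r in range(n) for s in range(n))
             for v in range(n)]
            for u in range(n)]
```

Both functions return the same 8x8 coefficient array up to rounding error. The direct form costs far more arithmetic when evaluated naively, which is why direct methods depend on fast algorithms such as those considered in this paper.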