This paper presents an efficient parallel architecture for implementing a constant modulus algorithm (CMA) adaptive array antenna. By inserting delay units into the original CMA, we derive a novel delayed CMA (DCMA) that significantly shortens the critical path. This makes possible a pipelined architecture that supports parallel processing for the DCMA. In addition to pipelining, a power-of-two multiplier is proposed for the DCMA, leading to an efficient FPGA implementation. The effects of the inserted delays and of finite word length on the convergence of CMA are investigated through simulations. Synthesis results show that an FPGA implementation of the proposed architecture using power-of-two arithmetic requires 26.9% fewer resources than one using fixed-point arithmetic and operates at a clock frequency above 65 MHz. The FPGA implementation was tested to confirm that the designed architecture performs CMA correctly.
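The abstract does not give the update equations, so the following is only a minimal sketch of the two ideas it names. It assumes the standard CMA(2,2) cost with output y = w^H x, models the inserted delay in the style of delayed LMS (the correction computed at time n is applied `delay` samples later), and applies power-of-two rounding to the error term so that the multiplication by it could reduce to a shift in hardware. The function names, the default parameters, and the choice of which term is quantized are illustrative assumptions, not the paper's design.

```python
import cmath
import math
from collections import deque


def quantize_pow2(v):
    """Round a real value to the nearest signed power of two (0 stays 0).

    Rounding is done on the log2 scale. Hypothetical helper: the paper's
    actual quantizer is not described in the abstract.
    """
    if v == 0.0:
        return 0.0
    exponent = round(math.log2(abs(v)))
    return math.copysign(2.0 ** exponent, v)


def delayed_cma(snapshots, num_elements, mu=2.0 ** -4, delay=2,
                modulus=1.0, pow2=False):
    """Run a delayed CMA(2,2) update over a sequence of array snapshots.

    snapshots: iterable of length-num_elements lists of complex samples.
    delay=0 gives the ordinary (non-delayed) CMA update.
    Returns the final weights and the list of array outputs.
    """
    w = [0j] * num_elements
    w[0] = 1 + 0j                      # start by listening to element 0 only
    pending = deque()                  # corrections waiting to be applied
    outputs = []
    for x in snapshots:
        # Array output y = w^H x
        y = sum(wi.conjugate() * xi for wi, xi in zip(w, x))
        outputs.append(y)
        # Gradient factor of (|y|^2 - modulus)^2 with respect to conj(w)
        err = (abs(y) ** 2 - modulus) * y.conjugate()
        if pow2:
            # Assumption: quantize real and imaginary parts of the error
            err = complex(quantize_pow2(err.real), quantize_pow2(err.imag))
        pending.append((err, x))
        if len(pending) > delay:
            # Apply the correction that was computed `delay` samples ago
            old_err, old_x = pending.popleft()
            w = [wi - mu * old_err * xi for wi, xi in zip(w, old_x)]
    return w, outputs


if __name__ == "__main__":
    # Toy scenario (assumed): one unit-circle source at amplitude 0.5
    # arriving on a two-element array with a 0.5 rad inter-element phase.
    steering = [1 + 0j, cmath.exp(0.5j)]
    data = [[0.5 * cmath.exp(0.3j * n) * a for a in steering]
            for n in range(2000)]
    _, out = delayed_cma(data, num_elements=2, delay=2)
    print(abs(out[-1]))
```

In this toy setting the delayed update still drives the output modulus toward the target, only with the correction arriving a few samples late; with `pow2=True` the quantized error never becomes arbitrarily small, so some residual fluctuation around the target should be expected.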