This paper presents the first silicon-proven implementation of a lattice reduction (LR) algorithm, which achieves maximum likelihood diversity. The implementation is based on a novel hardware-optimized variant of the Lenstra-Lenstra-Lovász (LLL) algorithm, which significantly reduces complexity by replacing all the computationally intensive LLL operations (multiplication, division, and square root) with low-complexity additions and comparisons. The proposed VLSI design utilizes a pipelined architecture that produces an LR-reduced matrix set every 40 cycles, a 60% reduction compared to current state-of-the-art LR field-programmable gate array implementations. The 0.13-µm CMOS LR core presented in this paper achieves a clock rate of 352 MHz and is thus capable of sustaining a throughput of 880 Mb/s for 64-QAM multiple-input-multiple-output detection with superior performance while dissipating 59.4 mW at a 1.32 V supply.

Index Terms: Application-specific integrated circuit (ASIC) design, Lenstra-Lenstra-Lovász (LLL) algorithm, lattice reduction, multiple-input-multiple-output (MIMO) detection, Seysen's algorithm.