High detection complexity is one of the major challenges in MIMO communications based on spatial multiplexing. The recently introduced Tuple Search Detector (TSD) significantly reduces detection complexity in comparison to conventional algorithms while achieving close to full max-log-APP BER performance. The irregular control flow and sequential nature of depth-first-based detectors hinder the efficient application of parallelization techniques, typically leading to inefficient realizations. This work presents a novel TSD implementation based on a scalable and parallelizable pipelined ASIP architecture. The proposed VLSI design is implemented for 4×4 MIMO transmission using a 64-QAM constellation in 65-nm CMOS technology. In low-SNR scenarios, the proposed detector achieves a throughput of 403.6 Mbps at a clock frequency of 454 MHz. Moreover, the TSD can be adjusted according to transmission conditions, reaching more than 1 Gbps. The TSD core occupies a silicon area of 0.14 mm² (98.9 kGE) and dissipates 57.94 mW under typical-case operating conditions. The proposed detector implementation achieves close to full max-log-APP BER performance and high detection throughput with reasonable hardware complexity, by far outperforming state-of-the-art realizations.
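As a rough sanity check on the figures quoted above, the following sketch derives a few secondary metrics from them. It is a back-of-the-envelope calculation, not part of the reported results. It assumes that the throughput counts all 24 bits carried by a 4×4, 64-QAM symbol vector, and that the power figure applies at the same 403.6 Mbps operating point; neither assumption is stated in the text. The function and key names are illustrative only.

```python
import math

def derived_metrics(throughput_mbps=403.6, clock_mhz=454.0,
                    power_mw=57.94, area_mm2=0.14, kge=98.9,
                    n_tx=4, qam_order=64):
    """Derive rough secondary metrics from the headline figures.

    Assumptions (not confirmed by the source): throughput counts every bit
    of each detected symbol vector, and power is measured at this same
    throughput.
    """
    # Bits carried by one transmitted symbol vector: 4 streams x 6 bits.
    bits_per_vector = n_tx * int(math.log2(qam_order))
    # Average detected bits per clock cycle (MHz and Mbps scale factors cancel).
    bits_per_cycle = throughput_mbps / clock_mhz
    # Average number of clock cycles spent per symbol vector.
    cycles_per_vector = bits_per_vector / bits_per_cycle
    # Energy per detected bit in picojoules: mW / Mbps = nJ/bit, times 1000.
    energy_pj_per_bit = power_mw / throughput_mbps * 1e3
    # Silicon area per gate equivalent in square micrometres.
    um2_per_ge = area_mm2 * 1e6 / (kge * 1e3)
    return {
        "bits_per_vector": bits_per_vector,
        "bits_per_cycle": bits_per_cycle,
        "cycles_per_vector": cycles_per_vector,
        "energy_pj_per_bit": energy_pj_per_bit,
        "um2_per_ge": um2_per_ge,
    }

if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the low-SNR operating point corresponds to roughly 27 clock cycles per symbol vector on average and an energy on the order of 144 pJ per detected bit.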