In large-scale multiple-input multiple-output (MIMO) systems, the computational complexity of linear data detection poses a significant challenge. This paper addresses the challenge by optimizing the circuit structure of detectors based on the Neumann series approximation method. Approximating the matrix inverse with a limited number of Neumann series terms is an effective approach, but its complexity can still be reduced further. This paper proposes a novel architecture that improves the distribution of the Gram matrix computing cells in the systolic array. The result is a low-complexity approximate matrix inversion, and simulation results on an FPGA support the theoretical findings. The approximate matrix inversion was implemented on a Xilinx Virtex-7 FPGA for a system in which 128 base-station (BS) antennas receive data from 8 single-antenna users; it uses only about 19.53% of the Slice LUTs and 27.17% of the Slice Registers.
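The abstract does not spell out the Neumann series method. The sketch below illustrates the general technique in plain Python and makes several assumptions that are not from the paper. It takes the matrix to invert as the Gram matrix A = H^H H (no MMSE noise term) and uses its diagonal D as the splitting matrix. It keeps K terms of A^{-1} ≈ Σ_{n=0}^{K-1} (I − D^{-1}A)^n D^{-1} and draws an i.i.d. Rayleigh channel with the abstract's 128 × 8 dimensions. The function names `gram`, `neumann_inverse`, and `residual` are hypothetical. The code models only the arithmetic, not the proposed systolic-array hardware.

```python
import random


def matmul(X, Y):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def gram(H):
    """Gram matrix A = H^H H for an M x U channel H (list of M rows)."""
    M, U = len(H), len(H[0])
    return [[sum(H[m][i].conjugate() * H[m][j] for m in range(M)) for j in range(U)]
            for i in range(U)]


def neumann_inverse(A, K):
    """Approximate A^{-1} with the first K Neumann terms, assuming D = diag(A):
    A^{-1} ~= sum_{n=0}^{K-1} (I - D^{-1} A)^n D^{-1}."""
    U = len(A)
    D_inv = [[1 / A[i][i] if i == j else 0 for j in range(U)] for i in range(U)]
    D_inv_A = matmul(D_inv, A)
    # E = I - D^{-1} A: zero diagonal, small off-diagonal entries when M >> U.
    E = [[(1 if i == j else 0) - D_inv_A[i][j] for j in range(U)] for i in range(U)]
    term = D_inv                          # n = 0 term: D^{-1}
    approx = [row[:] for row in D_inv]
    for _ in range(1, K):
        term = matmul(E, term)            # n-th term: E^n D^{-1}
        approx = [[approx[i][j] + term[i][j] for j in range(U)] for i in range(U)]
    return approx


def residual(X, A):
    """Frobenius norm of X A - I (equals ||E^K||_F for the K-term approximation)."""
    P = matmul(X, A)
    U = len(A)
    return sum(abs(P[i][j] - (1 if i == j else 0)) ** 2
               for i in range(U) for j in range(U)) ** 0.5


# Assumed setup: i.i.d. Rayleigh channel, 128 BS antennas, 8 single-antenna users.
random.seed(0)
M, U = 128, 8
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) / 2 ** 0.5 for _ in range(U)]
     for _ in range(M)]
A = gram(H)
for K in (1, 2, 3, 4):
    print(K, residual(neumann_inverse(A, K), A))
```

Because the K-term approximation satisfies X_K A = I − E^K, the printed residual shrinks as K grows whenever the spectral radius of E is below 1. That condition typically holds when the number of BS antennas greatly exceeds the number of users, which is why a small K can suffice.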