<div>Detection for high-dimensional multiple-input multiple-output (MIMO) and Massive MIMO (MMIMO) systems is an active field of research in wireless communications. While most works consider spatially uncorrelated channels, practical MMIMO channels are correlated. This paper investigates the impact of correlation on the Sphere Decoder (SD), not only in Single-User (SU) but also in Multi-User (MU) scenarios. The complexity of SD is mainly determined by the Initial Radius (IR) method and the number of nodes visited during detection. This paper proposes both an efficient IR and a new metric constraint in the tree search algorithm that significantly decrease the number of visited nodes and render SD feasible for large-scale systems. In addition, a hardware implementation featuring a one-node-per-cycle architecture minimizes the latency of the detection process. Trade-offs between bit error rate (BER) performance and computational complexity are presented, obtained either by modifying the backtracking mechanism or by limiting the number of radius updates. Simulation results show that the proposed optimizations are effective for both correlated and uncorrelated channels, regardless of the noise level. The decoding gain of SD over low-complexity Linear Detectors (LD) is higher in the presence of correlation than in the uncorrelated case. However, as expected, spatial correlation adversely affects both the performance and the complexity of SD. The simulation results also confirm that correlation at the side equipped with more antennas is less detrimental. Hardware aspects are examined for both a Virtex-7 FPGA device and a 28-nm ASIC technology.</div>