Load balancers (LBs) play a critical role in managing the performance and resource utilization of distributed systems. However, developing efficient LBs for large, distributed clusters is challenging for several reasons: (i) large clusters require numerous scheduling decisions per second, (ii) such clusters typically consist of heterogeneous servers that differ widely in computing power, and (iii) such clusters often experience significant changes in load. In this paper, we propose HALO, a class of scalable, heterogeneity-aware LBs for cluster systems. HALO LBs are based on simple randomized algorithms that are analytically optimized for heterogeneity. We develop HALO variants of randomized, Round-Robin, and Power-of-D LBs. We illustrate the benefits of HALO and demonstrate its superiority over other comparable LBs using analytical, simulation, and (Apache-based) implementation results. Our results show that, across a wide range of scenarios, HALO LBs provide significantly lower response times than comparable LBs without incurring additional overhead.