This paper proposes a cooperative multi-agent online reinforcement learning-based (COMORL) bias offset (BO) control scheme for cell range expansion (CRE) in dense heterogeneous networks (HetNets). The proposed COMORL scheme controls BOs for CRE to maximize the number of user equipments (UEs) that satisfy their quality-of-service (QoS) requirements, particularly in terms of delay and data rate. For this purpose, we develop a QoS satisfaction indicator that measures violations of delay requirements by considering both the QoS requirements and the signal-to-interference-plus-noise ratio (SINR). We then formulate a Markov decision process (MDP) model and solve it with a cooperative multi-agent online reinforcement learning algorithm, so that the proposed COMORL scheme maximizes a global utility across load-coupled base stations (BSs). Our simulation results validate the effectiveness of the proposed COMORL scheme in terms of throughput, delay satisfaction ratio, and fairness. Specifically, compared to the Max-SINR scheme in a dynamic scenario, the proposed COMORL scheme improves the delay satisfaction ratio, defined as the fraction of UEs in a serving BS that satisfy their delay requirement, by up to approximately 27% and 30% under medium and full traffic load, respectively.

INDEX TERMS HetNets, cell range expansion, load balancing, QoS, cooperative multi-agent reinforcement learning.
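As background for the BO control problem above, the following is a minimal sketch of how biased cell association works in CRE. This is a hypothetical illustration of the standard biased-RSRP association rule, not the paper's COMORL algorithm: each UE attaches to the BS that maximizes received power plus its bias offset, so a positive BO on a small cell artificially expands its coverage and offloads UEs from the macro cell.

```python
def associate(rsrp_dbm, bias_db):
    """Return the index of the serving BS under biased association.

    rsrp_dbm: per-BS reference signal received power at the UE (dBm).
    bias_db:  per-BS bias offset (BO) in dB; a larger BO expands that
              BS's effective cell range.
    """
    biased = [p + b for p, b in zip(rsrp_dbm, bias_db)]
    return max(range(len(biased)), key=lambda i: biased[i])


# Example: one macro BS (index 0) and one small cell (index 1);
# the macro is 5 dB stronger at this UE's location.
rsrp = [-80.0, -85.0]
print(associate(rsrp, [0.0, 0.0]))  # no bias: the UE picks the macro -> 0
print(associate(rsrp, [0.0, 6.0]))  # a 6 dB BO offloads the UE to the small cell -> 1
```

The COMORL scheme in this paper adaptively tunes the `bias_db` values per BS (rather than fixing them) to balance load while meeting delay and rate requirements.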