Cellular network configuration plays a critical role in network performance. In current practice, network configuration depends heavily on the field experience of engineers and often remains static for long periods of time. This practice is far from optimal. To address this limitation, online-learning-based approaches have great potential to automate and optimize network configuration. Learning-based approaches face the challenges of learning a highly complex function for each base station (BS) and balancing the fundamental exploration-exploitation tradeoff while minimizing the exploration cost. Fortunately, in cellular networks, BSs often have similarities even though they are not identical. To leverage such similarities, we propose a kernel-based multi-BS contextual bandit algorithm based on multi-task learning. In the algorithm, we leverage the similarity among different BSs defined by conditional kernel embedding. We present a theoretical analysis of the proposed algorithm in terms of regret and multi-task-learning efficiency. We evaluate the effectiveness of our algorithm using a simulator built from real traces.
Figure 1: Cellular network

Online-learning-based cellular BS configuration faces multiple challenges. First, the mapping between network configuration and performance is highly complex. Since different BSs have different deployment environments, they have different mappings between network configuration and performance for a given BS condition. Furthermore, for a given BS, its condition also changes over time due to network dynamics, leading to different optimal configurations at different points in time. In addition, for a given BS and a given condition, the impact of network configuration on performance is too complicated to model using white-box analysis, owing to the complexity and dynamics of the network environment, user diversity, traffic demand, mobility, etc. Second, to learn this mapping and to optimize network performance over a period of time, operators face a fundamental exploration-exploitation tradeoff: in this […] ${}^{(m)}_t \in \mathbb{R}^d$. The state may include the number of users in a BS, user mobility, traffic demand, and neighboring BS