Resource allocation and transceivers in wireless networks are usually designed by solving optimization problems subject to specific constraints, which can be formulated as variable or functional optimization. If the objective and constraint functions of a variable optimization problem can be derived, standard numerical algorithms can be applied for finding the optimal solution, which however incur high computational cost when the dimension of the variable is high. To reduce the online computational complexity, learning the optimal solution as a function of the environment's status by deep neural networks (DNNs) is an effective approach. DNNs can be trained under the supervision of optimal solutions, which however, is not applicable to the scenarios without models or for functional optimization where the optimal solutions are hard to obtain. If the objective and constraint functions are unavailable, reinforcement learning can be applied to find the solution of a functional optimization problem, which is however not tailored to optimization problems in wireless networks. In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems without the supervision of the optimal solutions. When the mathematical model of the environment is completely known and the distribution of environment's status is known or unknown, we can invoke unsupervised learning algorithm. When the mathematical model of the environment is incomplete, we introduce reinforcedunsupervised learning algorithms that learn the model by interacting with the environment. Our simulation results confirm the applicability of these learning frameworks by taking a user association problem as an example.
Data packet routing in aeronautical ad-hoc networks (AANETs) is challenging due to their high-dynamic topology. In this paper, we invoke deep reinforcement learning for routing in AANETs aiming at minimizing the end-to-end (E2E) delay. Specifically, a deep Q-network (DQN) is conceived for capturing the relationship between the optimal routing decision and the local geographic information observed by the forwarding node. The DQN is trained in an offline manner based on historical flight data and then stored by each airplane for assisting their routing decisions during flight. To boost the learning efficiency and the online adaptability of the proposed DQN-routing, we further exploit the knowledge concerning the system's dynamics by using a deep value network (DVN) conceived with a feedback mechanism. Our simulation results show that both DQN-routing and DVN-routing achieve lower E2E delay than the benchmark protocol, and DVN-routing performs similarly to the optimal routing that relies on perfect global information.
Predictive power allocation is conceived for energyefficient video streaming over mobile networks using deep reinforcement learning. The goal is to minimize the accumulated energy consumption of each base station over a complete video streaming session under the constraint that avoids video playback interruptions. To handle the continuous state and action spaces, we resort to deep deterministic policy gradient (DDPG) algorithm for solving the formulated problem. In contrast to previous predictive power allocation policies that first predict future information with historical data and then optimize the power allocation based on the predicted information, the proposed policy operates in an on-line and end-to-end manner. By judiciously designing the action and state that only depend on slowly-varying average channel gains, we reduce the signaling overhead between the edge server and the base stations, and make it easier to learn a good policy. To further avoid playback interruption throughout the learning process and improve the convergence speed, we exploit the partially known model of the system dynamics by integrating the concepts of safety layer, post-decision state, and virtual experiences into the basic DDPG algorithm. Our simulation results show that the proposed policies converge to the optimal policy that is derived based on perfect large-scale channel prediction and outperform the first-predictthen-optimize policy in the presence of prediction errors. By harnessing the partially known model, the convergence speed can be dramatically improved.
With space and aerial platforms deployed at different altitudes, integrated ground-air-space (IGAS) networks will have multiple vertical layers, hence forming a three-dimensional (3D) structure. These 3D IGAS networks integrating both aerial and space platforms into terrestrial communications constitute a promising architecture for building fully connected global next generation networks (NGNs). This article presents a systematic treatment of 3D networks from the perspective of multi-objective optimization. Given the inherent features of these 3D links, the resultant 3D networks are more complex than conventional terrestrial networks. To design 3D networks accommodating the diverse performance requirements of NGNs, this article provides a multi-objective optimization framework for 3D networks in terms of their diverse performance metrics. We conclude by identifying a range of future research challenges in designing 3D networks and by highlighting a suite of potential solutions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.