2022
DOI: 10.1109/twc.2021.3104633
Multi-Agent Reinforcement Learning in NOMA-Aided UAV Networks for Cellular Offloading

Cited by 53 publications (13 citation statements)
References 53 publications
“…In [124], the authors proposed multi-agent reinforcement learning (MARL) algorithms in which two UAVs decide the target helper and the bandwidth allocation, decoupling the evaluation of the joint decision. Similarly, the optimization problem in [125] aims to find the best computation offloading and resource management policies that maximize the long-term utility of the proposed UAV-enabled MEC network. The problem is further formulated as a semi-Markov decision process (SMDP), and deep RL-based algorithms are proposed for both centralized and distributed UAV-enabled MEC networks, taking into account random system demands and time-varying communication channels.…”
Section: Markov Decision Process and Reinforcement Learningmentioning
confidence: 99%
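The MARL setup described above, where multiple agents independently learn resource-allocation decisions, can be sketched with a toy two-agent Q-learning loop. Everything here (the candidate bandwidth fractions, the reward model, the hyperparameters) is an illustrative assumption, not the scheme from [124]:

```python
import random

# Toy setup: two UAV agents independently learn which discrete
# bandwidth fraction to request; reward model is an assumption.
ACTIONS = [0.25, 0.5, 0.75]  # candidate bandwidth fractions per UAV

def reward(a1, a2):
    # Toy shared reward: product utility if total demand fits, else penalty
    return (a1 * a2) if a1 + a2 <= 1.0 else -1.0

def train(episodes=5000, eps=0.1, lr=0.1, seed=0):
    rng = random.Random(seed)
    # One independent Q-table per agent (stateless bandit for brevity)
    q = [[0.0] * len(ACTIONS) for _ in range(2)]
    for _ in range(episodes):
        acts = []
        for agent in range(2):
            if rng.random() < eps:  # epsilon-greedy exploration
                acts.append(rng.randrange(len(ACTIONS)))
            else:
                acts.append(max(range(len(ACTIONS)), key=lambda i: q[agent][i]))
        r = reward(ACTIONS[acts[0]], ACTIONS[acts[1]])
        for agent in range(2):
            # Each agent updates its own table from the shared reward
            q[agent][acts[agent]] += lr * (r - q[agent][acts[agent]])
    return [max(range(len(ACTIONS)), key=lambda i: q[a][i]) for a in range(2)]

best = train()
```

Because each agent treats the other as part of the environment, this decoupling keeps the per-agent action space small, at the cost of possible miscoordination between the learners.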
“…The authors of [202] jointly optimized the three-dimensional trajectory of multiple UAVs and the power allocation policy to maximize the total throughput in a UAV-BS enabled NOMA network. A Mutual Deep Q-network (MDQN) algorithm was proposed to solve the formulated problem.…”
Section: Bnbmentioning
confidence: 99%
“…Unsupervised learning algorithms, such as the principal component analysis (PCA) algorithm [214] and the K-means algorithm [215]–[217], have been used for user clustering in NOMA networks.…”

Ref. | Scenario | Technique | Objective | Remarks
… | NOMA S-IoT | DL-PA | Network utility | Long-term power allocation
[188] | MIMO-NOMA | CDNN | EE | Several convolutional layers and multiple hidden layers
[189], [190] | Multi-user NOMA | DNN | Sum rate | Imperfect successive interference cancellation
[191] | NOMA-D2D | DNN | Sum rate | Channel coefficients, power budget, and binary user-pairing variable serve as DNN input
[192] | Multi-user NOMA | DNN | Transmission delay | Dynamic power control
[193] | NOMA-SWIPT | DBN | Rate and harvested energy | Comprises three phases: preparing data samples, training, and running
[197] | Multi-user NOMA | Hotbooting Q-learning | Sum rate | Without relying on knowledge of the jamming and radio channel parameters
[198] | Grant-free NOMA | LSTM-based DQN | Throughput | Long-term cluster throughput maximization formulated as a POMDP
[199] | Hybrid NOMA | Actor-critic | Sum rate | Combines a recurrent neural network (RNN) with proximal policy optimization (PPO)
[200] | MC-NOMA | DDPG-based DRL-JRM | Weighted sum rate | A novel centralized action-value function measures the reward
[201] | NOMA-UAV | CDRL | Network capacity | Based on the PPO algorithm
[202] | NOMA-UAV | MDQN | Throughput | …
Section: ) Other Machine Learning Techniques For Nomamentioning
confidence: 99%
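The K-means user clustering mentioned above can be sketched in one dimension: users with similar channel gains are grouped so that successive interference cancellation (SIC) can exploit large gain differences across clusters. The Rayleigh channel model and all parameters below are illustrative assumptions, not taken from the surveyed works:

```python
import numpy as np

def kmeans_1d(gains, k, iters=50, seed=0):
    # Plain K-means on scalar channel gains (Lloyd's algorithm)
    rng = np.random.default_rng(seed)
    centers = rng.choice(gains, size=k, replace=False)
    for _ in range(iters):
        # Assign each user to its nearest cluster center
        labels = np.argmin(np.abs(gains[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = gains[labels == j].mean()
    return labels, centers

rng = np.random.default_rng(1)
# Toy Rayleigh-fading channel gains for 12 users (assumed model)
gains = rng.rayleigh(scale=1.0, size=12)
labels, centers = kmeans_1d(gains, k=3)
```

In a NOMA power-domain pairing step, one would then draw users from different clusters so that the decoding order imposed by SIC is well conditioned.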
“…The previous works related to our paper include those focusing on wireless powered MEC networks [1]- [5], [10]- [14], and UAV-assisted communication networks for mobile terminals [15]- [17].…”
Section: Related Workmentioning
confidence: 99%
“…Ref. [17] discretized the flight direction of the UAV and the transmit power of the terminals and devised a value-based DRL algorithm. Since this algorithm has to search the action space exhaustively in each iteration, it cannot be used for problems with high-dimensional or continuous action spaces [18].…”
Section: B Uav-assisted Communication Network For Mobile Terminalsmentioning
confidence: 99%
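The limitation described above is easy to make concrete: a value-based agent must evaluate every joint discrete action (flight direction × power level) to take an argmax at each step, so the cost grows multiplicatively with each discretized dimension. The grid sizes and the stand-in Q-function below are assumptions for illustration, not the design from [17]:

```python
from itertools import product

# Assumed discretization: 8 headings and 10 transmit-power levels
directions = [i * 45 for i in range(8)]         # headings in degrees
power_levels = [0.1 * i for i in range(1, 11)]  # power levels 0.1..1.0
actions = list(product(directions, power_levels))  # joint action set

def q_value(state, action):
    # Stand-in for a learned Q-network; peaks at heading 90 deg, power 0.5
    heading, power = action
    return -abs(heading - 90) - abs(power - 0.5)

state = None  # placeholder state
# Exhaustive argmax over all joint actions, as a value-based method requires
best = max(actions, key=lambda a: q_value(state, a))
```

With only two discretized dimensions this is 80 evaluations per step; adding a third dimension (e.g. altitude) multiplies the count again, which is why actor-critic methods with continuous action outputs are preferred for such problems.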