Throughput-Aware Cooperative Reinforcement Learning for Adaptive Resource Allocation in Device-to-Device Communication

Khan, Muhidul Islam; Alam, Muhammad Mahtab; Le Moullec, Yannick; Yaacoub, Elias

doi:10.3390/fi9040072

Cited by 26 publications

(23 citation statements)

References 29 publications


“…The convergence property of the actor‐critic method is much better than critic‐only method. Critic‐only methods like Q‐learning and SARSA utilize a state‐action value function. Also, these methods do not have an explicit function for the estimation of the policy.…”

Section: Introduction (mentioning)

confidence: 99%
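To make the quoted distinction concrete, here is a minimal sketch of the two critic-only updates the statement names. It assumes a small tabular problem with integer states and actions; the function names, the dictionary-based Q-table, and the step-size and discount defaults are illustrative choices, not taken from the cited papers. Note that neither method stores a policy: actions are derived from the Q-values on demand.

```python
import random
from collections import defaultdict


def greedy_action(q, state, n_actions):
    """Implicit policy: pick the action with the largest Q-value."""
    return max(range(n_actions), key=lambda a: q[(state, a)])


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily on Q."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return greedy_action(q, state, n_actions)


def q_learning_update(q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action available in s_next."""
    best_next = max(q[(s_next, b)] for b in range(n_actions))
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action actually chosen in s_next."""
    q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])


if __name__ == "__main__":
    rng = random.Random(0)
    q = defaultdict(float)  # hypothetical Q-table, all values start at 0
    a = epsilon_greedy(q, 0, 2, epsilon=0.1, rng=rng)
    q_learning_update(q, 0, a, r=1.0, s_next=1, n_actions=2)
    print(dict(q))
```

The only difference between the two updates is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the next action that the behaviour policy actually selected.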

An efficient actor‐critic reinforcement learning for device‐to‐device communication underlaying sectored cellular network

Khuntia

Hazra

Chong

2020

Int J Communication


In this paper, a novel reinforcement learning (RL) approach with cell sectoring is proposed to solve the channel and power allocation issue for a device-to-device (D2D)-enabled cellular network when the prior traffic information is not known to the base station (BS). Further, this paper explores an optimal policy for resource and power allocation between users intending to maximize the sum-rate of the overall system. Since the behavior of wireless channel and traffic request of users in the system is stochastic in nature, the dynamic property of the environment allows us to employ an actor-critic RL technique to learn the best policy through continuous interaction with the surrounding. The proposed work comprises four phases: cell splitting, clustering, queuing model, and simultaneous channel and power allocation using an actor-critic RL. The implementation of cell splitting with novel clustering technique increases the network coverage, reduces co-channel cell interference, and minimizes the transmission power of nodes, whereas the queuing model solves the issue of waiting time for users in a priority-based data transmission. With the help of continuous state-action space, the actor-critic RL algorithm based on policy gradient improves the overall system sum-rate as well as the D2D throughput. The actor adopts a parameter-based stochastic policy for giving continuous action while the critic estimates the policy and criticizes the actor for the action. This reduces the high variance of the policy gradient. Through numerical simulations, the benefit of our resource sharing scheme over other existing traditional schemes is verified.

KEYWORDS: actor-critic reinforcement learning, cell sectoring, device-to-device communication, k-means clustering, queuing model, resource allocation

Int J Commun Syst. 2020;33:e4315.

D2D communication allows direct communication between two users in close proximity, without the involvement of the base station (BS). In an underlaying cellular network, D2D users (D2Ds) reuse radio resources allocated to cellular users (CUs). The reuse of resources of a cellular user (CU) by a D2D user causes interference with each other [1,2]. Therefore, the selection of a suitable resource and power allocation scheme plays a vital role in reducing interference. So, in order to reduce interference, a suitable amount of transmission power must be chosen for each CU and D2D. Thus, as a central entity, BS determines the transmission power of each user and interference level using various scheduling algorithms [3]. But the traditional method of resource allocation does not provide a preferable optimal outcome if the complete traffic information is not known to the BS a priori. There are various conventional D2D schemes, which aim at maximizing the network throughput. Some of the schemes are graph-based method, fractional frequency reuse [4], Lagrange multiplier [5], and optimization algorithms, eg, particle swarm optimization and genetic algorithm (GA). ...
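The actor-critic interaction described in the abstract above can be sketched as a single update step. This is an illustration under stated assumptions, not the citing paper's algorithm: it assumes linear feature vectors, a Gaussian policy with a fixed standard deviation `sigma`, a linear state-value critic, and hypothetical learning rates. The critic's temporal-difference (TD) error stands in for the raw return when scaling the policy gradient, which is the usual way a critic lowers its variance.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def sample_action(theta, phi_s, sigma, rng):
    """Actor: draw a continuous action from N(theta . phi(s), sigma^2)."""
    return rng.gauss(dot(theta, phi_s), sigma)


def actor_critic_step(theta, w, phi_s, phi_next, action, reward,
                      sigma=0.5, gamma=0.9,
                      alpha_actor=0.01, alpha_critic=0.1):
    """One update of actor weights theta and critic weights w."""
    mu = dot(theta, phi_s)
    # Critic: TD error of the linear value estimate V(s) = w . phi(s).
    td_error = reward + gamma * dot(w, phi_next) - dot(w, phi_s)
    new_w = [wi + alpha_critic * td_error * f for wi, f in zip(w, phi_s)]
    # Actor: for a Gaussian policy with fixed sigma, the gradient of
    # log pi(a|s) with respect to theta is (a - mu) / sigma^2 * phi(s).
    score = (action - mu) / sigma ** 2
    new_theta = [ti + alpha_actor * td_error * score * f
                 for ti, f in zip(theta, phi_s)]
    return new_theta, new_w, td_error


if __name__ == "__main__":
    rng = random.Random(0)
    theta, w = [0.0, 0.0], [0.0, 0.0]
    phi_s, phi_next = [1.0, 0.0], [0.0, 1.0]  # hypothetical features
    a = sample_action(theta, phi_s, 0.5, rng)
    theta, w, delta = actor_critic_step(theta, w, phi_s, phi_next, a, 1.0)
    print(theta, w, delta)
```

In a resource-allocation setting the continuous action could represent, for example, a transmit power level, but that mapping is an assumption made here for illustration.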



“…The result is increased performance in terms of data rate, robustness, delay, security, and energy consumption. The cooperative network coding then extended to the device to device (D2D) communication, which further increases the overall capacity and throughput of the network [2][3][4]. The utilization of D2D and cellular network in network coding decreases the packet recovery time, and meets the performance of cellular via network coding [5].…”

Section: Introduction (mentioning)

confidence: 99%

Cooperative Admission Control with Network Coding in 5G Underlying D2D-Satellite Communication

Awan

Zhao

et al. 2020

Electronics


Cooperative communication supported by device to device (D2D)-LEO earthed satellite increases the performance of the resilient network and offloads base station. Additionally, network coding in a packet-based cooperative framework provides diversity and speedy recovery of lost packets. Cooperative communication advantages are subject to effective joint admission control strengthened by network coding for multiple interfaces. Joint admission control with network coding involves multiple constraints in terms of user selection, mode assignment, power allocation, and interface-based network codewords, which is challenging to solve collectively. Sub-problematization and its heuristic solution lead to a less complex solution. First, the adaptive terrestrial satellite power sentient network (ATSPSN) algorithm is proposed based on low complex convex linearization of mix integer non-linear problem (MINLP), NP-hard. ATSPSN provides optimum power allocation, mode assignment, and user selection based on joint channel conditions. Second, a multiple access network coding algorithm (MANC) is developed underlying the D2D-satellite network, which provides novel multiple interface random linear network codewords. At the end, the bi-directional matching algorithm aiming for joint admission control with network coding, named JAMANC-stream and JAMANC-batch communication, is proposed. JAMANC algorithm leads to a less complex solution and provides improved results in terms of capacity, power efficiency, and packet completion time. The theoretical lower and upper bounds are also derived for comparative study.


“…The D2D communication is a gifted component for 5G because of its two innate advantages, ie, traffic off-loading and radio resource reusing [7] capabilities [12]. The other domains of reinforcement learning for D2D communication includes deep learning for data transmission [13], adaptive power allocation [14], and access control and management [8]. With 5G and D2D, the focus is on indoor communications, particularly its coverage and connectivity issues.…”

Section: Introduction (mentioning)

confidence: 99%

Reinforcement learning algorithm for 5G indoor device‐to‐device communications

Sreedevi

Rao

2019

Trans Emerging Tel Tech


Fifth generation (5G), the next generation telecommunications will be striking the markets in near future. Device‐to‐device (D2D) communication would be a part of 5G to serve communication needs for billions of connected devices to support high data rate ultrareliable low latency communications. Indoor 5G will be relying on distributed small cell solutions and D2D along with machine‐to‐machine connections. Machine learning is one of the most promising tools for providing the best set of solutions to learn the influential scenarios and certain parameters of the communication networks. This research proposes reinforcement‐learning‐based latency controlled D2D connectivity (RL‐LCDC) algorithm and its Q‐learning approach in an indoor D2D communication network for strong 5G connectivity with minimum latency. The proposed approach, RL‐LCDC efficiently discovers the neighbors, decides the D2D link, and adaptively controls the communication range for maximum network connectivity. The results show that RL‐LCDC optimizes the connectivity with lower end‐to‐end delay and better energy efficiency with efficient convergence time when compared with other conventional schemes.

