Q-Routing in Cognitive Packet Network Routing Protocol for MANETs

Alharbi, Amal; Al-Dhalaan, Abdullah; Al-Rodhaan, Miznah

doi:10.5220/0005082902340243

Cited by 4 publications

(4 citation statements)

References 11 publications

(11 reference statements)

“…We can then re-estimate the values of $Q^{\pi}(s_t, a_t)$ through either (13) or (14), and continue to improve the policy using (15). When the process converges, we reach an optimal policy $\pi^*$ (and its corresponding value functions $Q^{\pi^*}(s_t, a_t)$).…”

Section: Conventional Q-learning (mentioning)

confidence: 99%

“…When estimating Q (which is commonly referred to as policy evaluation), there are two common approaches, each with its own benefits and shortcomings: 1) Monte-Carlo Estimation, corresponding to (13), which provides an unbiased estimation, but has high variance; 2) Temporal Difference Estimation, corresponding to (14), which has low variance, but is biased due to its self-referential nature.…”

Section: Conventional Q-learning (mentioning)

confidence: 99%
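
The two estimators contrasted in the statement above can be illustrated with a minimal sketch. Equations (13) and (14) of the citing paper are not reproduced on this page, so the code assumes the standard textbook forms: a full discounted return for Monte-Carlo estimation and a one-step bootstrapped target for temporal-difference estimation. The function names, reward values, and discount factor are hypothetical.

```python
from typing import Sequence


def monte_carlo_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of every reward observed from step t to the end of the episode."""
    g = 0.0
    # Accumulate backwards so each step applies one factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td_target(reward: float, next_q: float, gamma: float) -> float:
    """One-step target: immediate reward plus the discounted current estimate of the next Q."""
    return reward + gamma * next_q


# Made-up illustration values, not data from any cited paper.
rewards = [1.0, 0.0, 2.0]
gamma = 0.5

mc = monte_carlo_return(rewards, gamma)              # uses the whole trajectory
td = td_target(rewards[0], next_q=0.8, gamma=gamma)  # uses one reward and an estimate
print(mc, td)
```

The Monte-Carlo value depends on every reward in the sampled trajectory, which is where its variance comes from; the temporal-difference value depends on `next_q`, an estimate that may itself be wrong, which is the source of its bias.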

“…To test the generalizability of the proposed method, we directly take the agent trained under the original setting and reuse it for a much larger ad-hoc network of F = 10 data flows in a 5000 × 5000 m² region, with B = 32 available frequency bands. We place larger number of (19,16,21,18,14,24,17,20,19) nodes over nine evenly divided sub-regions. The sum-rate results are shown in Fig.…”

Section: Generalization Performance (mentioning)

confidence: 99%

Scalable Deep Reinforcement Learning for Routing and Spectrum Access in Physical Layer

Cui

2021

IEEE Trans. Commun.

“…Thus, routing can be readily modeled as a Markov decision process [7] and naturally fits into the realm of reinforcement learning. In this direction, many previous works [8,9,10,11,12,13,14,15,16] have employed the classical Q-learning [17] algorithm to train agents to find the optimal route. In these works, a distinct agent is associated with each transmission node.…”

Section: Introduction (mentioning)

confidence: 99%
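
The per-node Q-learning setup described in the statement above can be sketched as follows. This is a minimal illustration, not the method of the cited paper or of the citing works: it assumes a hypothetical four-node topology with fixed, made-up link costs, and it updates every link in systematic sweeps rather than learning from live packets. Each node keeps its own table of estimated costs to the destination through each neighbor.

```python
# Hypothetical topology: links[x][y] is the made-up cost (e.g. delay) of link x -> y.
links = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
DEST = "D"
ALPHA = 0.5  # learning rate (assumed value)

# One table per node: q[x][y] = estimated total cost from x to DEST via neighbor y.
q = {node: {nbr: 0.0 for nbr in nbrs} for node, nbrs in links.items()}


def best_remaining(node: str) -> float:
    """Lowest estimated cost from node to DEST; zero once the destination is reached."""
    if node == DEST:
        return 0.0
    return min(q[node].values())


# Repeatedly move each estimate toward link cost + the neighbor's own best estimate.
for _ in range(200):
    for node, nbrs in links.items():
        for nbr, cost in nbrs.items():
            target = cost + best_remaining(nbr)
            q[node][nbr] += ALPHA * (target - q[node][nbr])


def route(src: str) -> list:
    """Follow the lowest-cost neighbor at every hop (this toy topology has no cycles)."""
    path = [src]
    while path[-1] != DEST:
        node = path[-1]
        path.append(min(q[node], key=q[node].get))
    return path


print(route("A"))
```

Because every node stores and updates its own table, the number of learned tables grows with the network, which is the scalability concern the citing papers raise about this family of methods.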

Scalable Reinforcement Learning For Routing In Ad-Hoc Networks Based On Physical-Layer Attributes

Cui

2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This work proposes a novel and scalable reinforcement learning approach for routing in ad-hoc wireless networks. In most previous reinforcement learning based routing methods, the links in the network are assumed to be fixed, and a different agent is trained for each transmission node; this limits scalability and generalizability. In this paper, we account for the inherent signal-to-interference-plus-noise ratio (SINR) in the physical layer and propose a more scalable approach in which a single agent is associated with each flow and is trained using a novel reward definition and according to the physical-layer characteristics of the environment. This allows a highly effective routing strategy based on the geographic locations of the nodes in the ad-hoc network. The proposed deep reinforcement learning strategy accounts for the mutual interference between the links and produces highly effective routing solutions over the entire network in a scalable manner.