2020
DOI: 10.1007/978-3-030-38085-4_40

Self-learning Routing for Optical Networks

Abstract: It is generally very difficult to optimize routing policies in optical networks with dynamic traffic. Most widely used routing policies, e.g., shortest path routing and least congested path (LCP) routing, are heuristics. Although LCP is often regarded as the best-performing adaptive routing policy, it is natural to ask whether there exist routing policies that surpass these heuristics in performance. In this paper, we propose a framework of reinforcement learning (RL) based routing …
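The shortest-path and LCP policies named in the abstract are simple heuristics. As a point of reference, below is a minimal Python sketch of LCP route selection under common assumptions (not taken from the paper): each candidate path is scored by the free-wavelength count of its bottleneck link, and the least congested one is chosen. The inputs `candidate_paths` and `free_wavelengths` are hypothetical.

```python
# Minimal sketch of the least congested path (LCP) heuristic, assuming
# precomputed candidate paths and per-link free-wavelength counts.
def least_congested_path(candidate_paths, free_wavelengths):
    """candidate_paths: list of paths, each a list of link ids.
    free_wavelengths: dict mapping link id -> number of free wavelengths."""
    best_path, best_capacity = None, -1
    for path in candidate_paths:
        # A path is only as good as its most congested (bottleneck) link.
        bottleneck = min(free_wavelengths[link] for link in path)
        if bottleneck > best_capacity:
            best_path, best_capacity = path, bottleneck
    return best_path  # None if no candidates were given

# Example: two candidate paths between the same node pair.
paths = [["a-b", "b-c"], ["a-d", "d-c"]]
free = {"a-b": 3, "b-c": 1, "a-d": 2, "d-c": 2}
print(least_congested_path(paths, free))  # -> ["a-d", "d-c"]
```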

Cited by 8 publications (3 citation statements)
References 17 publications
“…There are some other works exploring various aspects by applying DRL in optical network management. Huang et al. [15] proposed a DRL-based self-learning routing scheme for WDM-based networks. It allows the agent to continuously improve its performance by self-comparison.…”
Section: Deep Reinforcement Learning in RMSA of EONs
confidence: 99%
“…The value network outputs a value V(s_t; θ_v,T), which is a real number. Finally, we store the sample (s_t, a_t, r_t, V(s_t; θ_v,T)) generated by the interaction of the agent and the environment in an experience buffer D. When the size of the experience buffer reaches 2N − 1, we perform training based on the first N samples (lines 13–19). For each sample at time t, the advantage function is calculated in line 15.…”
Section: Teacher Model
confidence: 99%
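The excerpt above describes storing (s_t, a_t, r_t, V(s_t)) tuples and training on the first N samples once 2N − 1 have accumulated. The exact advantage estimator is not given in the excerpt; the Python sketch below assumes an (N − 1)-step discounted return bootstrapped with a stored value, in the style of actor-critic training. `GAMMA`, `N`, and the `policy_update`/`value_net_update` callbacks are hypothetical placeholders, not the paper's implementation.

```python
# Hedged sketch of the buffer-and-advantage step described in the excerpt.
GAMMA = 0.99   # assumed discount factor
N = 16         # assumed rollout length

buffer = []  # experience buffer D: list of (s, a, r, v) tuples

def maybe_train(policy_update, value_net_update):
    """Call after appending each new sample to `buffer`.
    `policy_update(s, a, advantage)` and `value_net_update(s, target)`
    are hypothetical hooks for the actor and critic updates."""
    if len(buffer) < 2 * N - 1:
        return
    for t, (s_t, a_t, r_t, v_t) in enumerate(buffer[:N]):
        # (N-1)-step discounted return, bootstrapped with the stored value
        # of the last sample in the lookahead window (assumption).
        horizon = buffer[t : t + N]
        ret = sum(GAMMA ** k * r for k, (_, _, r, _) in enumerate(horizon[:-1]))
        ret += GAMMA ** (len(horizon) - 1) * horizon[-1][3]
        advantage = ret - v_t
        policy_update(s_t, a_t, advantage)   # policy-gradient step
        value_net_update(s_t, ret)           # regression target for V
    del buffer[:N]  # keep the remaining N - 1 samples for the next round
```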