2021
DOI: 10.1007/s42979-021-00779-2
Learning 2-Opt Heuristics for Routing Problems via Deep Reinforcement Learning

Abstract: Recent works using deep learning to solve routing problems such as the traveling salesman problem (TSP) have focused on learning construction heuristics. Such approaches find good quality solutions but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heurist…
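For context on what the paper learns to control: the classic hand-crafted 2-opt local search repeatedly reverses a segment of the tour whenever the reversal shortens it. A minimal sketch in plain Python (my own illustration of the standard heuristic, not the authors' code):

```python
import math
import random

def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    """Classic 2-opt: keep reversing segments while any reversal shortens the tour."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # reversing the whole tour changes nothing
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Gain from replacing edges (a,b) and (c,d) with (a,c) and (b,d)
                delta = (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                         - math.dist(pts[a], pts[b]) - math.dist(pts[c], pts[d]))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(30)]
tour = two_opt(list(range(30)), pts)
```

The paper's contribution is to replace the exhaustive inner loops with a learned policy that picks which 2-opt move to apply next.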

Cited by 60 publications (45 citation statements). References 23 publications.
“…For the Knapsack and Bin Packing problems, we compare running times against OR-Tools, as shown in the appendix, Table 4. Neural SA lags behind OR-Tools in the Knapsack (Wu et al., 2019a; da Costa et al., 2020; Fu et al., 2021). PPO vs. ES: Neural SA can be trained with any policy optimisation method, making it highly extendable.…”
Section: Discussion
confidence: 99%
“…These work by brute-force learning of the instance-to-solution mapping; in CO these are sometimes referred to as construction heuristics. Other works focus on learning good parameters for classic algorithms, whether they be parameters of the original algorithm (Kruber et al., 2017; Bonami et al., 2018) or extra neural parameters introduced into the computational graph of classic algorithms (Gasse et al., 2019; Gupta et al., 2020; Kool et al., 2021; da Costa et al., 2020; Wu et al., 2019b; Chen & Tian, 2019; Fu et al., 2021). Our method, neural simulated annealing (Neural SA), can be viewed as sitting firmly within this last category.…”
Section: Introduction
confidence: 99%
“…Solving single-agent routing (scheduling) problems with RL. According to [26], RL approaches to agent routing problems can be categorized into: (1) improvement heuristics, which learn to iteratively rewrite a complete solution to obtain a better one [43, 5, 4, 24]; (2) construction approaches, which learn to build a solution by sequentially assigning idle agents to unvisited cities until the full routing schedule (sequence) is constructed [3, 28, 20, 19]; and (3) hybrid approaches blending both [17, 7, 21, 1]. Typically, learning-based improvement or hybrid approaches have shown good performance, since they can iteratively update the incumbent solution until reaching the best one.…”
Section: Related Work
confidence: 99%
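The construction-vs-improvement split described above can be made concrete with two hand-crafted baselines (an illustrative plain-Python sketch, not any of the cited learned methods): nearest-neighbour construction grows a tour city by city, while a 2-opt improvement step rewrites an already complete tour.

```python
import math
import random

def nearest_neighbor(pts, start=0):
    """Construction: extend a partial tour with the closest unvisited city."""
    unvisited = set(range(len(pts))) - {start}
    tour = [start]
    while unvisited:
        cur = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(pts[cur], pts[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def improve_step(tour, pts):
    """Improvement: apply the single best 2-opt rewrite of the tour, if any."""
    n = len(tour)
    best_delta, best = 0.0, None
    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # whole-tour reversal is a no-op
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            delta = (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                     - math.dist(pts[a], pts[b]) - math.dist(pts[c], pts[d]))
            if delta < best_delta:
                best_delta, best = delta, (i, j)
    if best is not None:
        i, j = best
        tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
    return tour, best is not None

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(25)]
tour = nearest_neighbor(pts)       # construct a solution once
while improve_step(tour, pts)[1]:  # then improve until no 2-opt move helps
    pass
```

Learned construction methods replace the `min(...)` selection with a policy network; learned improvement methods replace the exhaustive scan in `improve_step` with a policy that proposes the next rewrite.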
“…In [20], they extended the network formulation using a REINFORCE method with a greedy rollout baseline. In other recent works, the authors of [21] propose a deep reinforcement learning algorithm trained with policy gradients to learn improvement heuristics based on 2-opt moves for the TSP, and in [22] they use a hybrid of deep reinforcement learning and local search for the VRP.…”
Section: B. Deep Reinforcement Learning Applications in Decision Makin…
confidence: 99%
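The "REINFORCE with a greedy rollout baseline" idea mentioned above reduces, in its simplest form, to a policy-gradient update whose baseline is the return of the policy's own greedy action, so updates vanish once the sampled action matches the greedy one. A toy NumPy sketch (hypothetical fixed rewards standing in for tour-length improvements, not any cited model):

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([1.0, 2.0, 5.0, 3.0])  # hypothetical return of each action
theta = np.zeros(4)                       # logits of a softmax policy

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(4, p=p)           # sample an action from the policy
    baseline = rewards[p.argmax()]   # greedy "rollout": return of the greedy action
    advantage = rewards[a] - baseline
    grad_logp = -p
    grad_logp[a] += 1.0              # grad of log softmax(theta)[a] w.r.t. theta
    theta += 0.1 * advantage * grad_logp  # REINFORCE ascent step
```

The self-referential baseline needs no learned value function, which is what makes the greedy-rollout trick attractive for routing policies.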