Learning Improvement Heuristics for Solving Routing Problems

Wu, Yaoxin; Song, Wen; Cao, Zhiguang; Zhang, Jie; Lim, Andrew

doi:10.48550/arxiv.1912.05784

Cited by 17 publications

(18 citation statements)

References 30 publications

“…Most constructive neural methods are auto-regressive, predicting the next node given the partial tour constructed, but other works have considered predicting a 'heatmap' of promising edges at once (Nowak et al, 2017; Joshi et al, 2019a; Fu et al, 2020), which allows a tour to be constructed (using sampling or beam search) without further evaluating the model. Whereas these are constructive approaches, others have reported results with 'learning to search', where a neural network is used to guide a search procedure such as local search (Chen & Tian, 2019; Lu et al, 2020; Gao et al, 2020; Wu et al, 2019; Hottung & Tierney, 2019). While most researches have focused on instances up to 100 nodes, some have attempted scaling to larger instances, which remains challenging (Ma et al, 2019; Fu et al, 2020).…”

Section: Machine Learning For Vehicle Routing Problems (mentioning)

confidence: 99%

Deep Policy Dynamic Programming for Vehicle Routing Problems

Kool, Hoof, Gromicho et al. 2021

Preprint

Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms can find optimal solutions, but scale badly with the problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims to combine the strengths of learned neural heuristics with those of DP algorithms. DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms, making them competitive to strong alternatives such as LKH, while also outperforming other 'neural approaches' for solving TSPs and VRPs with 100 nodes.

“…Chen et al [29] proposed a DRL-based local search framework, termed NeuRewriter, that shows a promising performance on CVRP and job scheduling problems. Wu et al [30], and Costa et al [31] proposed a DRL-based TSP solver by learning the 2-opt. Their method improves the randomly generated solutions, unlike the method of Chen et al [29], which rewrites a solution given by a conventional heuristic solver.…”

Section: DRL-based Improvement Heuristics (mentioning)

confidence: 99%

“…We follow baseline setting of Kool et al [12] and Costa et al [19]. We set DRL baselines including the S2V-DQN [11], EAN [23], GAT-T [30], DRL-2opt [19], and AM [12]. We show the results of S2V-DQN and EAN reported by Kool et al [12], and the results of GAT-T reported by Costa et al [19].…”

Section: Capacitated Vehicle Routing Problem (CVRP) (mentioning)

confidence: 99%

Learning Collaborative Policies to Solve NP-hard Routing Problems

Kim, Park, Kim 2021

Preprint

Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be used to solve complex problems, DRL frameworks still struggle to compete with state-of-the-art heuristics showing a substantial performance gap. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can effectively find the near-optimum solution using two iterative DRL policies: the seeder and reviser. The seeder generates as diversified candidate solutions as possible (seeds) while being dedicated to exploring over the full combinatorial action space (i.e., sequence of assignment action). To this end, we train the seeder's policy using a simple yet effective entropy regularization reward to encourage the seeder to find diverse solutions. On the other hand, the reviser modifies each candidate solution generated by the seeder; it partitions the full trajectory into sub-tours and simultaneously revises each sub-tour to minimize its traveling distance. Thus, the reviser is trained to improve the candidate solution's quality, focusing on the reduced solution space (which is beneficial for exploitation). Extensive experiments demonstrate that the proposed two-policies collaboration scheme improves over single-policy DRL framework on various NP-hard routing problems, including TSP, prize collecting TSP (PCTSP), and capacitated vehicle routing problem (CVRP).

“…This process can then be emulated to generate (possibly smaller-sized) more representative CVRP instances, as it was done in Kool, Hoof, Gromicho, et al (2021) and in Hottung, Kwon, and Tierney (2021) (appendix). In Wu et al (2020), the authors directly used a subset of X instances along with distributions more commonly used in other ML works.…”

Section: Benchmark Instances and Problem Definition (mentioning)

confidence: 99%

“…Currently, most of the proposed approaches aim at learning constructive heuristics, which sequentially extend a partial solution, possibly employing additional procedures such as sampling and beam search (see e.g., Bello et al (2017) and Hottung, Kwon, and Tierney (2021)). Few others, such as Wu et al (2020) and Chen and Tian (2019), instead, focus on learning improvement heuristics to guide the exploration of the search space and iteratively refine an existing solution.…”

Section: Introduction (mentioning)

confidence: 99%

Guidelines for the Computational Testing of Machine Learning approaches to Vehicle Routing Problems

Accorsi, Lodi, Vigo 2021

Preprint

Despite the extensive research efforts and the remarkable results obtained on Vehicle Routing Problems (VRP) by using algorithms proposed by the Machine Learning community that are partially or entirely based on data-driven analysis, most of these approaches are still seldom employed by the Operations Research (OR) community. Among the possible causes, we believe, the different approach to the computational evaluation of the proposed methods may play a major role. With the current work, we want to highlight a number of challenges (and possible ways to handle them) arising during the computational studies of heuristic approaches to VRPs that, if appropriately addressed, may produce a computational study having the characteristics of those presented in OR papers, thus hopefully promoting the collaboration between the two communities.
