2021
DOI: 10.1007/s42979-021-00779-2
Learning 2-Opt Heuristics for Routing Problems via Deep Reinforcement Learning

Abstract: Recent works using deep learning to solve routing problems such as the traveling salesman problem (TSP) have focused on learning construction heuristics. Such approaches find good quality solutions but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heurist…
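For context on what the paper learns to control: the classic hand-crafted 2-opt local search repeatedly reverses a segment of the tour whenever the reversal shortens it. A minimal sketch in plain Python (my own illustration of the standard heuristic, not the authors' code):

```python
import math
import random

def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    """Classic 2-opt: keep reversing segments while any reversal shortens the tour."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # reversing the whole tour changes nothing
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Gain from replacing edges (a,b) and (c,d) with (a,c) and (b,d)
                delta = (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                         - math.dist(pts[a], pts[b]) - math.dist(pts[c], pts[d]))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(30)]
tour = two_opt(list(range(30)), pts)
```

The paper's contribution is to replace the exhaustive inner loops with a learned policy that picks which 2-opt move to apply next.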

Cited by 60 publications (45 citation statements). References 23 publications.
“…For the Knapsack and Bin Packing problems, we compare running times against OR-Tools, as shown in the appendix, Table 4. Neural SA lags behind OR-Tools in the Knapsack (Wu et al., 2019a; da Costa et al., 2020; Fu et al., 2021). PPO vs. ES: Neural SA can be trained with any policy optimisation method, making it highly extendable.…”
Section: Discussion
confidence: 99%
“…These work by brute-force learning of the instance-to-solution mapping; in CO these are sometimes referred to as construction heuristics. Other works focus on learning good parameters for classic algorithms, whether they be parameters of the original algorithm (Kruber et al., 2017; Bonami et al., 2018) or extra neural parameters introduced into the computational graph of classic algorithms (Gasse et al., 2019; Gupta et al., 2020; Kool et al., 2021; da Costa et al., 2020; Wu et al., 2019b; Chen & Tian, 2019; Fu et al., 2021). Our method, neural simulated annealing (Neural SA), can be viewed as sitting firmly within this last category.…”
Section: Introduction
confidence: 99%
“…Solving single-agent routing (scheduling) problems with RL. According to [26], RL approaches to agent routing problems can be categorized into: (1) improvement heuristics, which learn to iteratively rewrite a complete solution to obtain a better one [43, 5, 4, 24]; (2) construction approaches, which learn to build a solution by sequentially assigning idle agents to unvisited cities until the full routing schedule (sequence) is constructed [3, 28, 20, 19]; and (3) hybrid approaches blending both [17, 7, 21, 1]. Typically, learning-based improvement or hybrid approaches have shown good performance, since they can iteratively update the incumbent solution until reaching the best one.…”
Section: Related Work
confidence: 99%
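The construction-vs-improvement split described above can be made concrete with two hand-crafted baselines (an illustrative plain-Python sketch, not any of the cited learned methods): nearest-neighbour construction grows a tour city by city, while a 2-opt improvement step rewrites an already complete tour.

```python
import math
import random

def nearest_neighbor(pts, start=0):
    """Construction: extend a partial tour with the closest unvisited city."""
    unvisited = set(range(len(pts))) - {start}
    tour = [start]
    while unvisited:
        cur = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(pts[cur], pts[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def improve_step(tour, pts):
    """Improvement: apply the single best 2-opt rewrite of the tour, if any."""
    n = len(tour)
    best_delta, best = 0.0, None
    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # whole-tour reversal is a no-op
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            delta = (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                     - math.dist(pts[a], pts[b]) - math.dist(pts[c], pts[d]))
            if delta < best_delta:
                best_delta, best = delta, (i, j)
    if best is not None:
        i, j = best
        tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
    return tour, best is not None

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(25)]
tour = nearest_neighbor(pts)       # construct a solution once
while improve_step(tour, pts)[1]:  # then improve until no 2-opt move helps
    pass
```

Learned construction methods replace the `min(...)` selection with a policy network; learned improvement methods replace the exhaustive scan in `improve_step` with a policy that proposes the next rewrite.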
“…In [20], they extended the network formulation using a REINFORCE method with a greedy rollout baseline. In other recent works, the authors of [21] propose a deep reinforcement learning algorithm trained with policy gradients to learn improvement heuristics based on 2-opt moves for the TSP, and in [22] they use a hybrid of deep reinforcement learning and local search for the VRP.…”
Section: B. Deep Reinforcement Learning Applications in Decision Makin…
confidence: 99%
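The "REINFORCE with a greedy rollout baseline" idea mentioned above reduces, in its simplest form, to a policy-gradient update whose baseline is the return of the policy's own greedy action, so updates vanish once the sampled action matches the greedy one. A toy NumPy sketch (hypothetical fixed rewards standing in for tour-length improvements, not any cited model):

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([1.0, 2.0, 5.0, 3.0])  # hypothetical return of each action
theta = np.zeros(4)                       # logits of a softmax policy

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(4, p=p)           # sample an action from the policy
    baseline = rewards[p.argmax()]   # greedy "rollout": return of the greedy action
    advantage = rewards[a] - baseline
    grad_logp = -p
    grad_logp[a] += 1.0              # grad of log softmax(theta)[a] w.r.t. theta
    theta += 0.1 * advantage * grad_logp  # REINFORCE ascent step
```

The self-referential baseline needs no learned value function, which is what makes the greedy-rollout trick attractive for routing policies.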