| Problem | Method | Type |
| --- | --- | --- |
| … | Pointer network [66] | Supervised, Approximation |
| | GCN + Search [31] | Supervised, Approximation |
| | Q-Learning + GNN [19] | Model-free, Value-based |
| | Hierarchical RL + GAT [44] | Model-free, Policy-based |
| | REINFORCE + LSTM with attention [47] | Model-free, Policy-based |
| | REINFORCE + attention [20] | Model-free, Policy-based |
| | RL + GAT [36] | Model-free, Policy-based |
| | DDPG [23] | Model-free, Policy-based |
| | REINFORCE + Pointer network [10] | Model-free, Policy-based |
| | RL + NN [45] | Model-free, Actor-Critic |
| | RL + GAT [14] | Model-free, Actor-Critic |
| | AlphaZero: MCTS + GCN [51] | Model-based, Given model |
| Knapsack Problem | REINFORCE + Pointer network [10] | Model-free, Policy-based |
| Bin Packing Problem (BPP) | REINFORCE + LSTM [29] | Model-free, Policy-based |
| | AlphaZero: MCTS + NN [38] | Model-based, Given model |
| Job Scheduling Problem (JSP) | RL + LSTM [16] | Model-free, Actor-Critic |
| Vehicle Routing Problem (VRP) | REINFORCE + LSTM with attention [47] | Model-free, Policy-based |
| | RL + LSTM [16] | Model-free, Policy-based |
| | RL + GAT [36] | Model-free, Policy-based |
| | RL + NN [43] | Model-free, Policy-based |
| | RL + GAT [25] | Model-free, Actor-Critic |
| Global Routing | DQN + MLP [40] | Model-free, Value-based |
| Highest Safe Rung (HSR) | AlphaZero: MCTS + CNN [71] | Model-based, Given model |

Table 3: Classification of ML approaches for NP-hard combinatorial optimization by problem, method, and type.
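Many of the "Model-free, Policy-based" entries above train with REINFORCE, the score-function policy-gradient estimator. As a minimal illustrative sketch (not taken from any of the cited papers), the update can be shown on a toy problem with a softmax policy over a few discrete actions; the reward values and learning rates here are arbitrary choices for the example:

```python
import numpy as np

# Illustrative REINFORCE sketch: a softmax policy over 3 actions,
# trained with the log-derivative gradient and a running-mean baseline.
# Rewards and hyperparameters are hypothetical, chosen for the demo.

rng = np.random.default_rng(0)
rewards = np.array([1.0, 0.2, 0.1])  # assumed per-action rewards
theta = np.zeros(3)                  # policy logits

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr, baseline = 0.1, 0.0
for step in range(2000):
    p = softmax(theta)
    a = rng.choice(3, p=p)           # sample an action from the policy
    r = rewards[a]
    # For a softmax policy: grad log pi(a) = one_hot(a) - p
    grad_log_pi = -p
    grad_log_pi[a] += 1.0
    theta += lr * (r - baseline) * grad_log_pi  # REINFORCE update
    baseline += 0.05 * (r - baseline)           # variance-reducing baseline

print(int(np.argmax(theta)))  # the policy should come to favor action 0
```

The methods in the table replace the toy softmax with a learned encoder (pointer network, LSTM, GAT, etc.) that parameterizes the action distribution over problem elements (e.g. the next city or customer to visit), but the gradient estimator is the same.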