2020
DOI: 10.1609/aaai.v34i02.5531
Deep Neural Network Approximated Dynamic Programming for Combinatorial Optimization

Abstract: In this paper, we propose a general framework for combining deep neural networks (DNNs) with dynamic programming to solve combinatorial optimization problems. For problems that can be broken into smaller subproblems and solved by dynamic programming, we train a set of neural networks to replace value or policy functions at each decision step. Two variants of the neural network approximated dynamic programming (NDP) methods are proposed; in the value-based NDP method, the networks learn to estimate the value of…
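For context, here is a minimal, self-contained sketch of the value-based NDP idea on a toy 0/1 knapsack instance: one small network per decision step regresses onto the value-to-go, and decoding replaces the DP recursion with a one-step lookahead against the learned values. The per-step architecture, the supervised targets generated from an exact DP, and the greedy decode are illustrative assumptions, not the authors' exact setup.

```python
# Value-based NDP sketch on a toy 0/1 knapsack (illustrative, not the paper's setup).
import torch
import torch.nn as nn

values  = [6.0, 10.0, 12.0]   # item profits
weights = [1.0, 2.0, 3.0]     # item weights
CAP = 5.0                     # knapsack capacity

def exact_value(i, cap):
    """Exact DP recursion, used here only to generate training targets."""
    if i == len(values) or cap <= 0:
        return 0.0
    best = exact_value(i + 1, cap)                    # skip item i
    if weights[i] <= cap:                             # take item i if it fits
        best = max(best, values[i] + exact_value(i + 1, cap - weights[i]))
    return best

# One small value network per decision step: V_i(capacity) ~ value-to-go.
nets = [nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
        for _ in range(len(values) + 1)]
opt = torch.optim.Adam([p for net in nets for p in net.parameters()], lr=1e-2)

caps = [c * CAP / 10 for c in range(11)]
targets = {(i, c): exact_value(i, c)
           for i in range(len(values) + 1) for c in caps}

for _ in range(300):  # plain supervised regression onto the exact values
    loss = torch.tensor(0.0)
    for (i, c), t in targets.items():
        loss = loss + (nets[i](torch.tensor([c])).squeeze() - t) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

# Greedy decode: replace the DP recursion with learned one-step lookahead.
cap, total, chosen = CAP, 0.0, []
for i in range(len(values)):
    with torch.no_grad():
        skip = nets[i + 1](torch.tensor([cap])).item()
        take = float("-inf")
        if weights[i] <= cap:
            take = values[i] + nets[i + 1](torch.tensor([cap - weights[i]])).item()
    if take > skip:
        chosen.append(i); total += values[i]; cap -= weights[i]
print("picked items", chosen, "total value", total)
```

Training against exact DP targets is, of course, only viable on small instances; the premise of NDP is that the learned networks stand in for the recursion where exact evaluation becomes too expensive.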

Cited by 23 publications (11 citation statements)
References 17 publications
“…Then this heatmap is used for branching in forward Dynamic Programming. [82,83] use neural networks to approximate the value functions for dynamic programming, which expedites the solution time. [84] proposes a policy iteration algorithm to solve CVRP.…”
Section: A. Learning Methods for Facilitating Non-Learning Methods
Mentioning confidence: 99%
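The heatmap-guided branching this excerpt mentions can be made concrete with a small sketch: an edge "heat" matrix (a random placeholder here; in the cited line of work it would come from a trained graph neural network) gates which transitions a forward Held-Karp-style DP is allowed to expand. The threshold, instance size, and random scores are toy assumptions.

```python
# Heatmap-gated forward DP for a tiny TSP (heat is a random stand-in for a
# trained model's edge scores; THRESH prunes "unpromising" transitions).
import math, random

n = 6
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(n)]
dist = [[math.dist(a, b) for b in pts] for a in pts]
heat = [[random.random() for _ in range(n)] for _ in range(n)]
THRESH = 0.2

# dp[(mask, last)] = shortest path from city 0 covering `mask`, ending at `last`.
dp = {(1, 0): 0.0}
for mask in range(1, 1 << n):
    for last in range(n):
        if (mask, last) not in dp:
            continue
        for j in range(n):
            # branch only on edges the "model" rates above the threshold
            if mask & (1 << j) or heat[last][j] < THRESH:
                continue
            key, cand = (mask | (1 << j), j), dp[(mask, last)] + dist[last][j]
            if cand < dp.get(key, float("inf")):
                dp[key] = cand

full = (1 << n) - 1
best = min((dp[(full, last)] + dist[last][0]
            for last in range(n) if (full, last) in dp), default=float("inf"))
print("heatmap-pruned tour length:", best)
```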
“…The authors used a solution reconstruction procedure that samples solutions, and an NN is trained to assess the quality of each solution. Xu et al. [13] designed a model that integrates neural networks with dynamic programming to solve various optimisation problems. The authors proposed two variants, value-based and policy-based, which considerably reduce solution time with reasonable performance loss.…”
Section: Related Work
Mentioning confidence: 99%
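Complementing the value-based sketch after the abstract, here is a minimal sketch of the policy-based idea in the same toy knapsack setting: a single shared network maps per-step features to the probability of taking the current item, and decoding thresholds that probability. The feature choice and the stand-in "take it if it fits" supervision exist only so the snippet runs; they are not the training procedure of Xu et al.

```python
# Policy-based sketch: a shared network maps (value, weight, remaining capacity)
# to the probability of taking the current item. Supervision is a stand-in rule.
import torch
import torch.nn as nn

values  = [6.0, 10.0, 12.0]
weights = [1.0, 2.0, 3.0]
CAP = 5.0

policy = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()

# Stand-in labels ("take the item iff it fits") purely so the snippet runs;
# a real system would derive labels or rewards from (near-)optimal solutions.
feats, labels = [], []
for i in range(len(values)):
    for cap in [c * 0.5 for c in range(11)]:
        feats.append([values[i], weights[i], cap])
        labels.append([1.0 if weights[i] <= cap else 0.0])
X, y = torch.tensor(feats), torch.tensor(labels)

for _ in range(300):
    opt.zero_grad()
    bce(policy(X), y).backward()
    opt.step()

# Greedy decode: follow the policy step by step, respecting feasibility.
cap, chosen = CAP, []
for i in range(len(values)):
    with torch.no_grad():
        p = torch.sigmoid(policy(torch.tensor([values[i], weights[i], cap]))).item()
    if p > 0.5 and weights[i] <= cap:
        chosen.append(i); cap -= weights[i]
print("policy picked items:", chosen)
```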
“…Naturally, one drawback that remains is that the quality of the approximation depends on appropriately selecting the features as well as the functions for the approximation, which is not trivial. Hence, an NN can be used to approximate the value function, thereby replacing the step of feature and function selection (van Heeswijk and La Poutré 2019; Xu et al. 2020).…”
Section: Literature Review
Mentioning confidence: 99%
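The trade-off this excerpt describes, hand-crafted basis functions versus a learned representation, can be seen in a toy regression: a linear architecture over hand-picked features phi(s) against an MLP on the raw state. The target function and both approximators are illustrative assumptions.

```python
# Toy comparison: linear value approximation over hand-crafted features vs.
# an MLP on the raw state (the "true" value function here is an arbitrary choice).
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=(256, 2)).astype(np.float32)   # raw 2-D states
V = (np.sin(3 * S[:, 0]) * S[:, 1] + 0.5 * S[:, 0]).astype(np.float32)

# (a) classical ADP: V(s) ~ w . phi(s) with hand-picked basis functions phi
Phi = np.column_stack([np.ones(len(S), dtype=np.float32),
                       S[:, 0], S[:, 1], S[:, 0] * S[:, 1]])
w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
lin_mse = float(np.mean((Phi @ w - V) ** 2))

# (b) NN on raw states: the feature/function selection step disappears
net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
Xs, ys = torch.from_numpy(S), torch.from_numpy(V).unsqueeze(1)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(Xs), ys)
    loss.backward(); opt.step()

print(f"linear-features MSE: {lin_mse:.4f}  NN MSE: {loss.item():.4f}")
```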
“…(Ryu et al. 2019) propose a Q-learning framework to optimize over continuous action spaces using a combination of MP and a DNN actor. (Delarue, Anderson, and Tjandraatmadja 2020; van Heeswijk and La Poutré 2019; Xu et al. 2020) show how to use ReLU-based DNN value functions to optimize combinatorial problems (e.g., vehicle routing) where the immediate rewards are deterministic and the action space is vast. We extend such approaches and results to problems where the immediate reward can be uncertain, as is the case with inventory management problems.…”
Section: Literature Review
Mentioning confidence: 99%
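A sketch of the pattern these works share, assuming deterministic immediate rewards: act by maximizing r(s, a) + V(s'), where V is a ReLU network. Delarue, Anderson, and Tjandraatmadja handle vast action spaces by encoding the trained ReLU network inside a mixed-integer program; the toy below simply enumerates a small finite action set, and its dynamics, reward, and (untrained) network are all placeholder assumptions.

```python
# Greedy action selection with a ReLU value network and deterministic rewards:
# argmax over a small, enumerable action set (the MIP-encoding variants in the
# cited work handle vast action spaces instead). Everything here is a placeholder.
import torch
import torch.nn as nn

V = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # would be trained

def reward(state, action):
    """Deterministic immediate reward (toy choice)."""
    return -0.1 * abs(action)

def step(state, action):
    """Deterministic transition (toy choice)."""
    return state + torch.tensor([action, -0.5])

state, actions = torch.tensor([1.0, 2.0]), [-1.0, 0.0, 1.0]
with torch.no_grad():
    best = max(actions, key=lambda a: reward(state, a) + V(step(state, a)).item())
print("greedy action:", best)
```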