2020
DOI: 10.1609/aaai.v34i02.5531
Deep Neural Network Approximated Dynamic Programming for Combinatorial Optimization

Abstract: In this paper, we propose a general framework for combining deep neural networks (DNNs) with dynamic programming to solve combinatorial optimization problems. For problems that can be broken into smaller subproblems and solved by dynamic programming, we train a set of neural networks to replace value or policy functions at each decision step. Two variants of the neural network approximated dynamic programming (NDP) methods are proposed; in the value-based NDP method, the networks learn to estimate the value of…
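For context, here is a minimal, self-contained sketch of the value-based NDP idea on a toy 0/1 knapsack instance: one small network per decision step regresses onto the value-to-go, and decoding replaces the DP recursion with a one-step lookahead against the learned values. The per-step architecture, the supervised targets generated from an exact DP, and the greedy decode are illustrative assumptions, not the authors' exact setup.

```python
# Value-based NDP sketch on a toy 0/1 knapsack (illustrative, not the paper's setup).
import torch
import torch.nn as nn

values  = [6.0, 10.0, 12.0]   # item profits
weights = [1.0, 2.0, 3.0]     # item weights
CAP = 5.0                     # knapsack capacity

def exact_value(i, cap):
    """Exact DP recursion, used here only to generate training targets."""
    if i == len(values) or cap <= 0:
        return 0.0
    best = exact_value(i + 1, cap)                    # skip item i
    if weights[i] <= cap:                             # take item i if it fits
        best = max(best, values[i] + exact_value(i + 1, cap - weights[i]))
    return best

# One small value network per decision step: V_i(capacity) ~ value-to-go.
nets = [nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
        for _ in range(len(values) + 1)]
opt = torch.optim.Adam([p for net in nets for p in net.parameters()], lr=1e-2)

caps = [c * CAP / 10 for c in range(11)]
targets = {(i, c): exact_value(i, c)
           for i in range(len(values) + 1) for c in caps}

for _ in range(300):  # plain supervised regression onto the exact values
    loss = torch.tensor(0.0)
    for (i, c), t in targets.items():
        loss = loss + (nets[i](torch.tensor([c])).squeeze() - t) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

# Greedy decode: replace the DP recursion with learned one-step lookahead.
cap, total, chosen = CAP, 0.0, []
for i in range(len(values)):
    with torch.no_grad():
        skip = nets[i + 1](torch.tensor([cap])).item()
        take = float("-inf")
        if weights[i] <= cap:
            take = values[i] + nets[i + 1](torch.tensor([cap - weights[i]])).item()
    if take > skip:
        chosen.append(i); total += values[i]; cap -= weights[i]
print("picked items", chosen, "total value", total)
```

Training against exact DP targets is, of course, only viable on small instances; the premise of NDP is that the learned networks stand in for the recursion where exact evaluation becomes too expensive.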

Cited by 23 publications (11 citation statements)
References 17 publications
“…Then this heatmap is used for branching in forward Dynamic Programming. [82,83] use neural networks to approximate the value functions for dynamic programming, which expedites the solution time. [84] proposes a policy iteration algorithm to solve CVRP.…”
Section: A. Learning Methods for Facilitating Non-Learning Methods
Mentioning confidence: 99%
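The heatmap-guided branching this excerpt mentions can be made concrete with a small sketch: an edge "heat" matrix (a random placeholder here; in the cited line of work it would come from a trained graph neural network) gates which transitions a forward Held-Karp-style DP is allowed to expand. The threshold, instance size, and random scores are toy assumptions.

```python
# Heatmap-gated forward DP for a tiny TSP (heat is a random stand-in for a
# trained model's edge scores; THRESH prunes "unpromising" transitions).
import math, random

n = 6
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(n)]
dist = [[math.dist(a, b) for b in pts] for a in pts]
heat = [[random.random() for _ in range(n)] for _ in range(n)]
THRESH = 0.2

# dp[(mask, last)] = shortest path from city 0 covering `mask`, ending at `last`.
dp = {(1, 0): 0.0}
for mask in range(1, 1 << n):
    for last in range(n):
        if (mask, last) not in dp:
            continue
        for j in range(n):
            # branch only on edges the "model" rates above the threshold
            if mask & (1 << j) or heat[last][j] < THRESH:
                continue
            key, cand = (mask | (1 << j), j), dp[(mask, last)] + dist[last][j]
            if cand < dp.get(key, float("inf")):
                dp[key] = cand

full = (1 << n) - 1
best = min((dp[(full, last)] + dist[last][0]
            for last in range(n) if (full, last) in dp), default=float("inf"))
print("heatmap-pruned tour length:", best)
```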
“…The authors used a solution reconstruction procedure that samples solutions, and an NN is trained to assess the quality of each solution. Xu et al. [13] designed a model that integrates neural networks with dynamic programming to solve various optimisation problems. The authors proposed two variants, value-based and policy-based, which considerably reduce solution time with reasonable performance loss.…”
Section: Related Work
Mentioning confidence: 99%
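Complementing the value-based sketch after the abstract, here is a minimal sketch of the policy-based idea in the same toy knapsack setting: a single shared network maps per-step features to the probability of taking the current item, and decoding thresholds that probability. The feature choice and the stand-in "take it if it fits" supervision exist only so the snippet runs; they are not the training procedure of Xu et al.

```python
# Policy-based sketch: a shared network maps (value, weight, remaining capacity)
# to the probability of taking the current item. Supervision is a stand-in rule.
import torch
import torch.nn as nn

values  = [6.0, 10.0, 12.0]
weights = [1.0, 2.0, 3.0]
CAP = 5.0

policy = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()

# Stand-in labels ("take the item iff it fits") purely so the snippet runs;
# a real system would derive labels or rewards from (near-)optimal solutions.
feats, labels = [], []
for i in range(len(values)):
    for cap in [c * 0.5 for c in range(11)]:
        feats.append([values[i], weights[i], cap])
        labels.append([1.0 if weights[i] <= cap else 0.0])
X, y = torch.tensor(feats), torch.tensor(labels)

for _ in range(300):
    opt.zero_grad()
    bce(policy(X), y).backward()
    opt.step()

# Greedy decode: follow the policy step by step, respecting feasibility.
cap, chosen = CAP, []
for i in range(len(values)):
    with torch.no_grad():
        p = torch.sigmoid(policy(torch.tensor([values[i], weights[i], cap]))).item()
    if p > 0.5 and weights[i] <= cap:
        chosen.append(i); cap -= weights[i]
print("policy picked items:", chosen)
```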
“…Naturally, one drawback that remains is that the quality of the approximation depends on appropriately selecting the features as well as the functions for the approximation, which is not trivial. Hence, an NN can be used to approximate the value function, thereby replacing the step of feature and function selection (van Heeswijk and La Poutré 2019; Xu et al. 2020).…”
Section: Literature Review
Mentioning confidence: 99%
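The trade-off this excerpt describes, hand-crafted basis functions versus a learned representation, can be seen in a toy regression: a linear architecture over hand-picked features phi(s) against an MLP on the raw state. The target function and both approximators are illustrative assumptions.

```python
# Toy comparison: linear value approximation over hand-crafted features vs.
# an MLP on the raw state (the "true" value function here is an arbitrary choice).
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=(256, 2)).astype(np.float32)   # raw 2-D states
V = (np.sin(3 * S[:, 0]) * S[:, 1] + 0.5 * S[:, 0]).astype(np.float32)

# (a) classical ADP: V(s) ~ w . phi(s) with hand-picked basis functions phi
Phi = np.column_stack([np.ones(len(S), dtype=np.float32),
                       S[:, 0], S[:, 1], S[:, 0] * S[:, 1]])
w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
lin_mse = float(np.mean((Phi @ w - V) ** 2))

# (b) NN on raw states: the feature/function selection step disappears
net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
Xs, ys = torch.from_numpy(S), torch.from_numpy(V).unsqueeze(1)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(Xs), ys)
    loss.backward(); opt.step()

print(f"linear-features MSE: {lin_mse:.4f}  NN MSE: {loss.item():.4f}")
```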
“…(Ryu et al. 2019) propose a Q-learning framework to optimize over continuous action spaces using a combination of MP and a DNN actor. (Delarue, Anderson, and Tjandraatmadja 2020; van Heeswijk and La Poutré 2019; Xu et al. 2020) show how to use ReLU-based DNN value functions to optimize combinatorial problems (e.g., vehicle routing) where the immediate rewards are deterministic and the action space is vast. We extend such approaches and results to problems where the immediate reward can be uncertain, as is the case with inventory management problems.…”
Section: Literature Review
Mentioning confidence: 99%
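A sketch of the pattern these works share, assuming deterministic immediate rewards: act by maximizing r(s, a) + V(s'), where V is a ReLU network. Delarue, Anderson, and Tjandraatmadja handle vast action spaces by encoding the trained ReLU network inside a mixed-integer program; the toy below simply enumerates a small finite action set, and its dynamics, reward, and (untrained) network are all placeholder assumptions.

```python
# Greedy action selection with a ReLU value network and deterministic rewards:
# argmax over a small, enumerable action set (the MIP-encoding variants in the
# cited work handle vast action spaces instead). Everything here is a placeholder.
import torch
import torch.nn as nn

V = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # would be trained

def reward(state, action):
    """Deterministic immediate reward (toy choice)."""
    return -0.1 * abs(action)

def step(state, action):
    """Deterministic transition (toy choice)."""
    return state + torch.tensor([action, -0.5])

state, actions = torch.tensor([1.0, 2.0]), [-1.0, 0.0, 1.0]
with torch.no_grad():
    best = max(actions, key=lambda a: reward(state, a) + V(step(state, a)).item())
print("greedy action:", best)
```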