2020
DOI: 10.1007/978-3-030-58942-4_12

Reinforcement Learning for Variable Selection in a Branch and Bound Algorithm

Cited by 23 publications (19 citation statements)
References 7 publications
“…In this framework, RL methods are used to leverage the power of solvers or problem-specific solution heuristics by initializing values of some hyper-parameters. For example, RL can be utilized to select the branching variable in MIP solvers (Etheve et al. 2020; Hottung et al. 2020; Tang et al. 2020). Recent studies by Ma et al. (2019), Deudon et al. (2018), and Chen and Tian (2019) show that optimization heuristics powered with RL methods outperform previous methods.…”
Section: Literature Review
confidence: 99%
“…Furthermore, SB is by no means an optimal branching policy; therefore methods which offer the potential to go beyond it, such as RL, are particularly appealing. Etheve et al. (2020) proposed FMSTS which, to the best of our knowledge, is the only published work to apply RL to branching and is therefore the SOTA RL branching algorithm. By using a DFS node selection strategy, they used the deep Q-network (DQN) approach (Mnih et al., 2013) to approximate the Q-function of the B&B sub-tree size rooted at the current node; a local Q-function which, in their setting, was equivalent to the number of global tree nodes.…”
Section: Related Work
confidence: 99%
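The statement above describes the core mechanism attributed to FMSTS: under depth-first node selection, the value of a branching decision can be tied to the size of the sub-tree it produces, so a DQN-style regressor can be trained to predict it. The sketch below is a minimal illustration of that idea, not the authors' implementation; all class and function names, feature shapes, and the fixed-size action space are hypothetical simplifications.

```python
# Minimal sketch (assumed, not the FMSTS code): a DQN-style network that scores
# candidate branching variables, trained to predict the negated size of the
# B&B sub-tree observed after branching under DFS node selection.
import torch
import torch.nn as nn


class BranchingQNet(nn.Module):
    """Maps node features to one Q-value per candidate branching variable."""

    def __init__(self, n_features: int, n_vars: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_vars),  # one score per candidate variable
        )

    def forward(self, node_features: torch.Tensor) -> torch.Tensor:
        return self.net(node_features)


def dqn_regression_step(qnet, optimizer, states, actions, subtree_sizes):
    """One training step: Q(s, a) should predict -subtree_size(s, a).

    With DFS, the sub-tree rooted at a node is fully explored before the solver
    moves on, so its size can be observed directly and used as a target without
    bootstrapping in this simplified variant.
    """
    q_pred = qnet(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    target = -subtree_sizes.float()  # smaller sub-trees -> higher value
    loss = nn.functional.mse_loss(q_pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At decision time the agent would branch on `argmax` of the predicted Q-values over the fractional candidates; minimizing sub-tree size locally then corresponds, in the DFS setting described above, to minimizing the total number of tree nodes.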
“…Since branching can be formulated as a Markov decision process (MDP) (He et al., 2014; Gasse et al., 2019; Etheve et al., 2020), reinforcement learning (RL) seems a natural approach to discovering novel branching heuristics with superior decision quality and no need for expensive data labelling. However, branching has thus far proved largely intractable for RL for reasons we summarise into three key challenges.…”
Section: Introduction
confidence: 99%
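To make the MDP formulation referenced above concrete, the skeleton below sketches one plausible interface: the state is the solver's current node, the action is the index of a fractional variable to branch on, and each processed node incurs a reward of -1 so that the episode return is minus the tree size. This is an assumed toy environment with placeholder observations, not a real solver API.

```python
# Hypothetical sketch of branching viewed as an MDP; the solver interaction is
# stubbed out and the observation is random placeholder data.
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BranchingState:
    node_features: List[float]   # features of the current B&B node (e.g. LP info)
    candidate_vars: List[int]    # indices of fractional variables to branch on


class BranchingEnv:
    """Skeleton MDP: the agent picks a variable, the 'solver' processes a node."""

    def __init__(self, max_nodes: int = 100):
        self.max_nodes = max_nodes
        self.nodes_processed = 0

    def reset(self) -> BranchingState:
        self.nodes_processed = 0
        return self._observe()

    def step(self, var_index: int) -> Tuple[BranchingState, float, bool]:
        # A real solver would create two child nodes for the chosen variable;
        # here we only count processed nodes and stop after a fixed budget.
        self.nodes_processed += 1
        reward = -1.0  # -1 per node, so the episode return is minus the tree size
        done = self.nodes_processed >= self.max_nodes
        return self._observe(), reward, done

    def _observe(self) -> BranchingState:
        # Placeholder observation; a real environment would query the solver.
        return BranchingState(
            node_features=[random.random() for _ in range(4)],
            candidate_vars=list(range(4)),
        )
```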
“…However, in terms of obtaining a higher dual value, the agent performs worst initially but ultimately obtains the best result, indicating a non-myopic policy. Different from [84], this work can transfer to larger instances, and the performance of the RL agent is significantly superior to FSB, RPB (reliability pseudocost branching), SVM, and GCN.…”
Section: Reinforcement Learning in Branching
confidence: 99%