2018
DOI: 10.1609/aaai.v32i1.12211
Learning Robust Search Strategies Using a Bandit-Based Approach

Abstract: Effective solving of constraint problems often requires choosing good or specific search heuristics. However, choosing or designing a good search heuristic is non-trivial and is often a manual process. In this paper, rather than manually choosing/designing search heuristics, we propose the use of bandit-based learning techniques to automatically select search heuristics. Our approach is online where the solver learns and selects from a set of heuristics during search. The goal is to obtain automatic search heu…
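As a rough illustration of the idea in the abstract — learning online which of a set of search heuristics to apply — here is a minimal UCB1-style selector in Python. This is a sketch under stated assumptions, not the paper's actual algorithm: the heuristic names, the reward signal, and the choice of UCB1 are all illustrative.

```python
import math
import random


class BanditHeuristicSelector:
    """Minimal UCB1-style selector over a fixed set of search heuristics.

    Illustrative sketch only: the heuristic names, the reward definition,
    and the use of UCB1 are assumptions, not the paper's exact method.
    """

    def __init__(self, heuristics):
        self.heuristics = heuristics            # candidate heuristics (arms)
        self.counts = [0] * len(heuristics)     # times each arm was played
        self.values = [0.0] * len(heuristics)   # running mean reward per arm
        self.total = 0

    def select(self):
        # Play every arm once before applying the UCB1 rule.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        ucb = [
            self.values[i] + math.sqrt(2 * math.log(self.total) / self.counts[i])
            for i in range(len(self.heuristics))
        ]
        return max(range(len(self.heuristics)), key=lambda i: ucb[i])

    def update(self, i, reward):
        # Incremental running-mean update for the chosen arm.
        self.counts[i] += 1
        self.total += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]


# Toy usage: the reward stands in for some measure of search progress
# (e.g., a function of pruning achieved); here it is simply simulated.
selector = BanditHeuristicSelector(["dom/wdeg", "activity", "impact"])
for _ in range(100):
    arm = selector.select()
    reward = random.random()   # placeholder for a solver-derived reward
    selector.update(arm, reward)
```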

Cited by 9 publications (4 citation statements)
References 16 publications (22 reference statements)
“…The table also shows the true ranks of predicted values. In general we see that the first three predicted ranks are close to the true first rank based on the average size of the dataset. For example, for DS on RCPSP the 1st predicted rank has true rank 706, which is 706/41730 ⋅ 100 ≈ 1.7% from the top rank.…”
Section: Evaluating Deepified Heuristics
confidence: 63%
“…Various variable ordering heuristics have been designed by human experts [2][3][4]. Recent work also acquires dedicated heuristics using machine learning (ML), or learns which of a given set of heuristics to use [5][6][7][8]. However, both classical and learned heuristics are based on the current search node.…”
Section: Introduction
confidence: 99%
“…Rewards updated through ERWA were also used to adaptively select a backtracking strategy in [Bac+15]. Furthermore, MAB frameworks were used to select a search heuristic among a set of candidate ones at each node of the search tree in [XY18] or at each restart in [Wat+20; Kor+22]. Simple bandit-driven perturbation strategies to incorporate random choices in constraint solving with restarts were also introduced and evaluated in [PW20].…”
Section: Minimax Optimal Strategy In the Stochastic Case (MOSS) [Ab09]
confidence: 99%
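The ERWA (exponential recency-weighted average) updates mentioned in the statement above use a constant step size, so older rewards decay geometrically and recent observations dominate the estimate. The sketch below shows that update rule; the step size and reward values are illustrative assumptions, not taken from the cited works.

```python
def erwa_update(q, reward, alpha=0.1):
    """Exponential recency-weighted average: constant step-size update.

    q      -- current value estimate for the arm
    reward -- newly observed reward
    alpha  -- constant step size (0 < alpha <= 1); illustrative default

    With constant alpha, a reward observed k steps ago carries weight
    alpha * (1 - alpha) ** k, so recent rewards dominate the estimate.
    """
    return q + alpha * (reward - q)


# Toy usage: the estimate tracks a drifting reward signal.
q = 0.0
for r in [1.0, 1.0, 0.0, 1.0, 0.0, 0.0]:
    q = erwa_update(q, r)
print(round(q, 3))
```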
“…This question naturally calls for a "bandit" approach, as recently advocated in [Xia and Yap, 2018; Wattez et al., 2020]. Multi-armed bandit problems are sequential decision tasks in which the learning algorithm has access to a set of arms, and observes the reward for the chosen arm after each trial.…”
Section: Introduction
confidence: 99%
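To make the bandit setting described in the statement above concrete — one arm chosen per trial, and only that arm's reward observed — here is a minimal epsilon-greedy loop. The arm reward probabilities and the epsilon value are made up for the example and are not drawn from any of the cited papers.

```python
import random


def epsilon_greedy(n_trials=1000, epsilon=0.1):
    # Three arms with hidden Bernoulli success probabilities (assumed values).
    probs = [0.2, 0.5, 0.8]
    counts = [0] * len(probs)
    values = [0.0] * len(probs)

    for _ in range(n_trials):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if random.random() < epsilon:
            arm = random.randrange(len(probs))
        else:
            arm = max(range(len(probs)), key=lambda i: values[i])
        # Only the chosen arm's reward is observed.
        reward = 1.0 if random.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return values


print(epsilon_greedy())
```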