2014
DOI: 10.1561/2200000038

From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning

Abstract: This work covers several aspects of the optimism in the face of uncertainty principle applied to large scale optimization problems under finite numerical budget. The initial motivation for the research reported here originated from the empirical success of the so-called Monte-Carlo Tree Search method popularized in Computer Go and further extended to many other games as well as optimization and planning problems. Our objective is to contribute to the development of theoretical foundations of the field by chara…

Cited by 190 publications (190 citation statements)
References 85 publications
“…We consider that the constants are k_1 = 1, k_2 = 500, and k_3 = 1. Details of the non-convex Simultaneous Optimistic Optimization (SOO) algorithm can be found in [14,15]. This algorithm is used to obtain the game solution, i.e., the final heading angle, which is the control, and the time of capture, which is the payoff of the game.…”
Section: Optimization Solution To The System (mentioning)
confidence: 99%
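The references [14,15] in the excerpt above point to Simultaneous Optimistic Optimization (SOO), the hierarchical-partitioning method analyzed in this monograph. Below is a minimal sketch of the SOO idea for maximizing a deterministic black-box function on [0, 1]; the trisection rule, the depth schedule h_max(n) = sqrt(n), the budget, and the test objective are illustrative assumptions, not details taken from the citing paper.

```python
import math

def soo_maximize(f, budget=300, h_max_fn=lambda n: int(math.sqrt(n))):
    """SOO sketch: in each sweep, expand at every depth the leaf with the best
    centre value, provided it beats the best value expanded at any shallower
    depth in the same sweep (optimism without knowledge of the smoothness)."""
    # A leaf is (left, right, centre, value); leaves are grouped by depth.
    leaves = {0: [(0.0, 1.0, 0.5, f(0.5))]}
    n_evals = 1
    best_x, best_v = 0.5, leaves[0][0][3]

    while n_evals < budget:
        v_max = -math.inf
        deepest = max(d for d, cells in leaves.items() if cells)
        for h in range(min(deepest, h_max_fn(n_evals)) + 1):
            if not leaves.get(h):
                continue
            leaf = max(leaves[h], key=lambda c: c[3])
            l, r, c, v = leaf
            if v < v_max:
                continue
            v_max = v
            leaves[h].remove(leaf)
            # Trisect the interval; the middle child keeps the parent's centre,
            # so only the two outer children need new evaluations.
            w = (r - l) / 3.0
            for cl, cr in [(l, l + w), (l + w, r - w), (r - w, r)]:
                cc = (cl + cr) / 2.0
                if math.isclose(cc, c):
                    cv = v                     # reuse the parent's evaluation
                else:
                    cv = f(cc)
                    n_evals += 1
                    if cv > best_v:
                        best_x, best_v = cc, cv
                leaves.setdefault(h + 1, []).append((cl, cr, cc, cv))
            if n_evals >= budget:
                break
    return best_x, best_v


if __name__ == "__main__":
    # Illustrative multimodal objective from the optimistic-optimization literature.
    f = lambda x: math.sin(13 * x) * math.sin(27 * x) / 2.0 + 0.5
    print(soo_maximize(f, budget=300))
```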
“…The non-convex optimization algorithm implementation used [14,15] in the optimization and hybrid methods stops when a fixed number of iterations has been completed, regardless of whether a solution was found or not. In order to study how the number of iterations affects the solution obtained, we run the algorithm three times for the optimization method (using {10^3, 10^4, 10^5} iterations) and for the hybrid approach (using {10^2, 10^3, 10^4} iterations).…”
Section: Simulation 1: Comparison Between Analytical Optimization A… (mentioning)
confidence: 99%
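As a usage note on the budget study quoted above, a sweep over evaluation budgets can be scripted directly on top of the soo_maximize sketch given earlier; the budgets below mirror the quoted values, while the objective function is purely illustrative.

```python
import math

f = lambda x: math.sin(13 * x) * math.sin(27 * x) / 2.0 + 0.5

# Run the optimizer with increasing budgets and compare the solutions found.
for budget in (10**3, 10**4, 10**5):
    x_best, v_best = soo_maximize(f, budget=budget)
    print(f"budget={budget:>6}  x*={x_best:.6f}  f(x*)={v_best:.6f}")
```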
“…Another example of the successful use of MCTS is the optimization of a "black-box" function [27,8], where the goal is to obtain a good estimate of the maximum of the function (deterministic or stochastic) by evaluating it only a limited number of times. The idea is to design a sequence of input samples at which the function should be evaluated, given the previously observed values.…”
Section: Monte Carlo Tree Search (MCTS) (mentioning)
confidence: 99%
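To make the "design a sequence of input samples" idea concrete, here is a minimal UCB-style sketch that allocates a limited evaluation budget over a finite grid of candidate inputs to a noisy function; the grid, the exploration constant, and the noise model are assumptions for the example rather than details from [27,8].

```python
import math
import random

def ucb_max_estimate(noisy_f, candidates, budget=500, c=1.0):
    """Sequentially choose where to evaluate a noisy function so that the
    sample allocation concentrates on the most promising candidates."""
    counts = [0] * len(candidates)
    means = [0.0] * len(candidates)

    # Evaluate every candidate once to initialize the statistics.
    for i, x in enumerate(candidates):
        means[i] = noisy_f(x)
        counts[i] = 1

    for t in range(len(candidates), budget):
        # Optimism in the face of uncertainty: pick the candidate with the
        # highest upper confidence bound on its mean value.
        ucb = [means[i] + c * math.sqrt(math.log(t + 1) / counts[i])
               for i in range(len(candidates))]
        i = max(range(len(candidates)), key=lambda j: ucb[j])
        y = noisy_f(candidates[i])
        counts[i] += 1
        means[i] += (y - means[i]) / counts[i]   # running mean update

    best = max(range(len(candidates)), key=lambda j: means[j])
    return candidates[best], means[best]

# Example: noisy evaluations of a smooth function on a uniform grid.
if __name__ == "__main__":
    f = lambda x: math.exp(-8 * (x - 0.3) ** 2)          # true function
    noisy_f = lambda x: f(x) + random.gauss(0.0, 0.1)    # noisy oracle
    grid = [i / 50 for i in range(51)]
    print(ucb_max_estimate(noisy_f, grid, budget=2000))
```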
“…Instead of building the full tree, UCT chooses the action a that maximizes an upper confidence bound on Q(s, a), following the principle of optimism in the face of uncertainty. Several improvements and extensions of UCT have been proposed, including handling continuous actions [13] (see [16] for a review) and continuous states [1] with a simple Gaussian distance metric; however, the knowledge of the probabilistic model is not directly exploited. For continuous states, parametric function approximation is often used (e.g., linear regression); nonetheless, the model needs to be carefully tailored to the domain being solved [32].…”
Section: Related Work (mentioning)
confidence: 99%
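For reference, the UCT rule mentioned in the excerpt selects, at state s, the action maximizing Q(s, a) + c * sqrt(ln N(s) / N(s, a)). The sketch below shows just that selection step; the Node structure and the exploration constant c are illustrative assumptions, not taken from any of the cited implementations.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """Minimal search-node statistics assumed for the example."""
    visits: int = 0          # N(s): number of visits to this state
    value: float = 0.0       # running mean of returns, i.e. Q(s, a) at a child
    children: dict = field(default_factory=dict)   # action -> child Node

def uct_select(node: Node, c: float = 1.4):
    """Return the action maximizing Q(s, a) + c * sqrt(ln N(s) / N(s, a))."""
    log_n = math.log(node.visits)        # assumes the node has been visited
    best_action, best_score = None, -math.inf
    for action, child in node.children.items():
        if child.visits == 0:
            return action                # unvisited actions have an unbounded score
        score = child.value + c * math.sqrt(log_n / child.visits)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```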