2012
DOI: 10.1007/978-3-642-34413-8_29
Improving the Exploration in Upper Confidence Trees

Abstract: In the standard version of the UCT algorithm, in the case of a continuous set of decisions, the exploration of new decisions is done through blind search. This can lead to very inefficient exploration, particularly in high-dimensional problems, which often arise in energy management, for instance. In an attempt to use the information gathered through past simulations to better explore new decisions, we propose a method named Blind Value (BV). It only requires access to a fun…

Cited by 5 publications (5 citation statements)
References 7 publications
“…As shown in Fig. 12, by the selection, expansion, simulation, and backpropagation steps, MCTS can generate the optimized action combination A_t based on the score function of the upper confidence bound applied to trees (UCT) (Furtak and Buro, 2013; Couetoux et al., 2012). The policy score of the UCT in a tree search node is expressed as:…”
Section: Policy
Citation type: mentioning
Confidence: 99%
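The UCT score referenced in the excerpt is truncated, but the standard form combines the mean return of a child node with an exploration bonus. A minimal sketch of that standard formula (function and parameter names are illustrative, not from the cited paper):

```python
import math

def uct_score(child_value_sum, child_visits, parent_visits, c=math.sqrt(2)):
    """Standard UCT score: exploitation term plus exploration bonus.

    child_value_sum: sum of simulation returns backed up through this child
    child_visits:    number of times this child was visited
    parent_visits:   number of times the parent node was visited
    c:               exploration constant (sqrt(2) is a common default)
    """
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    exploitation = child_value_sum / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration
```

During the selection phase, the search descends by picking, at each node, the child maximizing this score.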
“…The simplest way to add new actions is to select a random action from the continuous action space. More advanced approaches use the information available in the current state, so that promising areas within the action space can be identified and new actions can be added ([21], [22]).…”
Section: Progressive Widening
Citation type: mentioning
Confidence: 99%
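Progressive widening, the mechanism this excerpt refers to, gates when a new action may be added to a node in a continuous action space: a node with N visits is allowed at most on the order of C·N^α children. A minimal sketch of that gating rule (constants C and α are tunable; the names here are illustrative):

```python
def can_expand(num_children, num_visits, C=1.0, alpha=0.5):
    """Progressive widening: permit adding a new child action only while
    the number of children stays below C * N(s)**alpha, where N(s) is the
    visit count of the node. As visits grow, the action set widens slowly."""
    return num_children < C * num_visits ** alpha
```

When `can_expand` returns False, the search reuses one of the node's existing actions (e.g. via the UCT score) instead of sampling a new one.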
“…In [21] a heuristic, the so-called Blind Values (BV), is introduced: n randomly drawn actions from the theoretical action space A are evaluated, and the action with the maximum Blind Value, see (16), is added to the locally available action space of the state. Blind Values make use of the information gathered on all actions already explored from a given state to select promising areas of exploration.…”
Section: G. Guided Search
Citation type: mentioning
Confidence: 99%
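The sampling loop described in this excerpt can be sketched as follows. This is only a loose illustration of the idea, not the paper's exact formula: it assumes a 1-D action space, a fixed scale factor `rho` (the paper adapts this scale from the empirical spread of scores and distances), and takes a candidate's blind value to be its worst-case combination of distance to an explored action and that action's UCB score. All names are illustrative:

```python
import random

def blind_value(candidate, explored, rho=1.0):
    """Blind value of a candidate action against already-explored actions.

    explored: list of (action, ucb_score) pairs for the current state.
    A large value means the candidate is far from, or near to promising,
    explored actions (simplified 1-D sketch of the idea in [21])."""
    return min(rho * abs(candidate - a) + ucb for a, ucb in explored)

def pick_new_action(explored, n_candidates=50, lo=0.0, hi=1.0, rho=1.0,
                    rng=random.Random(0)):
    """Draw n candidates uniformly from [lo, hi] (standing in for the
    theoretical action space A) and return the one maximizing blind value."""
    candidates = [rng.uniform(lo, hi) for _ in range(n_candidates)]
    return max(candidates, key=lambda c: blind_value(c, explored, rho))
```

The selected action is then added to the state's locally available action set, after which the usual UCT selection applies.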
“…In [13] (Section 5.3), the authors consider multi-stage stochastic programming techniques and optimize the scenario trees using Monte-Carlo methods. Closer to our work, [11] proposes to parameterize a tree-search technique for decision-making: upper confidence trees. In that work, the parameters control the simulation policy used to estimate long-term returns within the upper confidence tree algorithm.…”
Section: Parameterized Algorithms for Decision-Making
Citation type: mentioning
Confidence: 99%