2012
DOI: 10.1007/978-3-642-34413-8_29
Improving the Exploration in Upper Confidence Trees

Abstract: In the standard version of the UCT algorithm, in the case of a continuous set of decisions, the exploration of new decisions is done through blind search. This can lead to very inefficient exploration, particularly in high-dimensional problems, which often arise in energy management, for instance. In an attempt to use the information gathered through past simulations to better explore new decisions, we propose a method named Blind Value (BV). It only requires access to a fun…

Cited by 5 publications (5 citation statements)
References 7 publications
“…As shown in Fig. 12, by the selection, expansion, simulation, and backpropagation steps, MCTS can generate the optimized action combination A_t based on the score function of the upper confidence bound applied to trees (UCT) (Furtak and Buro, 2013; Couetoux et al., 2012). The policy score of the UCT in a tree search node is expressed as:…”
Section: Policy
Citation type: mentioning
Confidence: 99%
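The UCT score referenced in the excerpt is truncated, but the standard form combines the mean return of a child node with an exploration bonus. A minimal sketch of that standard formula (function and parameter names are illustrative, not from the cited paper):

```python
import math

def uct_score(child_value_sum, child_visits, parent_visits, c=math.sqrt(2)):
    """Standard UCT score: exploitation term plus exploration bonus.

    child_value_sum: sum of simulation returns backed up through this child
    child_visits:    number of times this child was visited
    parent_visits:   number of times the parent node was visited
    c:               exploration constant (sqrt(2) is a common default)
    """
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    exploitation = child_value_sum / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration
```

During the selection phase, the search descends by picking, at each node, the child maximizing this score.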
“…The simplest way to add new actions is to select a random action from the continuous action space. More advanced approaches use the information available in the current state, so that promising areas within the action space can be identified and new actions can be added ([21], [22]).…”
Section: Progressive Widening
Citation type: mentioning
Confidence: 99%
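Progressive widening, the mechanism this excerpt refers to, gates when a new action may be added to a node in a continuous action space: a node with N visits is allowed at most on the order of C·N^α children. A minimal sketch of that gating rule (constants C and α are tunable; the names here are illustrative):

```python
def can_expand(num_children, num_visits, C=1.0, alpha=0.5):
    """Progressive widening: permit adding a new child action only while
    the number of children stays below C * N(s)**alpha, where N(s) is the
    visit count of the node. As visits grow, the action set widens slowly."""
    return num_children < C * num_visits ** alpha
```

When `can_expand` returns False, the search reuses one of the node's existing actions (e.g. via the UCT score) instead of sampling a new one.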
“…In [21] a heuristic, the so-called Blind Values (BV), is introduced: n randomly drawn actions from the theoretical action space A are evaluated, and the action with the maximum Blind Value, see (16), is added to the locally available action space of the state. Blind Values make use of the information gathered on all actions already explored from a given state to select promising areas of exploration.…”
Section: G. Guided Search
Citation type: mentioning
Confidence: 99%
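The sampling loop described in this excerpt can be sketched as follows. This is only a loose illustration of the idea, not the paper's exact formula: it assumes a 1-D action space, a fixed scale factor `rho` (the paper adapts this scale from the empirical spread of scores and distances), and takes a candidate's blind value to be its worst-case combination of distance to an explored action and that action's UCB score. All names are illustrative:

```python
import random

def blind_value(candidate, explored, rho=1.0):
    """Blind value of a candidate action against already-explored actions.

    explored: list of (action, ucb_score) pairs for the current state.
    A large value means the candidate is far from, or near to promising,
    explored actions (simplified 1-D sketch of the idea in [21])."""
    return min(rho * abs(candidate - a) + ucb for a, ucb in explored)

def pick_new_action(explored, n_candidates=50, lo=0.0, hi=1.0, rho=1.0,
                    rng=random.Random(0)):
    """Draw n candidates uniformly from [lo, hi] (standing in for the
    theoretical action space A) and return the one maximizing blind value."""
    candidates = [rng.uniform(lo, hi) for _ in range(n_candidates)]
    return max(candidates, key=lambda c: blind_value(c, explored, rho))
```

The selected action is then added to the state's locally available action set, after which the usual UCT selection applies.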
“…In [13] (Section 5.3), the authors consider multi-stage stochastic programming techniques and optimize the scenario trees using Monte-Carlo methods. Closer to our work, [11] proposes to parameterize a tree-search technique for decision-making: upper confidence trees. In that work, the parameters control the simulation policy used to estimate long-term returns within the upper confidence tree algorithm.…”
Section: Parameterized Algorithms for Decision-Making
Citation type: mentioning
Confidence: 99%