2015
DOI: 10.1145/2717316

Online Planning for Large Markov Decision Processes with Hierarchical Decomposition

Abstract: Markov decision processes (MDPs) provide a rich framework for planning under uncertainty. However, exactly solving a large MDP is usually intractable due to the "curse of dimensionality": the state space grows exponentially with the number of state variables. Online algorithms tackle this problem by avoiding computing a policy for the entire state space. On the other hand, since an online algorithm has to find a near-optimal action in almost real time, the computation time is often very limited. In the cont…
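To make the abstract's point concrete, here is a minimal sketch of online planning by depth-limited lookahead from the current state only, rather than solving the whole MDP offline. The generative model `step(state, action) -> (next_state, reward)`, the action set, and all parameter values are illustrative assumptions, not the paper's actual algorithm.

```python
def plan(state, step, actions, depth=3, gamma=0.95, samples=10):
    """Return a near-optimal action from `state` via sampled lookahead."""
    def value(s, d):
        # Value of state s with d lookahead steps remaining.
        if d == 0:
            return 0.0  # horizon reached: trivial zero heuristic
        return max(q(s, a, d) for a in actions)

    def q(s, a, d):
        # Monte-Carlo estimate of Q(s, a) from the generative model.
        total = 0.0
        for _ in range(samples):
            s2, r = step(s, a)
            total += r + gamma * value(s2, d - 1)
        return total / samples

    return max(actions, key=lambda a: q(state, a, depth))


# Toy usage: a one-state MDP where action 1 always pays more.
print(plan(0, lambda s, a: (s, float(a)), actions=[0, 1], depth=2))  # -> 1
```

Because only the reachable lookahead tree is expanded, the per-decision cost is independent of the total state-space size, which is what makes this style of planning viable under tight real-time budgets.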

Cited by 30 publications (25 citation statements) · References 37 publications

“…It is evident that the local maximum attained by test 34 does not satisfy all these local constraints, highlighting the need for a second, constraint-satisfaction, phase. The algorithm continues with the best solution X_34 = {2, 3, 5, 4, 8, 10, 11, 9, 6, 7}, re-evaluated after 16000 games in order to ensure higher precision, attaining f_34 = −3.14496, and then generates candidates closer to and within the constrained sub-space, as shown in Table 4, for a specified number of tests, e.g., 10 additional tests. The maximum attained at the end of this phase is X_44 = {5, 4, 2, 3, 7, 6, 8, 10, 11, 9} with f_44 = −2.95471, reducing the goal deficit by a further 0.2.…”
Section: Results (mentioning)
confidence: 99%
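The two-phase procedure this excerpt describes can be sketched as follows. `evaluate` (the game-based objective), `satisfies` (the local constraints), and `neighbours` (the candidate generator over permutations) are hypothetical stand-ins for the citing paper's components; the sketch only mirrors the control flow, not the actual algorithm.

```python
import random

def two_phase_search(x0, evaluate, satisfies, neighbours,
                     phase1_tests=34, phase2_tests=10,
                     games=4000, precise_games=16000):
    # Phase 1: unconstrained local search over candidate permutations.
    best, best_f = x0, evaluate(x0, games)
    for _ in range(phase1_tests):
        cand = random.choice(neighbours(best))
        f = evaluate(cand, games)
        if f > best_f:
            best, best_f = cand, f

    # Re-evaluate the incumbent with more games for higher precision,
    # mirroring the 16000-game re-evaluation in the excerpt.
    best_f = evaluate(best, precise_games)

    # Phase 2: restrict candidates to the constrained sub-space.
    for _ in range(phase2_tests):
        feasible = [n for n in neighbours(best) if satisfies(n)]
        if not feasible:
            break
        cand = random.choice(feasible)
        f = evaluate(cand, games)
        if f > best_f:
            best, best_f = cand, f

    return best, best_f
```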
See 1 more Smart Citation
“…It is evident that the local maximum attained by test 34 does not satisfy all these local constraints, highlighting the need for a second, constraint satisfaction, phase. The algorithm continues with the best solution X 34 = {2, 3,5,4,8,10,11,9,6, 7}, reevaluated after 16000 games in order to ensure a higher precision, attaining f 34 = −3.14496, and then generates the candidates closer to and within the constrained sub-space, as shown in Table 4, for a specified number of tests, e.g., 10 additional tests. The maximum attained at the end of this phase is given by X 44 = {5, 4, 2, 3, 7, 6, 8, 10, 11, 9} with f 44 = −2.95471, reducing the goal deficit by further 0.2.…”
Section: Resultsmentioning
confidence: 99%
“…The RoboCup Soccer 2D Simulation League provides a rich dynamic environment, facilitated by the RoboCup Soccer Simulator (RCSS), aimed at testing advances in the decentralised collective behaviours of autonomous agents. The challenges include concurrent adversarial actions, computational nondeterminism, noise and latency in asynchronous perception and actuation, and limited processing time [3,5,7,29,37,38,42,43,46]. The League's progress has been supported by several important base code releases, covering both low-level skills and standardised world models of simulated agents [1,22,45,47].…”
Section: Introduction (mentioning)
confidence: 99%
“…The WrightEagle team has been developed over the past years based on the Markov decision process (MDP) framework [1], with the MAXQ hierarchical structure [2] and heuristic approximate online planning techniques [3] [4].…”
Section: MAXQ Hierarchical Decomposition (mentioning)
confidence: 99%
“…For instance, at RoboCup 2013, a total of 20 teams from 9 countries entered the 2D simulation competition. Given the scale of the competition, making good decisions against so many different opponents and dealing with complex and varied situations is very challenging [3]. This paper introduces the RoboCup 2013 Soccer Simulation 2D League champion team, WrightEagle, which has won 4 championships and finished runner-up 5 times in the past 9 years.…”
Section: Introduction (mentioning)
confidence: 99%
“…Completely ignoring domain information can make learning algorithms inefficient. Dietterich presented the MAXQ hierarchical RL method, which is based on the semi-Markov decision process (SMDP) [5]-[7]. MAXQ incorporates user knowledge through a layered approach, dividing the overall learning task into a number of reusable sub-Markov decision processes to speed up learning.…”
Section: Introduction (mentioning)
confidence: 99%
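As a rough illustration of the MAXQ decomposition mentioned above, the sketch below builds a task hierarchy in which every subtask is a semi-MDP whose actions are its children, following MAXQ's value decomposition Q(i, s, a) = V(a, s) + C(i, s, a), where C(i, s, a) is the expected completion value after subtask a finishes. The soccer-flavoured subtask names and the class layout are assumptions for illustration, not Dietterich's formulation or WrightEagle's actual code.

```python
class Subtask:
    def __init__(self, name, children=(), primitive=False):
        self.name = name
        self.children = list(children)  # available (sub)actions of this SMDP
        self.primitive = primitive
        self.completion = {}            # C(i, s, a) table, learned or planned

# Hypothetical hierarchy: the root delegates to attack/defend subtasks,
# which bottom out in primitive actions reusable across parents.
kick = Subtask("kick", primitive=True)
dash = Subtask("dash", primitive=True)
attack = Subtask("attack", children=[kick, dash])
defend = Subtask("defend", children=[dash])
root = Subtask("root", children=[attack, defend])
```

The payoff of the layered structure is reuse: a subtask such as `dash` is learned or planned once and shared by every parent that invokes it, which is how MAXQ speeds up learning on the overall task.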