2015
DOI: 10.1145/2717316

Online Planning for Large Markov Decision Processes with Hierarchical Decomposition

Abstract: Markov decision processes (MDPs) provide a rich framework for planning under uncertainty. However, exactly solving a large MDP is usually intractable due to the "curse of dimensionality": the state space grows exponentially with the number of state variables. Online algorithms tackle this problem by avoiding computing a policy for the entire state space. On the other hand, since an online algorithm has to find a near-optimal action in almost real time, the computation time is often very limited. In the cont…
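To make the abstract's point concrete, here is a minimal sketch of online planning by depth-limited lookahead from the current state only, rather than solving the whole MDP offline. The generative model `step(state, action) -> (next_state, reward)`, the action set, and all parameter values are illustrative assumptions, not the paper's actual algorithm.

```python
def plan(state, step, actions, depth=3, gamma=0.95, samples=10):
    """Return a near-optimal action from `state` via sampled lookahead."""
    def value(s, d):
        # Value of state s with d lookahead steps remaining.
        if d == 0:
            return 0.0  # horizon reached: trivial zero heuristic
        return max(q(s, a, d) for a in actions)

    def q(s, a, d):
        # Monte-Carlo estimate of Q(s, a) from the generative model.
        total = 0.0
        for _ in range(samples):
            s2, r = step(s, a)
            total += r + gamma * value(s2, d - 1)
        return total / samples

    return max(actions, key=lambda a: q(state, a, depth))


# Toy usage: a one-state MDP where action 1 always pays more.
print(plan(0, lambda s, a: (s, float(a)), actions=[0, 1], depth=2))  # -> 1
```

Because only the reachable lookahead tree is expanded, the per-decision cost is independent of the total state-space size, which is what makes this style of planning viable under tight real-time budgets.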

Cited by 30 publications (25 citation statements) · References 37 publications

“…It is evident that the local maximum attained by test 34 does not satisfy all these local constraints, highlighting the need for a second, constraint-satisfaction, phase. The algorithm continues with the best solution X_34 = {2, 3, 5, 4, 8, 10, 11, 9, 6, 7}, re-evaluated after 16000 games in order to ensure higher precision, attaining f_34 = −3.14496, and then generates candidates closer to and within the constrained sub-space, as shown in Table 4, for a specified number of tests, e.g., 10 additional tests. The maximum attained at the end of this phase is X_44 = {5, 4, 2, 3, 7, 6, 8, 10, 11, 9} with f_44 = −2.95471, reducing the goal deficit by a further 0.2.…”
Section: Results (mentioning)
confidence: 99%
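The two-phase procedure this excerpt describes can be sketched as follows. `evaluate` (the game-based objective), `satisfies` (the local constraints), and `neighbours` (the candidate generator over permutations) are hypothetical stand-ins for the citing paper's components; the sketch only mirrors the control flow, not the actual algorithm.

```python
import random

def two_phase_search(x0, evaluate, satisfies, neighbours,
                     phase1_tests=34, phase2_tests=10,
                     games=4000, precise_games=16000):
    # Phase 1: unconstrained local search over candidate permutations.
    best, best_f = x0, evaluate(x0, games)
    for _ in range(phase1_tests):
        cand = random.choice(neighbours(best))
        f = evaluate(cand, games)
        if f > best_f:
            best, best_f = cand, f

    # Re-evaluate the incumbent with more games for higher precision,
    # mirroring the 16000-game re-evaluation in the excerpt.
    best_f = evaluate(best, precise_games)

    # Phase 2: restrict candidates to the constrained sub-space.
    for _ in range(phase2_tests):
        feasible = [n for n in neighbours(best) if satisfies(n)]
        if not feasible:
            break
        cand = random.choice(feasible)
        f = evaluate(cand, games)
        if f > best_f:
            best, best_f = cand, f

    return best, best_f
```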
See 1 more Smart Citation
“…It is evident that the local maximum attained by test 34 does not satisfy all these local constraints, highlighting the need for a second, constraint satisfaction, phase. The algorithm continues with the best solution X 34 = {2, 3,5,4,8,10,11,9,6, 7}, reevaluated after 16000 games in order to ensure a higher precision, attaining f 34 = −3.14496, and then generates the candidates closer to and within the constrained sub-space, as shown in Table 4, for a specified number of tests, e.g., 10 additional tests. The maximum attained at the end of this phase is given by X 44 = {5, 4, 2, 3, 7, 6, 8, 10, 11, 9} with f 44 = −2.95471, reducing the goal deficit by further 0.2.…”
Section: Resultsmentioning
confidence: 99%
“…The RoboCup Soccer 2D Simulation League provides a rich dynamic environment, facilitated by the RoboCup Soccer Simulator (RCSS), aimed at testing advances in the decentralised collective behaviours of autonomous agents. The challenges include concurrent adversarial actions, computational nondeterminism, noise and latency in asynchronous perception and actuation, and limited processing time [3,5,7,29,37,38,42,43,46]. The League's progress has been supported by several important base code releases, covering both low-level skills and standardised world models of simulated agents [1,22,45,47].…”
Section: Introduction (mentioning)
confidence: 99%
“…The WrightEagle team has been developed over the past years based on the Markov decision process (MDP) framework [1], with the MAXQ hierarchical structure [2] and heuristic approximate online planning techniques [3] [4].…”
Section: MAXQ Hierarchical Decomposition (mentioning)
confidence: 99%
“…For instance, at RoboCup 2013, a total of 20 teams from 9 countries entered the 2D simulation competition. Given the scale of the competition, making good decisions against so many different opponents and dealing with complex and varied situations is very challenging [3]. This paper introduces the RoboCup 2013 Soccer Simulation 2D League champion team, WrightEagle, which has won 4 championships and finished runner-up 5 times in the past 9 years.…”
Section: Introduction (mentioning)
confidence: 99%
“…Completely ignoring domain information can make learning algorithms inefficient. Dietterich presented the MAXQ hierarchical RL method, which is based on the semi-Markov decision process (SMDP) [5]-[7]. MAXQ incorporates user knowledge through a layered approach, dividing the overall learning task into a number of reusable sub-Markov decision processes to speed up learning.…”
Section: Introduction (mentioning)
confidence: 99%
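As a rough illustration of the MAXQ decomposition mentioned above, the sketch below builds a task hierarchy in which every subtask is a semi-MDP whose actions are its children, following MAXQ's value decomposition Q(i, s, a) = V(a, s) + C(i, s, a), where C(i, s, a) is the expected completion value after subtask a finishes. The soccer-flavoured subtask names and the class layout are assumptions for illustration, not Dietterich's formulation or WrightEagle's actual code.

```python
class Subtask:
    def __init__(self, name, children=(), primitive=False):
        self.name = name
        self.children = list(children)  # available (sub)actions of this SMDP
        self.primitive = primitive
        self.completion = {}            # C(i, s, a) table, learned or planned

# Hypothetical hierarchy: the root delegates to attack/defend subtasks,
# which bottom out in primitive actions reusable across parents.
kick = Subtask("kick", primitive=True)
dash = Subtask("dash", primitive=True)
attack = Subtask("attack", children=[kick, dash])
defend = Subtask("defend", children=[dash])
root = Subtask("root", children=[attack, defend])
```

The payoff of the layered structure is reuse: a subtask such as `dash` is learned or planned once and shared by every parent that invokes it, which is how MAXQ speeds up learning on the overall task.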