Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations

Skrynnik, Alexey; Staroverov, Aleksey; Aitygulov, Ermek; Aksenov, Kirill; Davydov, Vasilii; Panov, Aleksandr I.

doi:10.1016/j.knosys.2021.106844

Cited by 18 publications

(10 citation statements)

References 6 publications

“…Since MineRL was held in 2019, many solutions have been proposed to learn to play in Minecraft. These works can be grouped into two categories: 1) end-to-end learning [Amiranashvili et al, 2020; Kanervisto et al, 2020; Scheller et al, 2020]; 2) HRL with human demonstrations [Skrynnik et al, 2021; Mao et al, 2021]. Our approach belongs to the second category.…”

Section: Related Work (mentioning)

confidence: 99%

“…Our approach belongs to the second category. In this category, prior works leverage the structure of the tasks and learn a hierarchical agent to play in Minecraft: ForgER [Skrynnik et al, 2021] proposed a hierarchical method with forgetful experience replay to allow the agent to learn from low-quality demonstrations; Mao et al [2021] proposed SEIHAI that fully takes advantage of the human demonstrations and the task structure. Sample-efficient Reinforcement Learning.…”

Section: Related Work (mentioning)

confidence: 99%

“…Recently, open-world games have been attracting attention due to their playing mechanism and similarity to real-world control tasks [Guss et al, 2021]. Minecraft, as a typical open-world game, has been increasingly explored for the past few years [Oh et al, 2016; Tessler et al, 2017; Guss et al, 2019; Kanervisto et al, 2020; Skrynnik et al, 2021; Mao et al, 2021]. Compared to other games, the characteristics of Minecraft make it a suitable testbed for RL research, as it emphasizes exploration, perception and construction in a 3D open world [Oh et al, 2016].…”

Section: Introduction (mentioning)

confidence: 99%

“…Therefore, to facilitate the efficient decision-making of agents in playing Minecraft, MineRL [Guss et al, 2019] has been developed as a research competition platform, which provides human demonstrations and encourages the development of sample-efficient RL agents for playing Minecraft. Since the release of MineRL, a number of efforts have been made on developing Minecraft AI agents, e.g., ForgER [Skrynnik et al, 2021], SEIHAI [Mao et al, 2021].…”

Section: Introduction (mentioning)

confidence: 99%

JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

Lin, Li, Shi et al. 2021

Preprint

Learning rational behaviors in open-world games like Minecraft remains challenging for Reinforcement Learning (RL) research due to the compound challenge of partial observability, high-dimensional visual perception and delayed reward. To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration. Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy to control over options and the low-level workers learn to solve each sub-task. To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning which captures underlying relations between action and representation, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for policy robustness. Extensive experiments show that JueWu-MC significantly improves sample efficiency and outperforms a set of baselines by a large margin. Notably, we won the championship of the NeurIPS MineRL 2021 research competition and achieved the highest performance score ever.

“…In this paper, we propose to use reinforcement learning [9], [12] to generate the behavior of each autonomous agent in a multi-agent partially-observable environment, which in our case is an abstraction for the Internet of vehicles. Model-free reinforcement learning methods have shown excellent results in behavior generation tasks for single agents [13]-[15] and cooperative environments [16], [17]. Modern deep reinforcement learning algorithms cope well with complex observation space (visual environments) [18] and stochastic environmental conditions [19].…”

Section: Introduction (mentioning)

confidence: 99%

Hybrid Policy Learning for Multi-Agent Pathfinding

et al. 2021

Self Cite

In this work we study the behavior of groups of autonomous vehicles that are part of Internet of Vehicles systems. One of the challenging modes of operation of such systems is the case when the observability of each vehicle is limited and the global/local communication is unstable, e.g. in crowded parking lots. In such scenarios the vehicles have to rely on local observations and exhibit cooperative behavior to ensure safe and efficient trips. This type of problem can be abstracted to the so-called multi-agent pathfinding, in which a group of agents, confined to a graph, have to find collision-free paths to their goals (ideally, minimizing an objective function, e.g. travel time). Widely used algorithms for solving this problem rely on the assumption that a central controller exists for which the full state of the environment (i.e. the agents' current positions, their targets, the configuration of the static obstacles, etc.) is known, and they cannot be straightforwardly adapted to partially-observable setups. To this end, we suggest a novel approach based on the decomposition of the problem into two sub-tasks: reaching the goal and avoiding collisions. To accomplish each of these tasks we utilize reinforcement learning methods such as Deep Monte Carlo Tree Search, Q-mixing networks, and policy gradient methods to design the policies that map the agents' observations to actions. Next, we introduce a policy-mixing mechanism to end up with a single hybrid policy that allows each agent to exhibit both types of behavior: the individual one (reaching the goal) and the cooperative one (avoiding collisions with other agents). We conduct an extensive empirical evaluation showing that the suggested hybrid policy outperforms standalone state-of-the-art reinforcement learning methods for this kind of problem by a notable margin.
