Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/433
I²HRL: Interactive Influence-based Hierarchical Reinforcement Learning

Abstract: Hierarchical reinforcement learning (HRL) is a promising approach to solving tasks with long time horizons and sparse rewards. It is often implemented as a high-level policy that assigns subgoals to a low-level policy. However, it suffers from the high-level non-stationarity problem, since the low-level policy is constantly changing. The non-stationarity also leads to a data-efficiency problem: policies need more data at non-stationary states to stabilize training. To address these issues, we propose a novel HR…
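To make the setting in the abstract concrete, the following is a minimal sketch of the standard two-level subgoal loop it describes. The environment, both policies, and the decision interval C are placeholder assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, GOAL_DIM, ACT_DIM, C = 4, 2, 2, 10   # C: high-level decision interval

def high_level_policy(state):
    # pi_hi(g | s): proposes a subgoal for the next C low-level steps
    return rng.normal(size=GOAL_DIM)

def low_level_policy(state, subgoal):
    # pi_lo(a | s, g): acts to reach the current subgoal
    return rng.normal(size=ACT_DIM)

def env_step(state, action):
    # dummy transition with a sparse (here always zero) reward
    return state + 0.1 * rng.normal(size=STATE_DIM), 0.0

state = np.zeros(STATE_DIM)
for t in range(100):
    if t % C == 0:
        subgoal = high_level_policy(state)      # high level acts every C steps
    action = low_level_policy(state, subgoal)   # low level conditions on the subgoal
    state, reward = env_step(state, action)

# Non-stationarity: as pi_lo is updated, the state reached by issuing the same
# subgoal changes, so the high level's old transitions (s, g, s') become stale.
```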

Cited by 11 publications (17 citation statements). References: 0 publications.
“…In (Zhang et al., 2020) the large subgoal space issue was addressed by restricting the high-level action space from the whole subgoal space using an adjacency constraint. In (Wang et al., 2020) high-level policy decision making is conditioned on the received low-level representation as well as the state of the environment to improve stationarity. Another solution is to add a slowness objective to effectively learn the subgoal representation so that the low-level reward function varies in a stationary way (Li et al., 2021).…”
Section: Related Work
confidence: 99%
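As a rough illustration of the idea this excerpt attributes to Wang et al. (2020), namely conditioning high-level decisions on a representation of the low-level policy in addition to the environment state, here is a hedged sketch. The dimensions and the flat parameter-vector encoder are assumptions made only for illustration, not the cited architecture.

```python
import torch
import torch.nn as nn

STATE_DIM, EMB_DIM, GOAL_DIM = 16, 32, 4

class HighLevelPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + EMB_DIM, 64), nn.ReLU(),
            nn.Linear(64, GOAL_DIM),
        )

    def forward(self, state, low_level_embedding):
        # the subgoal depends on the state AND on what the low level currently does
        return self.net(torch.cat([state, low_level_embedding], dim=-1))

# a stand-in low-level policy and a hypothetical encoder of its flattened parameters
low_level = nn.Sequential(nn.Linear(STATE_DIM + GOAL_DIM, 64), nn.ReLU(), nn.Linear(64, 2))
encoder = nn.Linear(sum(p.numel() for p in low_level.parameters()), EMB_DIM)

with torch.no_grad():
    flat_params = torch.cat([p.flatten() for p in low_level.parameters()])
    emb = encoder(flat_params)                         # representation of pi_lo
    subgoal = HighLevelPolicy()(torch.zeros(STATE_DIM), emb)
```

The intuition is that when the low level changes, its representation changes with it, so the high level can in principle learn a transition model that stays valid across low-level updates.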
“…For both SPR and OPR, we use permutation-invariant transformations, i.e., mainly an MLP followed by a Mean-Reduce operation; the detailed implementation is provided in Appendix D.2. A few works also involve representation or embedding learning for RL policies in Multiagent Learning (Grover et al., 2018), Hierarchical RL (Wang et al., 2020), and Policy Adaptation and Transfer (Hausman et al., 2018; Arnekvist et al., 2019; Raileanu et al., 2020; Harb et al., 2020). We found that almost all of them belong to the scope of SPR.…”
Section: Policy Representation Learning
confidence: 99%
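The MLP-then-Mean-Reduce pattern mentioned in this excerpt can be sketched as follows. Treating the policy as a set of sampled (state, action) pairs, and all dimensions, are assumptions for illustration rather than the cited implementation (its Appendix D.2 has the real one).

```python
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, HIDDEN, EMB_DIM = 8, 2, 64, 32

phi = nn.Sequential(                          # shared per-element MLP
    nn.Linear(STATE_DIM + ACT_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, EMB_DIM),
)

def policy_embedding(states, actions):
    # states: (N, STATE_DIM), actions: (N, ACT_DIM), sampled from the policy
    pairs = torch.cat([states, actions], dim=-1)
    return phi(pairs).mean(dim=0)             # Mean-Reduce over the set -> order-invariant

emb = policy_embedding(torch.randn(128, STATE_DIM), torch.randn(128, ACT_DIM))
print(emb.shape)  # torch.Size([32]); identical for any permutation of the 128 pairs
```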
“…In recent years, a few works have involved representation or embedding learning for RL policies (Hausman et al., 2018; Grover et al., 2018; Arnekvist et al., 2019; Raileanu et al., 2020; Wang et al., 2020; Harb et al., 2020). We provide a brief review and summary of the above works below.…”
Section: D.5 A Review of Related Work on Policy Representation/Embeddi...
confidence: 99%
“…This can be viewed as a special two-agent cooperative game in which the high level and low level learn to coordinate on the optimal hybrid actions. Although the hierarchical structure seems natural, it suffers from high-level non-stationarity caused by off-policy learning dynamics (Wang et al., 2020), i.e., a discrete action can no longer induce the same transition recorded in historical experiences because the low-level policy has changed. All of the above works focus on policy learning over the original hybrid action space.…”
Section: Introduction
confidence: 99%
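To make the hybrid-action hierarchy and the non-stationarity described in this excerpt concrete, here is a minimal sketch under assumed dimensions; it is not any cited paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_DISCRETE, PARAM_DIM = 8, 3, 2

high = nn.Linear(STATE_DIM, N_DISCRETE)                # logits over discrete actions
low = nn.Linear(STATE_DIM + N_DISCRETE, PARAM_DIM)     # continuous parameters for the chosen action

def act(state):
    k = torch.argmax(high(state))                      # high level: pick a discrete action
    one_hot = F.one_hot(k, N_DISCRETE).float()
    params = low(torch.cat([state, one_hot], dim=-1))  # low level: parameters conditioned on k
    return int(k), params

k, params = act(torch.randn(STATE_DIM))

# Replay non-stationarity: an old tuple (s, k, s') was generated under the
# previous low-level parameters for k, so replaying k need not reproduce s'.
```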