2019
DOI: 10.48550/arxiv.1912.06088
Preprint

Learning to Reach Goals via Iterated Supervised Learning

Abstract: Imitation learning algorithms provide a simple and straightforward approach for training control policies via supervised learning. By maximizing the likelihood of good actions provided by an expert demonstrator, supervised imitation learning can produce effective policies without the algorithmic complexities and optimization challenges of reinforcement learning, at the cost of requiring an expert demonstrator to provide the demonstrations. In this paper, we ask: can we take insights from imitation learning to …
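The abstract is truncated before the method itself, but the recipe discussed by the citing papers (goal-conditioned supervised learning, GCSL) is iterated hindsight imitation: roll out the current goal-conditioned policy, relabel each visited state-action pair with a state actually reached later in the same trajectory, and fit the policy to this self-generated "demonstration" data by maximum likelihood. Below is a minimal sketch of that loop under my own assumptions; the class and function names are illustrative and not the authors' code.

import random
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    # pi(a | s, g): a small categorical policy; goals are simply states here.
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))  # action logits

def hindsight_relabel(trajectory):
    # trajectory: list of (state_tensor, action_int) pairs from one rollout.
    # Each (s_t, a_t) is paired with a state the trajectory provably reaches later,
    # so the relabeled tuples are optimal "demonstrations" for that goal.
    data = []
    for t in range(len(trajectory) - 1):
        s, a = trajectory[t]
        g, _ = random.choice(trajectory[t + 1:])
        data.append((s, a, g))
    return data

def supervised_update(policy, optimizer, batch):
    # Plain behavioral cloning on the relabeled tuples: maximize log pi(a | s, g).
    states, actions, goals = zip(*batch)
    loss = nn.functional.cross_entropy(
        policy(torch.stack(states), torch.stack(goals)),
        torch.as_tensor(actions))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Iterating collection, relabeling, and this supervised update is what "iterated supervised learning" refers to in the title: no external expert is needed, because the relabeled data is optimal by construction for the goals it is labeled with.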

Cited by 25 publications (43 citation statements). References 10 publications.
“…The method proposed by Kumar et al [44] is most similar to our method with K = 1, which we find sequence modeling/long contexts to outperform (see Section 5.3). Ghosh et al [47] extend prior UDRL methods to use state goal conditioning, rather than rewards, and Paster et al [48] further use an LSTM with state goal conditioning for goal-conditioned online RL settings.…”
Section: Supervised Learning in Reinforcement Learning Settings (mentioning)
confidence: 99%
“…For example, Lillicrap et al. (2015) extend it to the continuous action space setting, and Fujimoto et al. (2018) further stabilize training. These improvements are fully compatible with goal-reaching (Pong et al., 2019; Bharadhwaj et al., 2020a; Ghosh et al., 2019). Andrychowicz et al. (2017) proposed Hindsight Experience Replay (HER), which relabels past experience as achieved goals, and allows sample efficient learning from sparse rewards.…”
Section: Background and Related Work (mentioning)
confidence: 99%
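The HER relabeling described in this statement can be sketched in a few lines. The dictionary keys and the "future" goal-sampling strategy below are my own illustrative choices, not code from the cited paper: each stored transition is duplicated with its goal replaced by a state actually achieved later in the episode, and the sparse reward is recomputed under the new goal so that otherwise unrewarded episodes still provide learning signal.

import numpy as np

def sparse_goal_reward(achieved, goal, eps=0.05):
    # 0 when the achieved state is within eps of the goal, else -1 (a common sparse form).
    return 0.0 if np.linalg.norm(np.asarray(achieved) - np.asarray(goal)) < eps else -1.0

def her_relabel(episode, k=4, rng=None):
    # episode: list of dicts with keys 'obs', 'action', 'achieved', 'goal', 'reward',
    # where 'achieved' is the goal actually attained after the transition.
    rng = rng or np.random.default_rng()
    out = []
    for t, tr in enumerate(episode):
        out.append(tr)  # keep the original, usually unrewarded, transition
        future = episode[t:]
        for _ in range(k):
            new_goal = future[rng.integers(len(future))]["achieved"]  # "future" strategy
            out.append({**tr,
                        "goal": new_goal,
                        "reward": sparse_goal_reward(tr["achieved"], new_goal)})
    return out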
“…We finish this section by highlighting differences between C-learning and related work. Ghosh et al. (2019) proposed GCSL, a method for goal-reaching inspired by supervised learning. In their derivations, they also include a horizon h which their policies can depend on, but they drop this dependence in their experiments as they did not see a practical benefit from including h. We find the opposite for C-learning.…”
Section: Cumulative Accessibility Estimation (mentioning)
confidence: 99%
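Concretely, the horizon dependence discussed here only changes the policy's input: pi(a | s, g) becomes pi(a | s, g, h), where h is the number of steps remaining to reach the goal and is counted down during the rollout. A small illustrative sketch (my own, not taken from either paper):

import torch
import torch.nn as nn

class HorizonGoalPolicy(nn.Module):
    # pi(a | s, g, h): identical to a goal-conditioned policy except for one extra
    # input dimension carrying the remaining horizon h.
    def __init__(self, state_dim, goal_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, goal, horizon):
        # horizon: tensor of shape [batch, 1] holding the steps still allowed.
        return self.net(torch.cat([state, goal, horizon], dim=-1))

# During a rollout the horizon is simply counted down:
#   for h in range(H, 0, -1):
#       logits = policy(s, g, torch.tensor([[float(h)]]))
#       ...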