Learning robust driving policies without online exploration

Graves, Daniel; Nguyen, Nhat M.; Hassanzadeh, Kimia; Jin, Jun; Luo, Jun

doi:10.1109/icra48506.2021.9561450

Cited by 2 publications

(11 citation statements)

References 35 publications

“…In practice, multiple such concerns may simultaneously matter about an option and multiple options may be simultaneously evaluated under the same concern. These can be easily accommodated by introducing the corresponding cumulants and options and, where appropriate, allow the various V s and Q s to share underlying computational implementation (Graves et al, 2020b; Sherstan & Pilarski, 2014; Ugur et al, 2007).…”

Section: Affordances as GVFs (mentioning)

confidence: 99%

“…We see evidence of this in learning affordances that relate to success/failure of bin picking (Zeng et al, 2018), traversability (Ugur et al, 2007), and success/failure of mobile manipulation (Wu et al, 2020) where thousands of predictions structured spatially in a grid pattern are learned. Second, when combined with DL, multiple predictions can be made with the same neural network by sharing earlier layers across different prediction heads, allowing for potentially sublinear scaling in the number of predictions (Graves et al, 2020b; Sherstan & Pilarski, 2014; Zeng et al, 2018). Third, as noted above, for a single prediction, the direct perception architecture operates in constant time regardless of how distant a future the prediction needs to be concerned about.…”

Section: Affordances as GVFs (mentioning)

confidence: 99%

“…However, in spite of these nice properties of GVFs for making predictions, in spite of the well-established mechanisms for learning them, and in spite of some specific examples of how GVF predictions may be used to control real-world robots (Edwards et al, 2016; Graves et al, 2019, 2020b; Günther et al, 2018), there is as yet no general computational framework for using such predictions, such as using them in downstream learning or optimization of policies.…”

Section: Related Work (mentioning)

confidence: 99%

“…Since any meaningful scalar-valued signal could be used as cumulant, all sorts of valence predictions can be made. In practice, the cumulant can be hand-engineered features such as safety (Graves et al, 2019), comfort, centeredness (Graves et al, 2020b), and duration (Sutton et al, 2011). The cumulant can also represent outcomes such as traversability (Ugur et al, 2007), success or failure (Zeng et al, 2018), or human-provided quality labels (Günther et al, 2016).…”

Section: Affordances as GVFs (mentioning)

confidence: 99%

“…While affordance perception as GVF prediction may take geometric measurements and object detection results as input (Graves et al, 2019), dependency on these are strictly optional. Instead, predictions can be made directly from raw sensory inputs (Graves et al, 2020b; Günther et al, 2016; Zeng et al, 2018). In contrast, if object detection is a necessary dependency of affordance prediction (Hassanin et al, 2018), errors may be introduced through the data labeling needed for setting up the required supervised learning.…”

Section: Affordances as GVFs (mentioning)

confidence: 99%

Affordance as general value function: a computational model

2021

Self Cite

General value functions (GVFs) in the reinforcement learning (RL) literature are long-term predictive summaries of the outcomes of agents following specific policies in the environment. Affordances as perceived action possibilities with specific valence may be cast into predicted policy-relative goodness and modeled as GVFs. A systematic explication of this connection shows that GVFs and especially their deep-learning embodiments (1) realize affordance prediction as a form of direct perception, (2) illuminate the fundamental connection between action and perception in affordance, and (3) offer a scalable way to learn affordances using RL methods. Through an extensive review of existing literature on GVF applications and representative affordance research in robotics, we demonstrate that GVFs provide the right framework for learning affordances in real-world applications. In addition, we highlight a few new avenues of research opened up by the perspective of “affordance as GVF,” including using GVFs for orchestrating complex behaviors.

Offline Learning of Counterfactual Predictions for Real-World Robotic Reinforcement Learning

Jin, Graves, Haigh, et al. 2022

2022 International Conference on Robotics and Automation (ICRA)

Self Cite

Autonomous excavation is a challenging task. The unknown contact dynamics between the excavator bucket and the terrain could easily result in large contact forces and jamming problems during excavation. Traditional model-based methods struggle to handle such problems due to complex dynamic modeling. In this paper, we formulate the excavation skills with three novel manipulation primitives. We propose to learn the manipulation primitives with offline reinforcement learning (RL) to avoid large amounts of online robot interactions. The proposed method can learn efficient penetration skills from sub-optimal demonstrations, which contain subtrajectories that can be "stitched" together to formulate an optimal trajectory without causing jamming. We evaluate the proposed method with extensive experiments on excavating a variety of rigid objects and demonstrate that the learned policy outperforms the demonstrations. We also show that the learned policy can quickly adapt to unseen and challenging fragmented rocks with online fine-tuning.
