Robotics: Science and Systems XIV 2018
DOI: 10.15607/rss.2018.xiv.008
Asymmetric Actor Critic for Image-Based Robot Learning

Abstract: Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision making domains. However, robotics poses many challenges for RL; most notably, training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully utilize the advantage of working …

Cited by 159 publications (153 citation statements)
References 39 publications

“…While we are motivated to devise sidekick policy learning for active visual exploration, it is more generally applicable whenever an RL agent can access greater observability during training than during deployment. For example, agents may operate on first-person observations during test-time, yet have access to multiple sensors during training in simulation environments (38)(39)(40). Similarly, an active object recognition system (8,10,11,25,29) can only see its previously selected views of the object; yet if trained with CAD models, it could observe all possible views while learning.…”
Section: Results (mentioning)
confidence: 99%
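
The statement above describes exploiting privileged observability during training, which is the core idea of the indexed paper: a critic trained on the full simulator state guides an actor that only sees the images available at deployment. Below is a minimal sketch of that asymmetry; the network shapes and the DDPG-style update are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of asymmetric actor-critic: the critic consumes the full
# simulator state (training only), the actor consumes raw pixels
# (the observation available at test time). Sizes are illustrative.
import torch
import torch.nn as nn

class ImageActor(nn.Module):
    """Policy that acts from raw pixels (available at deployment)."""
    def __init__(self, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),
        )

    def forward(self, image):
        return self.net(image)

class StateCritic(nn.Module):
    """Q-function that sees the full simulator state (training only)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

actor = ImageActor(action_dim=4)
critic = StateCritic(state_dim=20, action_dim=4)

# One illustrative actor update on a fake batch: the actor is improved
# through the privileged critic (deterministic policy gradient).
image = torch.randn(8, 3, 64, 64)   # actor input: pixels
state = torch.randn(8, 20)          # critic input: privileged state
action = actor(image)
actor_loss = -critic(state, action).mean()
actor_loss.backward()
```

At deployment only `ImageActor` is kept; the privileged critic exists purely to shape training in simulation.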
“…As done in [10,11,14,15,16] we use the domain parameter distribution as a prior, which ensures the physical plausibility of each parameter. Note that specifying this distribution in the current state-of-the-art requires the researcher to make design decisions.…”
Section: Required Randomized Simulators (mentioning)
confidence: 99%
“…However, when transferred to real-world robotic systems, most of these methods become less attractive due to high sample complexity and a lack of explainability of state-of-the-art deep RL algorithms. As a consequence, the research field of domain randomization has recently been gaining interest [10,11,12,13,14,15,16,17]. This class of approaches promises to transfer control policies learned in simulation (source domain) to the real world (target domain) by randomizing the simulator's parameters (e.g., masses, extents, or friction coefficients) and hence train from a set of models instead of just one nominal model.…”
Section: Introduction (mentioning)
confidence: 99%
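
The two statements above both describe training on a set of simulator models drawn from a plausibility-preserving prior over physical parameters. A minimal sketch of that loop follows; the parameter names, ranges, and the `make_sim` / `policy_update` helpers are hypothetical placeholders, and picking the ranges is exactly the design decision the first statement points out.

```python
# Domain randomization over physical parameters: each episode runs on a
# simulator configured by a fresh sample from a bounded prior, so the
# policy is trained on a set of models rather than one nominal model.
import random

# Prior over simulator parameters: bounded uniform ranges encode the
# researcher's judgment about what is physically plausible (hypothetical).
PARAM_PRIOR = {
    "mass_kg":        (0.5, 2.0),    # link mass
    "friction_coeff": (0.4, 1.2),    # surface friction
    "extent_m":       (0.08, 0.12),  # object size
}

def sample_domain_params():
    """Draw one physically plausible simulator configuration."""
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in PARAM_PRIOR.items()}

def train(num_episodes, make_sim, policy_update):
    for _ in range(num_episodes):
        params = sample_domain_params()   # new model every episode
        sim = make_sim(**params)          # hypothetical simulator factory
        trajectory = sim.rollout()        # collect experience in this model
        policy_update(trajectory)         # policy must work across all draws
```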
“…Domain adaptation methods either map both image spaces into a common one [14], [18] or map one into the other [15]. Domain randomization methods add noise to the synthetic images [21], [28], thus making the control policy robust to different textures and lighting. The second line of work is attractive due to its effectiveness and simplicity.…”
Section: Related Work (mentioning)
confidence: 99%
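
As a concrete illustration of the second line of work, the sketch below perturbs each synthetic frame's lighting and color and adds pixel noise before the frame reaches the policy; the jitter ranges are illustrative assumptions, not values from [21] or [28].

```python
# Visual domain randomization: randomize lighting, color, and noise on
# every simulated frame so the policy cannot overfit to one rendering.
import numpy as np

rng = np.random.default_rng()

def randomize_image(image):
    """Apply random lighting/color/noise to an HxWx3 float image in [0, 1]."""
    img = image.astype(np.float32)
    img = img * rng.uniform(0.6, 1.4)                  # global brightness (lighting)
    img = img * rng.uniform(0.8, 1.2, size=(1, 1, 3))  # per-channel tint (color shift)
    img = img + rng.normal(0.0, 0.02, size=img.shape)  # sensor-like pixel noise
    return np.clip(img, 0.0, 1.0)

# Usage: augment each synthetic frame before feeding it to the policy.
frame = rng.uniform(size=(64, 64, 3)).astype(np.float32)
augmented = randomize_image(frame)
```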