2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros.2018.8593894

Setting up a Reinforcement Learning Task with a Real-World Robot

Abstract: Through many recent successes in simulation, model-free reinforcement learning has emerged as a promising approach to solving continuous control robotic tasks. The research community is now able to reproduce, analyze and build quickly on these results due to open source implementations of learning algorithms and simulated benchmark tasks. To carry forward these successes to real-world applications, it is crucial to withhold utilizing the unique advantages of simulations that do not transfer to the real world a…

Cited by 76 publications (86 citation statements) · References 25 publications

Citation statements:
“…In particular, we extend the framework of maximum entropy RL. Methods of this type, such as soft actor-critic [17] and soft Q-learning [15], can achieve state-of-the-art sample efficiency [17] and have been successfully deployed in real-world manipulation tasks [16, 31], where they exhibit a high degree of robustness due to entropy maximization [16]. However, maximum entropy RL algorithms are sensitive to the choice of the temperature parameter, which determines the trade-off between exploration (maximizing the entropy) and exploitation (maximizing the reward).…”
Section: Introduction (mentioning)
confidence: 99%
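For context on the temperature trade-off mentioned in this statement, the maximum entropy RL objective (in the standard soft actor-critic formulation) augments the expected return with an entropy bonus weighted by the temperature α:

    J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

A larger α rewards high-entropy (more exploratory) policies, while a smaller α shifts the objective toward pure reward maximization; mis-setting it is the sensitivity the quoted statement refers to.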
“…Figure 3 shows a visual representation of the RL algorithm classification, and the mathematical description of a few of the RL algorithms shown in Figure 3 is given in the appendix. Comparing model-based RL with model-free RL, model-free RL has proved to be a promising approach in the field of robotics [31]. Table 3 presents a quick view of the various RL algorithms used in the development of robots in different studies.…”
Section: A. Learning Approaches (mentioning)
confidence: 99%
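As a point of reference for the model-free vs. model-based distinction drawn in this statement, the sketch below (a hypothetical minimal example, not taken from the cited survey or from Mahmood et al.) shows tabular Q-learning, a model-free method: it improves its value estimates directly from sampled transitions and never learns the environment's dynamics.

# Hypothetical illustration of a model-free update rule: tabular Q-learning
# adjusts action values directly from observed transitions, without building
# or querying a model of the environment's dynamics.
import numpy as np

n_states, n_actions = 10, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1      # step size, discount factor, exploration rate
Q = np.zeros((n_states, n_actions))         # action-value table
rng = np.random.default_rng(0)

def act(state):
    # Epsilon-greedy action selection: explore with probability epsilon.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    # One Q-learning update from a single (s, a, r, s') sample.
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])

A model-based method would instead fit a transition model from the same data and plan with it (or generate synthetic experience), which is the other branch of the classification the statement describes.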
“…These amounts of data are difficult to generate by training on real systems, i.e., without the use of simulations. The limiting factors here are, in particular, the need for a large number of parallel setups, the high time expenditure, and the wear on the hardware (Irpan 2020; Hessel et al. 2017; Kober et al. 2013; Mahmood et al. 2018). For this reason, high-quality simulations are required for training the algorithms.…”
Section: Inherent Technical Hurdles (unclassified)
“…Moreover, this kind of initialization increases the amount of data required, since the algorithm has to learn everything from scratch. Unfortunately, particularly for the commonly used model-free algorithms, it is not possible to effectively incorporate prior knowledge or to transfer previously learned knowledge to similar problems, which could counteract this issue (Polydoros and Nalpantidis 2017; Mahmood et al. 2018; Schulman et al. 2015).…”
Section: Inherent Technical Hurdles (unclassified)