2022
DOI: 10.1038/s41586-021-04357-7

Outracing champion Gran Turismo drivers with deep reinforcement learning


Cited by 195 publications (81 citation statements)
References 15 publications
“…Liu et al (2021b) investigate simulated humanoid football, from motor control to team cooperation. Wurman et al (2022) develop an automobile racing agent that beats the world's best e-sports drivers.…”
Section: Games
confidence: 99%
“…Krishnan et al (2021) introduce a simulator for resource-constrained autonomous aerial robots. Wurman et al (2022) develop an automobile racing agent in simulation, in the PlayStation game Gran Turismo, that beats the world's best e-sports drivers. Ibarz et al (2021) review how to train robots with deep RL and discuss outstanding challenges and strategies to mitigate them: 1) reliable and stable learning; 2) sample efficiency: 2.1) off-policy algorithms, 2.2) model-based algorithms, 2.3) input remapping for high-dimensional observations, and 2.4) offline training; 3) use of simulation: 3.1) better simulation, 3.2) domain randomization, and 3.3) domain adaptation; 4) side-stepping exploration challenges: 4.1) initialization, 4.2) data aggregation, 4.3) joint training, 4.4) demonstrations in model-based RL, 4.5) scripted policies, and 4.6) reward shaping; 5) generalization: 5.1) data diversity and 5.2) proper evaluation; 6) avoiding model exploitation; 7) robot operation at scale: 7.1) experiment design, 7.2) facilitating continuous operation, and 7.3) non-stationarity owing to environment changes; 8) asynchronous control: thinking and acting at the same time; 9) setting goals and specifying rewards; 10) multi-task learning and meta-learning; 11) safe learning: 11.1) designing safe action spaces, 11.2) smooth actions, 11.3) recognizing unsafe situations, 11.4) constraining learned policies, and 11.5) robustness to unseen observations; and 12) robot persistence: 12.1) self-persistence and 12.2) task persistence.…”
Section: Robotics
confidence: 99%
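One of the sim-to-real strategies this excerpt lists, domain randomization (item 3.2), is straightforward to illustrate. The minimal sketch below resamples simulator physics parameters each episode so a policy cannot overfit to one setting; `CarSimulator` and its parameter ranges are hypothetical stand-ins, not an API from any cited work.

```python
import random

class CarSimulator:
    """Toy simulator whose dynamics depend on a few physics parameters."""
    def __init__(self, friction: float, mass: float, motor_noise: float):
        self.friction = friction
        self.mass = mass
        self.motor_noise = motor_noise

    def reset(self):
        # Toy initial observation: (position, velocity).
        return [0.0, 0.0]

def sample_randomized_sim() -> CarSimulator:
    # Draw each parameter from a range wide enough to cover the real system.
    return CarSimulator(
        friction=random.uniform(0.6, 1.2),
        mass=random.uniform(900.0, 1500.0),
        motor_noise=random.uniform(0.0, 0.05),
    )

for episode in range(3):
    sim = sample_randomized_sim()   # fresh physics every episode
    obs = sim.reset()
    print(f"episode {episode}: friction={sim.friction:.2f}, mass={sim.mass:.0f}")
```

Training across the randomized population, rather than in one fixed simulator, is what pushes the policy toward behavior that transfers to the unmodeled real system.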
“…Classical reinforcement learning (RL) [1] has produced excellent results across many domains [2][3][4][5][6][7]. Over the past decade, RL has been applied to master Go [2], design chips [7], play StarCraft and Gran Turismo [3,4], control nuclear fusion plasmas [5], and solve protein folding [6]. Despite these remarkable achievements, most RL techniques fail to balance the tradeoff between exploitation and exploration [8].…”
Section: Introduction
confidence: 99%
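The exploitation/exploration tradeoff this excerpt refers to is classically illustrated by epsilon-greedy action selection. Below is a minimal sketch on a toy three-armed bandit; all names and constants are illustrative, not from the cited paper.

```python
import random

TRUE_MEANS = [0.2, 0.5, 0.8]   # hidden reward probability of each arm
EPSILON = 0.1                  # fraction of steps spent exploring

counts = [0, 0, 0]             # pulls per arm
values = [0.0, 0.0, 0.0]       # running mean reward per arm

for step in range(10_000):
    if random.random() < EPSILON:
        arm = random.randrange(3)          # explore: try a random arm
    else:
        arm = values.index(max(values))    # exploit: best arm so far
    reward = 1.0 if random.random() < TRUE_MEANS[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print("estimated values:", [round(v, 2) for v in values])
```

With EPSILON too low the agent can lock onto a suboptimal arm; too high and it wastes steps on arms it already knows are poor, which is exactly the balance the excerpt says most RL techniques struggle with.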
“…Environments can be based on simulations. For example, popular RL applications with simulation-based environments include Atari video-games (Mnih et al, 2013), robotic tasks (Tunyasuvunakool et al, 2020) and autonomous driving (Sallab et al, 2017;Wurman et al, 2022).…”
Section: Introduction
confidence: 99%
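The simulation-based environment loop this excerpt describes follows a standard pattern: reset the simulator, act, observe, repeat until the episode ends. A minimal sketch using the Gymnasium API and a random policy is below; CartPole-v1 stands in for richer simulators such as Atari or a racing game.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()    # random policy placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"episode return: {total_reward}")
env.close()
```

An RL agent replaces the random `action_space.sample()` call with a learned policy and uses the `(obs, action, reward)` stream as training data.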