Learning from Demonstration in the Wild

Behbahani, Feryal; Shiarlis, Kyriacos; Chen, Xi; Kurin, Vitaly; Kasewa, Sudhanshu; Ciprian, Stirbu,; Gomes, João Paulo; Paul, Supratik; Oliehoek, Frans A.; Messias, João Carlos Caselli; Whiteson, Shimon

doi:10.48550/arxiv.1811.03516

Cited by 3 publications

(5 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…in online video repositories such as Youtube), solving problems in existing textbooks, or solving existing machine learning benchmarks in language, logic, reinforcement learning, etc. There is a long history of fruitful research in imitation learning and learning via observation that demonstrates the benefits of exploiting such data [37,13,162,7,142,36,182,129,116,1]. AI-GAs too could benefit from this treasure trove of information.…”

Section: Discussionmentioning

confidence: 99%

AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

Clune¹

2019

Preprint

View full text Add to dashboard Cite

Perhaps the most ambitious scientific quest in human history is the creation of general artificial intelligence, which roughly means AI that is as smart or smarter than humans. The dominant approach in the machine learning community is to attempt to discover each of the pieces that might be required for intelligence, with the implicit assumption that at some point in the future some group will complete the Herculean task of figuring out how to combine all of those pieces into an extremely complex machine. I call this the "manual AI approach." This paper describes another exciting path that ultimately may be more successful at producing general AI. It is based on the clear trend from the history of machine learning that hand-designed solutions eventually are replaced by more effective, learned solutions. The idea is to create an AI-generating algorithm (AI-GA), which itself automatically learns how to produce general AI. Three Pillars are essential for the approach: (1) meta-learning architectures, (2) meta-learning the learning algorithms themselves, and (3) generating effective learning environments. While work has begun on the first two pillars, little has been done on the third. Here I argue that either the manual or AI-GA approach could be the first to lead to general AI, and that both are worthwhile scientific endeavors irrespective of which is the fastest path. Because both approaches are roughly equally promising, and because the machine learning community is mostly committed to the engineered AI approach currently, I argue that our community should shift a substantial amount of its research investment to the AI-GA approach. To encourage such research, I describe promising work in each of the Three Pillars. I also discuss the safety and ethical considerations unique to the AI-GA approach. Because it it may be the fastest path to general AI and because it is inherently scientifically interesting to understand the conditions in which a simple algorithm can produce general AI (as happened on Earth where Darwinian evolution produced human intelligence), I argue that the pursuit of AI-GAs should be considered a new grand challenge of computer science research.1 Two approaches to producing general AI: the manual approach vs. AI-generating algorithmsArguably the most ambitious scientific quest in human history is the creation of general artificial intelligence, which roughly means AI as smart or smarter than humans 1 . The creation of general AI would transform every aspect of society and economic sector. It would also catalyze scientific discovery, leading to unpredictable advances, including further advances in AI itself (the potential ethical consequences of which I discuss in Section 4).1 This paper will not attempt to define general intelligence aside from saying it roughly means AI as smart or smarter than humans. Nor does this paper engage in the debate about to what extent such a thing exists. Such terrain is well-trodden without resolution, and is not the focus of this paper. Knowing roughly what most pe...

show abstract

Section: Discussionmentioning

confidence: 99%

AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

Clune¹

2019

Preprint

View full text Add to dashboard Cite

show abstract

“…Previously, Ziebart et al [6] and Ross et al [5] proposed general methods in Inverse Reinforcement Learning and Interactive Learning from Demonstration, with an empirical study on a driving game. More recently, Kuefler et al [16] and Behbahani et al [17] learn an end-to-end policy in a GAIL [13]-like manner. Codevilla et al [18] and Liang et al [19] share similar hierarchical perspective as us, but still control policies are completely neural.…”

Section: Related Work Classical Autonomous Driving Systemmentioning

confidence: 99%

Learning a Decision Module by Imitating Driver's Control Behaviors

Huang,

Xie,

Sun

et al. 2019

Preprint

View full text Add to dashboard Cite

Classical autonomous driving systems are modularized as a pipeline of perception, decision, planning, and control. The driving decision plays a central role in processing the observation from the perception as well as directing the execution of downstream planning and control modules. Commonly the decision module is designed to be rule-based and is difficult to learn from data. Recently end-to-end neural control policy has been proposed to replace this pipeline, given its generalization ability. However, it remains challenging to enforce physical or logical constraints on the decision to ensure driving safety and stability. In this work, we propose a hybrid framework for learning a decision module, which is agnostic to the mechanisms of perception, planning, and control modules. By imitating the low-level control behavior, it learns the high-level driving decisions while bypasses the ambiguous annotation of high-level driving decisions. We demonstrate that the simulation agents with a learned decision module can be generalized to various complex driving scenarios where the rule-based approach fails. Furthermore, it can generate driving behaviors that are smoother and safer than end-to-end neural policies ‡ .

show abstract

“…Imitation learning is also known as learning from demonstrations or apprenticeship learning, whose goal is to learn how to perform a task directly from expert demonstrations, without any access to the reward signal r (s, a). Recent main lines of researches within imitation learning are behavioural cloning (BC) [6,39], which performs supervised learning from observations to actions when given a number of expert demonstrations; inverse reinforcement learning (IRL) [1], where a reward function is estimated that explains the demonstrations as (near) optimal behavior; and generative adversarial imitation learning (GAIL) [3,4,17,43], which is inspired by the generative adversarial networks (GAN) [15]. Let T E denote the trajectories generated by the behind expert policy π E , each of which consists of a sequence of state-action pairs.…”

Section: Generative Adversarial Imitation Learningmentioning

confidence: 99%

“…Then, we reuse the same pre-trained animal models as the default policies for the wounded animals in all experiments. The learning curves of IGASIL versus MADDPG and DDPG are plotted in Figure (3) under five random seeds (1,2,3,4,5). To show a smoother learning procedure, the reward value is averaged every 1000 episodes.…”

Section: Cooperative Endangered Wildlife Rescuementioning

confidence: 99%

“…To clearly see the influence of the on-policy and off-policy only, we initialize M i E for each agent i with the same 32 demonstrations (demonstrated by the pre-trained DDPG agent in Section 5.1.1, which will cooperatively catch "animal c", resulting an average return +5). We perform learning under 5 random seeds (1,2,3,4,5). The imitation learning results are shown in Figure (6).…”

Section: Sample Efficiency Of Igasilmentioning

confidence: 99%

See 1 more Smart Citation

Independent Generative Adversarial Self-Imitation Learning in Cooperative Multiagent Systems

Hao¹,

Wang²,

Hao³

et al. 2019

Preprint

View full text Add to dashboard Cite

Many tasks in practice require the collaboration of multiple agents through reinforcement learning. In general, cooperative multiagent reinforcement learning algorithms can be classified into two paradigms: Joint Action Learners (JALs) and Independent Learners (ILs). In many practical applications, agents are unable to observe other agents' actions and rewards, making JALs inapplicable. In this work, we focus on independent learning paradigm in which each agent makes decisions based on its local observations only. However, learning is challenging in independent settings due to the local viewpoints of all agents, which perceive the world as a non-stationary environment due to the concurrently exploring teammates. In this paper, we propose a novel framework called Independent Generative Adversarial Self-Imitation Learning (IGASIL) to address the coordination problems in fully cooperative multiagent environments. To the best of our knowledge, we are the first to combine self-imitation learning with generative adversarial imitation learning (GAIL) and apply it to cooperative multiagent systems. Besides, we put forward a Sub-Curriculum Experience Replay mechanism to pick out the past beneficial experiences as much as possible and accelerate the self-imitation learning process. Evaluations conducted in the testbed of StarCraft unit micromanagement and a commonly adopted benchmark show that our IGASIL produces state-of-the-art results and even outperforms JALs in terms of both convergence speed and final performance.

show abstract

Learning from Demonstration in the Wild

Cited by 3 publications

References 0 publications

AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

Learning a Decision Module by Imitating Driver's Control Behaviors

Independent Generative Adversarial Self-Imitation Learning in Cooperative Multiagent Systems

Contact Info

Product

Resources

About