Scott Niekum scite author profile

Learning grounded finite-state representations from unstructured demonstrations

Robots exhibit flexible behavior largely in proportion to their degree of knowledge about the world. Such knowledge is often meticulously hand-coded for a narrow class of tasks, limiting the scope of possible robot competencies. Thus, the primary limiting factor of robot capabilities is often not the physical attributes of the robot, but the limited time and skill of expert programmers. One way to deal with the vast number of situations and environments that robots face outside the laboratory is to provide users with simple methods for programming robots that do not require the skill of an expert. For this reason, learning from demonstration (LfD) has become a popular alternative to traditional robot programming methods, aiming to provide a natural mechanism for quickly teaching robots. By simply showing a robot how to perform a task, users can easily demonstrate new tasks as needed, without any special knowledge about the robot. Unfortunately, LfD often yields little knowledge about the world, and thus lacks robust generalization capabilities, especially for complex, multi-step tasks. We present a series of algorithms that draw from recent advances in Bayesian nonparametric statistics and control theory to automatically detect and leverage repeated structure at multiple levels of abstraction in demonstration data. The discovery of repeated structure provides critical insights into task invariants, features of importance, high-level task structure, and appropriate skills for the task. This culminates in the discovery of a finite-state representation of the task, composed of grounded skills that are flexible and reusable, providing robust generalization and transfer in complex, multi-step robotic tasks. These algorithms are tested and evaluated using a PR2 mobile manipulator, showing success on several complex real-world tasks, such as furniture assembly.

Learning and generalization of complex tasks from unstructured demonstrations

Niekum

et al. 2012

We present a novel method for segmenting demonstrations, recognizing repeated skills, and generalizing complex tasks from unstructured demonstrations. This method combines many of the advantages of recent automatic segmentation methods for learning from demonstration into a single principled, integrated framework. Specifically, we use the Beta Process Autoregressive Hidden Markov Model and Dynamic Movement Primitives to learn and generalize a multi-step task on the PR2 mobile manipulator and to demonstrate the potential of our framework to learn a large library of skills over time.

Safe Reinforcement Learning via Shielding

Alshiekh

Bloem

Ehlers

et al. 2018

AAAI

314

Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification. We discuss which requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios.
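
The monitor-and-correct behavior described above can be illustrated with a minimal sketch. It assumes a hand-written safety predicate (`is_safe`) in place of a shield synthesized from a temporal logic specification, and a toy one-dimensional corridor; the function names and the corridor are hypothetical and not taken from the paper.

```python
from typing import Callable, Sequence

State = int
Action = int


def make_shield(
    is_safe: Callable[[State, Action], bool],
    actions: Sequence[Action],
) -> Callable[[State, Action], Action]:
    """Build a shield that passes safe actions through unchanged
    and replaces unsafe ones with the first safe alternative."""

    def shield(state: State, proposed: Action) -> Action:
        # Leave the learner's choice alone whenever it is safe.
        if is_safe(state, proposed):
            return proposed
        # Otherwise substitute a safe action (assumption: any safe one will do).
        safe = [a for a in actions if is_safe(state, a)]
        if not safe:
            raise RuntimeError(f"no safe action available in state {state}")
        return safe[0]

    return shield


# Hypothetical corridor with positions 0..3 allowed; anything else is unsafe.
ACTIONS = (-1, 0, 1)


def is_safe(state: State, action: Action) -> bool:
    return 0 <= state + action <= 3


shield = make_shield(is_safe, ACTIONS)
print(shield(2, 1))  # safe, passed through
print(shield(3, 1))  # unsafe, corrected
```

Because the shield only intervenes on violations, the learner's own exploration is untouched everywhere else, which is the property the abstract ties to preserving convergence guarantees.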

Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

Brown

Niekum

2019

AAAI

Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization. However, despite much recent interest in IRL, little work has been done to understand the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding maximally informative demonstrations for IRL as a machine teaching problem where the goal is to find the minimum number of demonstrations needed to specify the reward equivalence class of the demonstrator. We extend previous work on algorithmic teaching for sequential decision-making tasks by showing a reduction to the set cover problem which enables an efficient approximation algorithm for determining the set of maximally informative demonstrations. We apply our proposed machine teaching algorithm to two novel applications: providing a lower bound on the number of queries needed to learn a policy using active IRL and developing a novel IRL algorithm that can learn more efficiently from informative demonstrations than a standard IRL approach.
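
The set cover reduction mentioned above suggests the standard greedy approximation, sketched here under simplifying assumptions. Each candidate demonstration is assumed to "cover" a set of abstract constraint labels; how those constraints are derived from a reward equivalence class is not shown, and all names (`greedy_set_cover`, `demo_a`, `c1`, and so on) are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(
    universe: Set[Hashable],
    candidates: Dict[str, Set[Hashable]],
) -> List[str]:
    """Greedy set cover: repeatedly pick the candidate that covers
    the most still-uncovered elements."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Candidate with the largest gain on what is still uncovered.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("candidates cannot cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical constraints implied by three candidate demonstrations.
constraints = {"c1", "c2", "c3", "c4"}
demos = {
    "demo_a": {"c1", "c2"},
    "demo_b": {"c2", "c3", "c4"},
    "demo_c": {"c4"},
}
print(greedy_set_cover(constraints, demos))  # → ['demo_b', 'demo_a']
```

The greedy rule does not always find the smallest cover, but it is the classic efficient approximation for set cover, which is why a reduction to that problem is useful.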

Incremental Semantically Grounded Learning from Demonstration

Niekum

Chitta

Barto

et al. 2013

Much recent work in robot learning from demonstration has focused on automatically segmenting continuous task demonstrations into simpler, reusable primitives. However, strong assumptions are often made about how these primitives can be sequenced, limiting the potential for data reuse. We introduce a novel method for discovering semantically grounded primitives and incrementally building and improving a finite-state representation of a task in which various contingencies can arise. Specifically, a Beta Process Autoregressive Hidden Markov Model is used to automatically segment demonstrations into motion categories, which are then further subdivided into semantically grounded states in a finite-state automaton. During replay of the task, a data-driven approach is used to collect additional data where they are most needed through interactive corrections, which are then used to improve the finite-state automaton. Together, this allows for intelligent sequencing of primitives to create novel, adaptive behavior that can be incrementally improved as needed. We demonstrate the utility of this technique on a furniture assembly task using the PR2 mobile manipulator.
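
The idea of a finite-state task representation that is extended through corrections can be sketched with a plain lookup table. This is only an illustration: in the abstract the states come from segmented demonstrations and the sequencing is learned from data, whereas here the skill names, outcome labels, and the `TaskAutomaton` class are all hypothetical and hand-written.

```python
from typing import Dict, Optional, Tuple


class TaskAutomaton:
    """Toy finite-state task model: (skill, outcome) -> next skill."""

    def __init__(self, start: str) -> None:
        self.start = start
        self.transitions: Dict[Tuple[str, str], str] = {}

    def add_transition(self, state: str, outcome: str, next_state: str) -> None:
        # Called when building from demonstrations or after a correction.
        self.transitions[(state, outcome)] = next_state

    def step(self, state: str, outcome: str) -> Optional[str]:
        # None marks a contingency the model has not seen yet,
        # i.e. a point where a corrective demonstration is needed.
        return self.transitions.get((state, outcome))


fsa = TaskAutomaton(start="grasp_leg")
fsa.add_transition("grasp_leg", "grasped", "insert_leg")
fsa.add_transition("insert_leg", "inserted", "done")

unknown = fsa.step("insert_leg", "missed")       # not covered yet
fsa.add_transition("insert_leg", "missed", "realign_leg")  # interactive correction
recovered = fsa.step("insert_leg", "missed")     # now handled
print(unknown, recovered)
```

Adding a transition only where replay failed mirrors the incremental flavor described above: the model grows where data are missing rather than being rebuilt from scratch.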

Using Natural Language for Reward Shaping in Reinforcement Learning

Goyal

Niekum

Mooney

2019

Recent reinforcement learning (RL) approaches have shown strong performance in complex domains such as Atari games, but are often highly sample inefficient. A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent intermediate rewards for progress towards the goal. However, designing appropriate shaping rewards is known to be difficult as well as time-consuming. In this work, we address this problem by using natural language instructions to perform reward shaping. We propose the LanguagE-Action Reward Network (LEARN), a framework that maps free-form natural language instructions to intermediate rewards based on actions taken by the agent. These intermediate language-based rewards can seamlessly be integrated into any standard reinforcement learning algorithm. We experiment with Montezuma's Revenge from the Atari Learning Environment, a popular benchmark in RL. Our experiments on a diverse set of 15 tasks demonstrate that, for the same number of interactions with the environment, language-based rewards lead to successful completion of the task 60% more often on average, compared to learning without language.
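
How an intermediate, language-derived reward can be added to an environment reward is sketched below. The sketch uses generic potential-based shaping (the shaping term is gamma times the new potential minus the old one) and a keyword-overlap scorer as a stand-in for a learned network; it is not the LEARN architecture, and `keyword_potential`, `shaped_reward`, and the example instruction are hypothetical.

```python
from typing import Sequence


def keyword_potential(instruction: str, action_names: Sequence[str]) -> float:
    """Hypothetical stand-in for a learned scorer: the fraction of
    actions taken so far whose name appears in the instruction."""
    if not action_names:
        return 0.0
    words = set(instruction.lower().split())
    hits = sum(1 for name in action_names if name.lower() in words)
    return hits / len(action_names)


def shaped_reward(
    env_reward: float,
    instruction: str,
    actions_before: Sequence[str],
    action: str,
    gamma: float = 0.99,
) -> float:
    """Environment reward plus a potential-based shaping term."""
    before = keyword_potential(instruction, actions_before)
    after = keyword_potential(instruction, list(actions_before) + [action])
    return env_reward + gamma * after - before


instruction = "climb down the ladder and jump"
print(shaped_reward(0.0, instruction, [], "jump", gamma=0.5))  # → 0.5
print(shaped_reward(0.0, instruction, [], "left", gamma=0.5))  # → 0.0
```

Because the shaping term is simply added to the environment reward, any standard RL algorithm can consume the result without modification, which matches the integration claim in the abstract.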

Active articulation model estimation through interactive perception

Hausman

Niekum

Osentoski

et al. 2015

Active Reward Learning from Critiques

Cui

Niekum

2018
