2013
DOI: 10.1109/tit.2012.2234824
The Principle of Maximum Causal Entropy for Estimating Interacting Processes

Abstract: The principle of maximum entropy provides a powerful framework for estimating joint, conditional, and marginal probability distributions. However, there are many important distributions with elements of interaction and feedback where its applicability has not been established. This work presents the principle of maximum causal entropy, an approach based on directed information theory for estimating an unknown process based on its interactions with a known process. We demonstrate the breadth of the appr…
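For orientation, the causally conditioned entropy that the paper maximizes is, in the notation of directed information theory,

$$
H(\mathbb{A}^T \,\|\, \mathbb{S}^T) \;=\; \sum_{t=1}^{T} H\!\left(A_t \mid S_{1:t},\, A_{1:t-1}\right),
$$

which, unlike the ordinary conditional entropy $H(\mathbb{A}^T \mid \mathbb{S}^T)$, conditions each action $A_t$ only on the states revealed up to time $t$, so feedback between the two interacting processes is respected.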

Year Published: 2016–2023


Cited by 113 publications (140 citation statements). References 47 publications (53 reference statements).
“…It is often useful to consider the maximum entropy principle in its regularized form [Ziebart et al, 2013], that is, instead of finding a maximum entropy distribution we want to find a distribution with the minimal KL divergence relative to a "prior" distribution $p_0(\tau)$ while matching the features of the demonstrator, that is,…”

Section: Information Theoretic Understanding Of Imitation Learning Al… (mentioning)

confidence: 99%
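Spelled out, the program this quote refers to is the following (here $f(\tau)$ denotes the feature vector of a trajectory and $\tilde{p}$ the empirical distribution of demonstrations; both symbols are our notation, not the snippet's):

$$
\min_{p}\; D_{\mathrm{KL}}\!\left(p(\tau)\,\|\,p_0(\tau)\right)
\quad\text{s.t.}\quad
\mathbb{E}_{p}[f(\tau)] = \mathbb{E}_{\tilde{p}}[f(\tau)],
\qquad \sum_{\tau} p(\tau) = 1,
$$

whose solution has the exponential-family form $p_\theta(\tau) \propto p_0(\tau)\, e^{\theta^\top f(\tau)}$; taking $p_0$ uniform recovers the plain maximum entropy distribution.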
“…Alternate prior distributions can be easily taken into account by simply adding a "feature" that is $\log p_0(\tau)$, either with a weight fixed to 1.0 or allowed to adapt and learn. The maximum causal entropy distribution [Ziebart et al, 2013] can be understood as removing the effects of stochastic dynamics as well. For learning tasks involving physical systems, it is often desirable to consider alternate $p_0(\tau)$, particularly by exploiting information in the system dynamics.…”

Section: Interpretation Of IRL With the Maximum Entropy Principle (mentioning)

confidence: 99%
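Concretely, the "prior as a feature" reading amounts to the model below, where the weight $w$ is our placeholder symbol; fixing $w = 1$ recovers the KL-regularized form above, while letting $w$ adapt treats $\log p_0$ like any other learned feature:

$$
p_{\theta, w}(\tau) \;\propto\; \exp\!\left(\theta^\top f(\tau) + w \log p_0(\tau)\right) \;=\; p_0(\tau)^{\,w}\, \exp\!\left(\theta^\top f(\tau)\right).
$$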
“…[4] develops a structural SOC-based model for estimating mobile phone users' preferences from their observed daily data consumption. On the Inverse Reinforcement Learning side, our framework is rooted in the Maximum Entropy IRL (MaxEnt-IRL) method [5,6]. Other relevant references to the Maximum Entropy IRL are Refs.…”

Section: Related Work (mentioning)

confidence: 99%
“…Given an initial guess for the optimal parameter $\theta_k^{(0)}$, we can also consider a regularized version of the negative log-likelihood.⁶ (Footnote 6: A more complex case of co-dependencies between rewards for individual customers can be considered, but we will not pursue this approach here.) Note that this specification formally enables calibration at the level of an individual customer, in which case N would be equal to the number of consumption cycles observed for this user.…”

Section: Probabilities Of T-Steps Paths (mentioning)

confidence: 99%
“…Next, we explain how the driver's objective inferred by inverse optimal control can be used to predict her behavior in new situations in Section II-C. Maximum causal entropy inverse optimal control [19] is presented in Section II-D as an approach to account for suboptimal driver behavior. Following that, we report the setting of the driving study we conducted for evaluation in Section III.…”

Section: Contributions (mentioning)

confidence: 99%
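The computational core of maximum causal entropy inverse optimal control is a soft Bellman backup: the hard max over actions is replaced by a log-sum-exp, yielding a stochastic policy whose action entropy absorbs apparently suboptimal behavior. A minimal sketch on a tabular MDP follows; the reward matrix, transition tensor, discount, and function name are illustrative placeholders, not taken from any cited study:

```python
import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(P, r, gamma=0.95, n_iters=500):
    """Soft (maximum-causal-entropy) value iteration on a tabular MDP.

    P: transition tensor, shape (A, S, S); P[a, s, s'] = Pr(s' | s, a).
    r: reward matrix, shape (S, A).
    Returns pi with pi[s, a] = exp(Q(s, a) - V(s)), a stochastic policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # Q(s, a) = r(s, a) + gamma * E_{s' ~ P(.|s, a)}[V(s')]
        Q = r + gamma * np.einsum('asn,n->sa', P, V)
        # Soft backup: V(s) = log sum_a exp(Q(s, a)) instead of max_a Q(s, a)
        V = logsumexp(Q, axis=1)
    return np.exp(Q - V[:, None])

# Toy usage on a random 5-state, 2-action MDP.
rng = np.random.default_rng(0)
P = rng.random((2, 5, 5))
P /= P.sum(axis=2, keepdims=True)        # normalize to valid transition rows
pi = soft_value_iteration(P, rng.random((5, 2)))
assert np.allclose(pi.sum(axis=1), 1.0)  # each row is an action distribution
```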