Proceedings of the 23rd International Conference on Machine Learning (ICML '06), 2006
DOI: 10.1145/1143844.1143936

Maximum margin planning

Abstract: Imitation learning of sequential, goal-directed behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to cost so that an optimal policy in an MDP with these costs mimics the expert's behavior. Further, we demonstrate a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fas…
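The approach the abstract describes, fitting cost weights by subgradient descent on a structured hinge loss whose inner step is an ordinary planning problem, can be sketched roughly as follows. This is a minimal illustrative sketch on a 4-connected grid, not the authors' implementation: the grid features, the 0/1 loss map, the positivity clamp on costs, and the learning rate and regularizer are all assumptions made for the example.

```python
import heapq
import numpy as np

def dijkstra_path(cost, start, goal):
    """Cheapest path on a 4-connected grid of per-cell costs."""
    H, W = cost.shape
    dist = np.full((H, W), np.inf)
    prev = {}
    dist[start] = cost[start]
    pq = [(cost[start], start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and d + cost[nr, nc] < dist[nr, nc]:
                dist[nr, nc] = d + cost[nr, nc]
                prev[(nr, nc)] = (r, c)
                heapq.heappush(pq, (dist[nr, nc], (nr, nc)))
    path, node = [goal], goal          # reconstruct goal -> start
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

def feature_counts(features, path):
    """Sum the per-cell feature vectors along a path."""
    return sum(features[r, c] for r, c in path)

def mmp_subgradient(features, expert_path, start, goal,
                    iters=100, lr=0.05, reg=0.01):
    """Hypothetical maximum-margin-planning training loop (sketch only)."""
    H, W, K = features.shape
    w = np.zeros(K)
    # Structured loss: 1 for cells the expert never visits, 0 on the expert path.
    loss_map = np.ones((H, W))
    for r, c in expert_path:
        loss_map[r, c] = 0.0
    f_expert = feature_counts(features, expert_path)
    for _ in range(iters):
        cost = np.maximum(features @ w, 1e-6)         # keep costs positive for Dijkstra
        aug_cost = np.maximum(cost - loss_map, 1e-6)  # loss-augmented costs
        rival = dijkstra_path(aug_cost, start, goal)  # loss-augmented optimal path
        g = reg * w + f_expert - feature_counts(features, rival)
        w -= lr * g                                   # subgradient step
    return w
```

The point of the sketch is the one the abstract makes: the only expensive operation inside the loop is a planning call, so any fast solver (Dijkstra here, A* or dynamic programming more generally) can be reused for learning.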

Cited by 492 publications (420 citation statements)
References 18 publications
“…Here, the robot can learn a C that explains the demonstrations [4], using tools like Inverse Optimal Control (IOC) [1,44,60]. However, extending these tools to higher dimensions is an open problem [44], and recent work focuses on learning costs that make the demonstrations locally optimal [32,38], or on restricting the space of trajectories to one in which optimization is tractable [30].…”
Section: Discussion
confidence: 99%
“…In opposition, inverse optimal control [14] and inverse reinforcement learning [15] are approaches based on the idea that, in some situations, it can lead to better generalization to model aspects of the task that the demonstrator is trying to solve instead of modeling the particular solution in the demonstrated context. The capacity of inverse optimal control to achieve better generalization has been demonstrated in the experiment performed by Abbeel et al.
Section: Inverse Feedback Control
confidence: 99%
“…In fact, a parse in this framework is obtained as the trajectory when an optimal policy is followed in an appropriately defined MDP. This idea is not completely new: the reverse connection was exploited by Ratliff et al. (2006), who derived an algorithm for inferring rewards using the large margin approach of Taskar et al. (2005). Maes et al. (2007) have used reinforcement learning for solving the structured prediction problem of sequence labeling.…”
Section: Introduction
confidence: 99%
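As a rough illustration of the claim in the excerpt above, that a parse is the trajectory obtained by following an optimal policy in a suitably defined MDP, the sketch below casts sequence labeling as a deterministic chain MDP and recovers the label sequence by backward value iteration plus a policy rollout. This is a hedged toy example, not code from any of the cited papers; the `scores` tensor and the dummy start label are assumptions.

```python
import numpy as np

def parse_by_optimal_policy(scores):
    """Sequence labeling viewed as a deterministic chain MDP (illustrative).

    scores[t, prev, cur] is an assumed reward for emitting label `cur` at
    position t after label `prev`; the optimal policy's trajectory is the parse.
    """
    T, L, _ = scores.shape
    value = np.zeros((T + 1, L))          # value[t, prev] = best reward-to-go
    policy = np.zeros((T, L), dtype=int)
    for t in range(T - 1, -1, -1):        # backward dynamic programming
        q = scores[t] + value[t + 1]      # Q(prev, cur) = r(prev, cur) + V(cur)
        policy[t] = q.argmax(axis=1)
        value[t] = q.max(axis=1)
    labels, prev = [], 0                  # roll out from a dummy start label 0
    for t in range(T):
        prev = int(policy[t, prev])
        labels.append(prev)
    return labels
```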
“…The fourth algorithm considered is the Max-Margin Planning method of Ratliff et al. (2006), mentioned previously. This algorithm uses the same criterion as Taskar et al. (2004), but instead of their so-called structured SMO method we implement the optimizer using a subgradient method, following the suggestion of Ratliff et al. (2006).…”
Section: Introduction
confidence: 99%
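For readers comparing the two excerpts, the shared criterion and the subgradient substitution they describe can be summarized in lightly simplified notation (uniform example weights, linear hinge); this is a sketch of the standard formulation rather than a transcription from either paper:

$$\min_{w}\;\frac{\lambda}{2}\lVert w\rVert^{2}\;+\;\frac{1}{n}\sum_{i=1}^{n}\Big(w^{\top}F_{i}\mu_{i}\;-\;\min_{\mu\in\mathcal{G}_{i}}\big(w^{\top}F_{i}-\ell_{i}^{\top}\big)\mu\Big)$$

where $F_i$ collects the feature vectors of example $i$, $\mu_i$ is the expert's state-visitation vector, and $\ell_i$ is the structured loss. One subgradient is $g=\lambda w+\frac{1}{n}\sum_i F_i(\mu_i-\mu_i^{*})$, with $\mu_i^{*}$ the loss-augmented minimizer, and the update $w\leftarrow w-\alpha g$ only requires solving that inner planning problem, which is the substitution for structured SMO that the excerpt refers to.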