We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: how can a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework in which the teacher adaptively chooses the next demonstration based on the learner's current policy. In particular, we design teaching algorithms for two concrete settings: an omniscient setting, where the teacher has full knowledge of the learner's dynamics, and a black-box setting, where the teacher has minimal knowledge. We then study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees for our teaching algorithm in the omniscient setting. Extensive experiments in a car driving simulator environment show that the learning progress can be sped up drastically compared to an uninformative teacher.
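As a concrete illustration, here is a minimal Python sketch of such an adaptive teacher loop. Everything here is an assumption for exposition: the `learner` interface, the demonstration pool, and the policy-disagreement score are hypothetical stand-ins, not the paper's actual informativeness criterion (which differs between the omniscient and black-box settings).

```python
import numpy as np

def interactive_teaching(teacher_policy, learner, demo_pool, num_rounds):
    """Hypothetical sketch: at each round the teacher picks the demonstration
    on which the learner's current policy disagrees most with the teacher's,
    then the learner updates on it (e.g., one MCE-IRL gradient step)."""
    for _ in range(num_rounds):
        current = learner.current_policy()  # state -> action distribution
        # Illustrative score: total disagreement between teacher and learner
        # action distributions over the states visited by each demonstration.
        scores = [
            sum(np.linalg.norm(teacher_policy[s] - current[s]) for s, _ in demo)
            for demo in demo_pool
        ]
        best_demo = demo_pool[int(np.argmax(scores))]
        learner.update(best_demo)  # assumed learner update interface
    return learner.current_policy()
```

In the black-box setting described above, the teacher would not have access to the learner's internals, so a score like this would have to be estimated from the learner's observed behavior rather than computed directly.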
We consider a two-agent MDP framework in which agents repeatedly solve a task in a collaborative setting. We study the problem of designing a learning algorithm for the first agent (A1) that facilitates successful collaboration even when the second agent (A2) is adapting its policy in an unknown way. The key challenge in our setting is that the first agent faces non-stationarity in rewards and transitions because of the adaptive behavior of the second agent. We design novel online learning algorithms for agent A1 whose regret decays as O(T^max{1 − (3/7)·α, 3/4}) over T learning episodes, provided that the magnitude of the change in agent A2's policy between any two consecutive episodes is upper bounded by O(T^−α). Here, the parameter α is assumed to be strictly greater than 0, and we show that this assumption is necessary provided that the learning parity with noise problem is computationally hard. We show that sublinear regret of agent A1 further implies near-optimality of the agents' joint return for MDPs that manifest the properties of a smooth game.
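To make the bound concrete, a short worked check (our illustration, using the bound as written above): the exponent max{1 − (3/7)·α, 3/4} is governed by α, the rate at which agent A2's policy drifts.

```latex
\[
\text{Regret} = O\!\left(T^{\max\{1 - \frac{3}{7}\alpha,\; \frac{3}{4}\}}\right),
\qquad
1 - \tfrac{3}{7}\alpha \le \tfrac{3}{4}
\;\Longleftrightarrow\;
\alpha \ge \tfrac{7}{12}.
\]
% For a slowly adapting partner (alpha >= 7/12) the bound is O(T^{3/4});
% for faster drift (0 < alpha < 7/12) the exponent 1 - (3/7)alpha approaches 1,
% so the regret stays sublinear for every alpha > 0 but only barely so
% as alpha -> 0, matching the necessity of alpha > 0 stated above.
```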
We consider the problem of teaching via demonstrations in sequential decision-making settings. In particular, we study how to design a personalized curriculum over demonstrations to speed up the learner's convergence. We provide a unified curriculum strategy for two popular learner models: Maximum Causal Entropy Inverse Reinforcement Learning (MaxEnt-IRL) and Cross-Entropy Behavioral Cloning (CrossEnt-BC). Our unified strategy induces a ranking over demonstrations based on a notion of difficulty scores computed w.r.t. the teacher's optimal policy and the learner's current policy. Compared to the state of the art, our strategy does not require access to the learner's internal dynamics yet enjoys similar convergence guarantees under mild technical conditions. Furthermore, we adapt our curriculum strategy to teach a learner using domain knowledge in the form of task-specific difficulty scores when the teacher's optimal policy is unknown. Experiments on a car driving simulator environment and shortest path problems in a grid-world environment demonstrate the effectiveness of our proposed curriculum strategy.
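The ranking mechanism can be sketched in a few lines of Python. This is a hypothetical illustration only: `difficulty_score` and `example_difficulty` are assumed stand-ins for exposition, not the paper's actual score.

```python
import numpy as np

def curriculum_ranking(demos, difficulty_score):
    """Hypothetical sketch of the unified curriculum strategy: rank the
    demonstration pool by a difficulty score and teach in that order.
    `difficulty_score(demo)` is an assumed callable comparing the teacher's
    optimal policy with the learner's current policy on the demo's states."""
    scores = np.array([difficulty_score(d) for d in demos])
    order = np.argsort(scores)  # easiest demonstrations first
    return [demos[i] for i in order]

def example_difficulty(demo, teacher_logp, learner_logp):
    """One plausible instantiation (an assumption): a demonstration is
    'harder' the less likely its state-action pairs are under the learner's
    current policy relative to the teacher's optimal policy."""
    return sum(teacher_logp(s, a) - learner_logp(s, a) for s, a in demo)
```

Because a score of this form depends only on policy outputs, it is consistent with the claim above that the strategy needs no access to the learner's internal dynamics; in the variant where the teacher's optimal policy is unknown, `difficulty_score` would instead be supplied as task-specific domain knowledge.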