2017
DOI: 10.1609/aaai.v31i1.11065

Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes

Abstract: An intriguing application of transfer learning emerges when tasks arise with similar, but not identical, dynamics. Hidden Parameter Markov Decision Processes (HiP-MDP) embed these tasks into a low-dimensional space; given the embedding parameters one can identify the MDP for a particular task. However, the original formulation of HiP-MDP had a critical flaw: the embedding uncertainty was modeled independently of the agent's state uncertainty, requiring an arduous training procedure. In this work, we apply a Ga…
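The core HiP-MDP idea in the abstract can be illustrated with a toy sketch: all tasks share one parametric family of dynamics, and a low-dimensional hidden parameter vector theta identifies the MDP for a particular task. The dynamics and all names below are hypothetical, not the paper's actual model.

```python
import numpy as np

def transition(state, action, theta, rng=None):
    """Shared dynamics family; given theta, the task's MDP is identified."""
    # Optional process noise; deterministic when no generator is supplied.
    noise = rng.normal(0.0, 0.01, size=state.shape) if rng is not None else 0.0
    # Toy linear dynamics: theta scales the effect of the action elementwise.
    return state + theta * action + noise

# Two tasks with similar, but not identical, dynamics:
theta_task_a = np.array([1.0, 0.5])
theta_task_b = np.array([1.1, 0.4])

s = np.zeros(2)
a = np.ones(2)
next_a = transition(s, a, theta_task_a)  # next state under task A
next_b = transition(s, a, theta_task_b)  # same state/action, task B differs
```

Transfer then amounts to inferring theta for a new task from a few transitions, rather than relearning the dynamics from scratch.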

Cited by 25 publications (5 citation statements)
References 5 publications
“…We created a simulator for multiple heterogeneous patient groups by perturbing the internal hidden parameters of the system following Killian, Konidaris, and Doshi-Velez (2017). We learn an optimal decision policy for 3 different groups, and then use the known optimal policies as constraints to learn in a new group which may or may not be similar to a group with a known policy.…”
Section: Environments and Constraints
confidence: 99%
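The setup described in the statement above — simulating heterogeneous patient groups by perturbing the system's hidden parameters — can be sketched roughly as follows. The parameter values, perturbation scale, and all names are illustrative assumptions, not the cited authors' actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
base_theta = np.array([1.0, 0.5, 2.0])  # nominal hidden dynamics parameters

def make_group(base, scale, rng):
    """A patient group = base parameters plus a random perturbation."""
    return base + rng.normal(0.0, scale, size=base.shape)

# Three known groups, each defining its own MDP via its theta:
groups = [make_group(base_theta, 0.1, rng) for _ in range(3)]

# A new group that may or may not resemble a known one:
new_group = make_group(base_theta, 0.1, rng)

# One simple similarity notion: distance in hidden-parameter space,
# which could guide which known group's policy to use as a constraint.
distances = [float(np.linalg.norm(new_group - g)) for g in groups]
closest = int(np.argmin(distances))
```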
“…Option learning [Yang et al., 2020b], policy function aggregation [Barekatain et al., 2019], policy shaping [Plisnier et al., 2019], and Bayesian-inference-based mixture density networks [Gimelfarb et al., 2020] are different methods proposed to fully utilize multiple source policies. Another stream of research assumes the existence of hidden parameters that define the transition dynamics of the environment [Killian et al., 2017; Perez et al., 2020; Yang et al., 2020a]. These studies exploit the fact that hidden parameters simplify the estimation of dynamics differences, and hence facilitate transfer learning.…”
Section: Related Work
confidence: 99%
“…Some work directly addresses dynamics transfer, where the transition function changes but states, actions, and rewards are identical. One line of work (Killian et al. 2017; Doshi-Velez and Konidaris 2016; Yao et al. 2018) parameterizes the transition function of an MDP and learns a conditional policy that can transfer between such tasks. Other work generates a curriculum to learn generalizable policies that can adapt to MDPs with differing dynamics (Mysore, Platt, and Saenko 2019).…”
Section: Transfer In Reinforcement Learning
confidence: 99%