Abstract:In this paper, we investigate the structural similarities within a finite Markov decision process (MDP). We view a finite MDP as a heterogeneous directed bipartite graph and propose novel measures for state similarity and action similarity in a mutual reinforcement manner. We prove that the state similarity is a metric and the action similarity is a pseudometric. We also establish the connection between the proposed similarity measures and the optimal values of the MDP. Extensive experiments show that the prop… Show more
“…For MDPs there is much work on measures relating to e.g. homomorphism and abstraction [36,37] and work is starting to emerge to gain more insight in the logical side [31] but their interaction needs study.…”
Section: Discussionmentioning
confidence: 99%
“…The Markov property dictates that given the present, the future is independent of the past. To scale to more complex problems, one can exploit structure in the space of state(-action) spaces, or policies or value functions, to utilize abstractions and approximations, for example as value function approximation, state space abstractions [37], and hierarchical decompositions, cf. [36].…”
Markov decision processes are typically used for sequential decision making under uncertainty. For many aspects however, ranging from constrained or safe specifications to various kinds of temporal (non-Markovian) dependencies in task and reward structures, extensions are needed. To that end, in recent years interest has grown into combinations of reinforcement learning and temporal logic, that is, combinations of flexible behavior learning methods with robust verification and guarantees. In this paper we describe an experimental investigation of the recently introduced regular decision processes that support both non-Markovian reward functions as well as transition functions. In particular, we provide a tool chain for regular decision processes, algorithmic extensions relating to online, incremental learning, an empirical evaluation of model-free and model-based solution algorithms, and applications in regular, but non-Markovian, grid worlds.
“…For MDPs there is much work on measures relating to e.g. homomorphism and abstraction [36,37] and work is starting to emerge to gain more insight in the logical side [31] but their interaction needs study.…”
Section: Discussionmentioning
confidence: 99%
“…The Markov property dictates that given the present, the future is independent of the past. To scale to more complex problems, one can exploit structure in the space of state(-action) spaces, or policies or value functions, to utilize abstractions and approximations, for example as value function approximation, state space abstractions [37], and hierarchical decompositions, cf. [36].…”
Markov decision processes are typically used for sequential decision making under uncertainty. For many aspects however, ranging from constrained or safe specifications to various kinds of temporal (non-Markovian) dependencies in task and reward structures, extensions are needed. To that end, in recent years interest has grown into combinations of reinforcement learning and temporal logic, that is, combinations of flexible behavior learning methods with robust verification and guarantees. In this paper we describe an experimental investigation of the recently introduced regular decision processes that support both non-Markovian reward functions as well as transition functions. In particular, we provide a tool chain for regular decision processes, algorithmic extensions relating to online, incremental learning, an empirical evaluation of model-free and model-based solution algorithms, and applications in regular, but non-Markovian, grid worlds.
“…Alternative means to quantify the similarity is to use a full specification of MDPs (Song et al, 2016;Wang et al, 2019) or environmental dynamics Yu et al (2019). In contrast, the proposed MULTI-POLAR allows the knowledge transfer only through the policies acquired from source environment instances, which is beneficial when source and target environments are not always connected to exchange information about their environmental dynamics and training samples.…”
Transfer reinforcement learning (RL) aims at improving learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks. However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments. In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under unknown diverse dynamics is available for learning a target task efficiently. To address this problem, the proposed approach, MULTI-source POLicy AggRegation (MULTIPOLAR), comprises two key techniques. We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance. Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy's expressiveness even when some of the source policies perform poorly. We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces.
“…Also, while the above methods can more effectively leverage task similarity, there are still a number of limitations and open questions. Though notions of neurogenesis, compositionality, and reconfigurability implicitly rely on task similarity, it is not clear whether and how more explicit measures and representations for task similarity 205 could provide further improvements.…”
Biological organisms learn from interactions with their environment throughout their lifetime. For artificial systems to successfully act and adapt in the real world, it is desirable to similarly be able to learn on a continual basis. This challenge is known as lifelong learning, and remains to a large extent unsolved. In this perspective article, we identify a set of key capabilities that artificial systems will need to achieve lifelong learning. We describe a number of biological mechanisms, both neuronal and non-neuronal, that help explain how organisms solve these challenges, and present examples of biologically inspired models and biologically plausible mechanisms that have been applied to artificial intelligence systems in the quest towards development of lifelong learning machines. We discuss opportunities to further our understanding and advance the state of the art in lifelong learning, aiming to bridge the gap between natural and artificial intelligence.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.