Proceedings of the 22nd International Conference on Machine Learning (ICML '05), 2005
DOI: 10.1145/1102351.1102427
Dynamic preferences in multi-criteria reinforcement learning

Cited by 107 publications (99 citation statements)
References 12 publications
“…One way to deal with this is to reduce the state space by eliminating states that are unlikely or impossible to occur, if there are any [3]. Natarajan et al. [11] deal with learning in the situations where relative weights of policies change over time. Shelton provides a means to balance the incomparable rewards received from multiple sources [19].…”
Section: Multi-goal Q-learning
confidence: 99%
“…Natarajan and Tadepalli (2005) show that the efficiency of MOQL can be improved by sharing information between different weight settings. A hot topic in multiple-policy MORL is how to design the weight settings and share information among the different scalarized RL problems.…”
Section: Discussion
confidence: 99%
“…a weighted sum) of rewards associated to all objectives. Several multiple-policy MORL algorithms have been proposed (Natarajan and Tadepalli, 2005; Tesauro et al., 2007; Barrett and Narayanan, 2008; Lizotte et al., 2012) using the weighted sum of the objectives (with several weight settings) as scalar reward, which is optimized using standard reinforcement learning algorithms. The differences between the above algorithms are how they share the information between different weight settings and which weight settings they choose to optimize.…”
Section: Baseline Algorithm
confidence: 99%
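As an illustration of the weighted-sum scalarization described in the excerpt above, here is a minimal sketch of tabular Q-learning on a scalarized multi-objective problem. The environment interface (reset, step, actions), the reward-vector convention, the example weight settings, and all hyperparameters are assumptions made for the sketch; it learns each weight setting independently and does not reproduce the information-sharing schemes of the cited algorithms.

    # Minimal sketch: scalarized (weighted-sum) Q-learning for one weight setting.
    # The environment is assumed to return a reward *vector* per step; all names
    # and hyperparameters below are illustrative, not from the cited papers.
    import random
    from collections import defaultdict

    def scalarized_q_learning(env, weights, episodes=500,
                              alpha=0.1, gamma=0.95, epsilon=0.1):
        q = defaultdict(float)  # maps (state, action) -> scalar Q-value

        def best_action(state):
            return max(env.actions, key=lambda a: q[(state, a)])

        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                # epsilon-greedy exploration on the scalarized value
                action = (random.choice(env.actions)
                          if random.random() < epsilon else best_action(state))
                next_state, reward_vec, done = env.step(action)
                # collapse the reward vector into a scalar with the weight setting
                reward = sum(w * r for w, r in zip(weights, reward_vec))
                target = reward + (0.0 if done else
                                   gamma * q[(next_state, best_action(next_state))])
                q[(state, action)] += alpha * (target - q[(state, action)])
                state = next_state
        return q

    # One independent scalarized problem per weight setting; information sharing
    # between the settings (the focus of the discussion above) is omitted here.
    weight_settings = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
    # policies = {w: scalarized_q_learning(env, w) for w in weight_settings}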
“…For the multi-agent case, we use vector-based reinforcement learning [8], [9]. Each agent is assumed to be a component and the central controller picks actions according to a weighted sum of the individual rewards.…”
Section: Average-reward RL
confidence: 99%
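As a rough illustration of the central-controller scheme described in the excerpt above, the sketch below greedily selects the joint action that maximizes a weighted sum of per-component Q-values. The per-component Q-tables, the weight vector, and the action set are assumed inputs; this is a simplified stand-in, not the exact formulation of the cited vector-based methods [8], [9].

    # Sketch: central controller picking an action by the weighted sum of the
    # components' Q-values.  component_qs is a list of dicts mapping
    # (state, action) -> Q-value, one per component; weights holds one scalar
    # weight per component.  Unseen (state, action) pairs default to 0.0.
    def select_joint_action(state, component_qs, weights, actions):
        def combined_value(action):
            return sum(w * q.get((state, action), 0.0)
                       for w, q in zip(weights, component_qs))
        return max(actions, key=combined_value)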