Proceedings of the 22nd International Conference on Machine Learning (ICML '05), 2005
DOI: 10.1145/1102351.1102427
Dynamic preferences in multi-criteria reinforcement learning

Cited by 107 publications (99 citation statements)
References 12 publications
“…One way to deal with this is to reduce the state space by eliminating states that are unlikely or impossible to occur, if there are any [3]. Natarajan et al. [11] deal with learning in the situations where relative weights of policies change over time. Shelton provides a means to balance the incomparable rewards received from multiple sources [19].…”
Section: Multi-goal Q-learning
confidence: 99%
“…Natarajan and Tadepalli (2005) show that the efficiency of MOQL can be improved by sharing information between different weight settings. A hot topic in multiple-policy MORL is how to design the weight settings and share information among the different scalarized RL problems.…”
Section: Discussion
confidence: 99%
“…a weighted sum) of rewards associated to all objectives. Several multiple-policy MORL algorithms have been proposed (Natarajan and Tadepalli, 2005; Tesauro et al., 2007; Barrett and Narayanan, 2008; Lizotte et al., 2012) using the weighted sum of the objectives (with several weight settings) as scalar reward, which is optimized using standard reinforcement learning algorithms. The differences between the above algorithms are how they share the information between different weight settings and which weight settings they choose to optimize.…”
Section: Baseline Algorithm
confidence: 99%
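As an illustration of the weighted-sum scalarization described in the excerpt above, here is a minimal sketch of tabular Q-learning on a scalarized multi-objective problem. The environment interface (reset, step, actions), the reward-vector convention, the example weight settings, and all hyperparameters are assumptions made for the sketch; it learns each weight setting independently and does not reproduce the information-sharing schemes of the cited algorithms.

    # Minimal sketch: scalarized (weighted-sum) Q-learning for one weight setting.
    # The environment is assumed to return a reward *vector* per step; all names
    # and hyperparameters below are illustrative, not from the cited papers.
    import random
    from collections import defaultdict

    def scalarized_q_learning(env, weights, episodes=500,
                              alpha=0.1, gamma=0.95, epsilon=0.1):
        q = defaultdict(float)  # maps (state, action) -> scalar Q-value

        def best_action(state):
            return max(env.actions, key=lambda a: q[(state, a)])

        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                # epsilon-greedy exploration on the scalarized value
                action = (random.choice(env.actions)
                          if random.random() < epsilon else best_action(state))
                next_state, reward_vec, done = env.step(action)
                # collapse the reward vector into a scalar with the weight setting
                reward = sum(w * r for w, r in zip(weights, reward_vec))
                target = reward + (0.0 if done else
                                   gamma * q[(next_state, best_action(next_state))])
                q[(state, action)] += alpha * (target - q[(state, action)])
                state = next_state
        return q

    # One independent scalarized problem per weight setting; information sharing
    # between the settings (the focus of the discussion above) is omitted here.
    weight_settings = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
    # policies = {w: scalarized_q_learning(env, w) for w in weight_settings}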
“…For the multi-agent case, we use vector-based reinforcement learning [8], [9]. Each agent is assumed to be a component and the central controller picks actions according to a weighted sum of the individual rewards.…”
Section: Average-reward RL
confidence: 99%
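As a rough illustration of the central-controller scheme described in the excerpt above, the sketch below greedily selects the joint action that maximizes a weighted sum of per-component Q-values. The per-component Q-tables, the weight vector, and the action set are assumed inputs; this is a simplified stand-in, not the exact formulation of the cited vector-based methods [8], [9].

    # Sketch: central controller picking an action by the weighted sum of the
    # components' Q-values.  component_qs is a list of dicts mapping
    # (state, action) -> Q-value, one per component; weights holds one scalar
    # weight per component.  Unseen (state, action) pairs default to 0.0.
    def select_joint_action(state, component_qs, weights, actions):
        def combined_value(action):
            return sum(w * q.get((state, action), 0.0)
                       for w, q in zip(weights, component_qs))
        return max(actions, key=combined_value)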