2018
DOI: 10.15439/2018f231
Modular Multi-Objective Deep Reinforcement Learning with Decision Values

Abstract: In this work we present a method for using Deep Q-Networks (DQNs) in multi-objective environments. Deep Q-Networks provide remarkable performance in single-objective problems, learning from high-level visual state representations. However, in many scenarios (e.g., in robotics or games), the agent needs to pursue multiple objectives simultaneously. We propose an architecture in which separate DQNs are used to control the agent's behaviour with respect to particular objectives. In this architecture we introduce decis…

Cited by 31 publications (22 citation statements)
References 8 publications (9 reference statements)
“…Typical multi-objective optimization (MOO) studies the problem of optimizing a set of possibly conflicting objectives. One main strategy to solve this problem is the scalarization approach (Ward & Lee, 2001;Nguyen, 2018;Tajmajer, 2018;Vamplew et al, 2017;Van Moffaert et al, 2013), where one or several single-objective optimization problems are solved. Two major disadvantages of these approaches are (1) the choice of the weighting factors is needed, leading to the burden of choosing them in the model, and (2) scalarization only results in a properly efficient solution (Ward & Lee, 2001).…”
Section: Previous Work
Mentioning confidence: 99%
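The quoted passage describes linear scalarization: collapsing a vector of objective rewards into one scalar via fixed weights, which is exactly where the "burden of choosing them" arises. A minimal sketch, with all names and values illustrative rather than taken from any cited paper:

```python
# Hypothetical sketch of linear scalarization for a vector-valued reward.
# The weights are the "weighting factors" the quoted passage criticizes:
# they must be chosen a priori, before learning.

def scalarize(reward_vector, weights):
    """Collapse a multi-objective reward into a single scalar."""
    assert len(reward_vector) == len(weights)
    return sum(w * r for w, r in zip(weights, reward_vector))

# Example: two objectives (task progress vs. energy cost).
r = [1.0, -0.4]          # vector reward from the environment
w = [0.7, 0.3]           # fixed a priori -- the key limitation
print(scalarize(r, w))   # 0.7*1.0 + 0.3*(-0.4) = 0.58
```

Once scalarized, any single-objective method (e.g., a standard DQN) applies unchanged, which is why scalarization is the most common strategy; the cost is that each weight vector yields only one trade-off point.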
“…In the multi-objective setting, the completion of a task requires the simultaneous satisfaction of multiple objectives such as balancing power consumption and performance in Web servers (Tesauro, et al, 2008). Such problems can be modeled as multi-objective Markov decision processes (MOMDPs) and solved by some existing multi-objective reinforcement learning (MORL) algorithms (Tesauro et al, 2008;Nguyen, 2018;Tajmajer, 2018;Abels, et al, 2018;Van Moffaert, Drugan, & Nowé, 2013;Vamplew, Dazeley, & Foale, 2017) based on the assumption that either the weighting factor for different objective functions or the ordering information is available. The solution for parameterized algorithms assumes the weighting factor for different objective functions can be obtained either directly (i.e., known a priori ) or indirectly (through learning).…”
Section: Introduction
Mentioning confidence: 99%
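A multi-objective MDP differs from a standard MDP in that each step yields a reward *vector*, one component per objective, so returns are accumulated per objective. A toy sketch of that bookkeeping, assuming an illustrative discount factor and hand-written rewards:

```python
# Minimal illustrative MOMDP-style rollout: the reward at each step is a
# vector (one entry per objective) rather than a scalar. All values and
# names here are toy assumptions, not from the cited algorithms.

def discounted_returns(reward_vectors, gamma=0.9):
    """Accumulate a discounted return per objective from a list of
    per-step vector rewards."""
    n_obj = len(reward_vectors[0])
    returns = [0.0] * n_obj
    discount = 1.0
    for reward_vec in reward_vectors:
        for i in range(n_obj):
            returns[i] += discount * reward_vec[i]
        discount *= gamma
    return returns

# Two objectives, e.g., performance and (negative) power consumption,
# as in the Web-server example above.
print(discounted_returns([[1.0, -0.5], [0.5, -0.2]]))  # [1.45, -0.68]
```

The MORL algorithms cited above differ mainly in how they turn this return vector into a preference over actions: via known weights, via learned weights, or via an ordering over objectives.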
“…The author in [15] proposed the use of both linear weighted-sum and nonlinear thresholded lexicographic ordering methods to develop a multi-objective deep RL framework that includes both single- and multi-policy strategies. The author in [16] proposed an architecture in which separate deep Q-networks (DQNs) are used to control the agent's behavior with respect to particular objectives. Then, each DQN has an additional decision value output that acts as a dynamic weight used while summing up Q-values.…”
Section: Introduction
Mentioning confidence: 99%
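The decision-value mechanism described in this quote can be sketched as action selection over a weighted sum of per-objective Q-values, where the weights are produced at decision time rather than fixed in advance. The Q-tables and decision values below are toy stand-ins, not the paper's trained DQNs:

```python
# Sketch of the decision-value idea quoted above: each objective has its
# own Q-function Q_i, plus a scalar decision value d_i that acts as a
# dynamic weight when Q-values are summed for action selection.
# All numbers here are illustrative assumptions.

def select_action(q_per_objective, decision_values):
    """Return argmax over actions of sum_i d_i * Q_i(s, a)."""
    n_actions = len(q_per_objective[0])
    combined = [
        sum(d * q[a] for d, q in zip(decision_values, q_per_objective))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: combined[a])

# Two objectives, three actions: objective 0 prefers action 0,
# objective 1 prefers action 2; the decision values arbitrate.
q = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 2.0]]
print(select_action(q, [0.9, 0.1]))  # objective 0 dominates -> action 0
print(select_action(q, [0.1, 0.9]))  # objective 1 dominates -> action 2
```

The point of making the weights outputs of the networks (rather than hyperparameters) is that the trade-off can shift with the state, e.g., a collision-avoidance objective can seize control only when an obstacle is near.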
“…In summary, most of the algorithms are based on the scalarization method to transform the multi-objective problem into a single objective one. The scalarization can be nonlinear or linear [15], [16], [17], [18]. Other advanced methods include, e.g., the convex hull [19], the varying parameters approaches [20], the constraint method [21], the sequential method [22], and the max-min method [23].…”
Section: Introduction
Mentioning confidence: 99%
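Of the nonlinear scalarizations this quote mentions, thresholded lexicographic ordering (TLO) is a representative example: objectives are ranked, and a higher-priority objective only needs to reach a threshold before the next objective is allowed to decide. A minimal sketch, with thresholds and values chosen purely for illustration:

```python
# Sketch of thresholded lexicographic ordering (TLO), a nonlinear
# scalarization: clip each objective value at its threshold, then
# compare the clipped tuples lexicographically. Thresholds and values
# below are illustrative assumptions.

def tlo_key(values, thresholds):
    """Clip each objective at its threshold; comparing the resulting
    tuples lexicographically implements the TLO preference."""
    return tuple(min(v, t) for v, t in zip(values, thresholds))

def tlo_prefers(a, b, thresholds):
    """True if option a is strictly preferred to option b under TLO."""
    return tlo_key(a, thresholds) > tlo_key(b, thresholds)

# Objective 0 (say, safety) is thresholded at 1.0: once both options
# are "safe enough", objective 1 decides the comparison.
a, b = [1.5, 0.2], [1.2, 0.9]
print(tlo_prefers(a, b, thresholds=[1.0, 10.0]))  # False: b wins on obj 1
```

Because the clipping is nonlinear, TLO can express preferences (e.g., "satisfy safety first, then optimize speed") that no fixed linear weighting reproduces, which is why it appears alongside linear scalarization in the cited frameworks.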