2018
DOI: 10.15439/2018f231
Modular Multi-Objective Deep Reinforcement Learning with Decision Values

Abstract: In this work we present a method for using Deep Q-Networks (DQNs) in multi-objective environments. Deep Q-Networks provide remarkable performance in single-objective problems, learning from high-level visual state representations. However, in many scenarios (e.g., in robotics or games), the agent needs to pursue multiple objectives simultaneously. We propose an architecture in which separate DQNs are used to control the agent's behaviour with respect to particular objectives. In this architecture we introduce decis…

Cited by 31 publications (22 citation statements)
References 8 publications (9 reference statements)
“…Typical multi-objective optimization (MOO) studies the problem of optimizing a set of possibly conflicting objectives. One main strategy to solve this problem is the scalarization approach (Ward & Lee, 2001;Nguyen, 2018;Tajmajer, 2018;Vamplew et al, 2017;Van Moffaert et al, 2013), where one or several single-objective optimization problems are solved. Two major disadvantages of these approaches are (1) the choice of the weighting factors is needed, leading to the burden of choosing them in the model, and (2) scalarization only results in a properly efficient solution (Ward & Lee, 2001).…”
Section: Previous Work
Mentioning confidence: 99%
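The quoted passage describes linear scalarization: collapsing a vector of objective rewards into one scalar via fixed weights, which is exactly where the "burden of choosing them" arises. A minimal sketch, with all names and values illustrative rather than taken from any cited paper:

```python
# Hypothetical sketch of linear scalarization for a vector-valued reward.
# The weights are the "weighting factors" the quoted passage criticizes:
# they must be chosen a priori, before learning.

def scalarize(reward_vector, weights):
    """Collapse a multi-objective reward into a single scalar."""
    assert len(reward_vector) == len(weights)
    return sum(w * r for w, r in zip(weights, reward_vector))

# Example: two objectives (task progress vs. energy cost).
r = [1.0, -0.4]          # vector reward from the environment
w = [0.7, 0.3]           # fixed a priori -- the key limitation
print(scalarize(r, w))   # 0.7*1.0 + 0.3*(-0.4) = 0.58
```

Once scalarized, any single-objective method (e.g., a standard DQN) applies unchanged, which is why scalarization is the most common strategy; the cost is that each weight vector yields only one trade-off point.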
“…In the multi-objective setting, the completion of a task requires the simultaneous satisfaction of multiple objectives such as balancing power consumption and performance in Web servers (Tesauro, et al, 2008). Such problems can be modeled as multi-objective Markov decision processes (MOMDPs) and solved by some existing multi-objective reinforcement learning (MORL) algorithms (Tesauro et al, 2008;Nguyen, 2018;Tajmajer, 2018;Abels, et al, 2018;Van Moffaert, Drugan, & Nowé, 2013;Vamplew, Dazeley, & Foale, 2017) based on the assumption that either the weighting factor for different objective functions or the ordering information is available. The solution for parameterized algorithms assumes the weighting factor for different objective functions can be obtained either directly (i.e., known a priori ) or indirectly (through learning).…”
Section: Introduction
Mentioning confidence: 99%
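A multi-objective MDP differs from a standard MDP in that each step yields a reward *vector*, one component per objective, so returns are accumulated per objective. A toy sketch of that bookkeeping, assuming an illustrative discount factor and hand-written rewards:

```python
# Minimal illustrative MOMDP-style rollout: the reward at each step is a
# vector (one entry per objective) rather than a scalar. All values and
# names here are toy assumptions, not from the cited algorithms.

def discounted_returns(reward_vectors, gamma=0.9):
    """Accumulate a discounted return per objective from a list of
    per-step vector rewards."""
    n_obj = len(reward_vectors[0])
    returns = [0.0] * n_obj
    discount = 1.0
    for reward_vec in reward_vectors:
        for i in range(n_obj):
            returns[i] += discount * reward_vec[i]
        discount *= gamma
    return returns

# Two objectives, e.g., performance and (negative) power consumption,
# as in the Web-server example above.
print(discounted_returns([[1.0, -0.5], [0.5, -0.2]]))  # [1.45, -0.68]
```

The MORL algorithms cited above differ mainly in how they turn this return vector into a preference over actions: via known weights, via learned weights, or via an ordering over objectives.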
“…The author in [15] proposed the use of both linear weighted-sum and nonlinear thresholded lexicographic ordering methods to develop a multi-objective deep RL framework that includes both single- and multi-policy strategies. The author in [16] proposed an architecture in which separate deep Q-networks (DQNs) are used to control the agent's behavior with respect to particular objectives. Then, each DQN has an additional decision value output that acts as a dynamic weight used while summing up Q-values.…”
Section: Introduction
Mentioning confidence: 99%
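The decision-value mechanism described in this quote can be sketched as action selection over a weighted sum of per-objective Q-values, where the weights are produced at decision time rather than fixed in advance. The Q-tables and decision values below are toy stand-ins, not the paper's trained DQNs:

```python
# Sketch of the decision-value idea quoted above: each objective has its
# own Q-function Q_i, plus a scalar decision value d_i that acts as a
# dynamic weight when Q-values are summed for action selection.
# All numbers here are illustrative assumptions.

def select_action(q_per_objective, decision_values):
    """Return argmax over actions of sum_i d_i * Q_i(s, a)."""
    n_actions = len(q_per_objective[0])
    combined = [
        sum(d * q[a] for d, q in zip(decision_values, q_per_objective))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: combined[a])

# Two objectives, three actions: objective 0 prefers action 0,
# objective 1 prefers action 2; the decision values arbitrate.
q = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 2.0]]
print(select_action(q, [0.9, 0.1]))  # objective 0 dominates -> action 0
print(select_action(q, [0.1, 0.9]))  # objective 1 dominates -> action 2
```

The point of making the weights outputs of the networks (rather than hyperparameters) is that the trade-off can shift with the state, e.g., a collision-avoidance objective can seize control only when an obstacle is near.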
“…In summary, most of the algorithms are based on the scalarization method to transform the multi-objective problem into a single objective one. The scalarization can be nonlinear or linear [15], [16], [17], [18]. Other advanced methods include, e.g., the convex hull [19], the varying parameters approaches [20], the constraint method [21], the sequential method [22], and the max-min method [23].…”
Section: Introduction
Mentioning confidence: 99%
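Of the nonlinear scalarizations this quote mentions, thresholded lexicographic ordering (TLO) is a representative example: objectives are ranked, and a higher-priority objective only needs to reach a threshold before the next objective is allowed to decide. A minimal sketch, with thresholds and values chosen purely for illustration:

```python
# Sketch of thresholded lexicographic ordering (TLO), a nonlinear
# scalarization: clip each objective value at its threshold, then
# compare the clipped tuples lexicographically. Thresholds and values
# below are illustrative assumptions.

def tlo_key(values, thresholds):
    """Clip each objective at its threshold; comparing the resulting
    tuples lexicographically implements the TLO preference."""
    return tuple(min(v, t) for v, t in zip(values, thresholds))

def tlo_prefers(a, b, thresholds):
    """True if option a is strictly preferred to option b under TLO."""
    return tlo_key(a, thresholds) > tlo_key(b, thresholds)

# Objective 0 (say, safety) is thresholded at 1.0: once both options
# are "safe enough", objective 1 decides the comparison.
a, b = [1.5, 0.2], [1.2, 0.9]
print(tlo_prefers(a, b, thresholds=[1.0, 10.0]))  # False: b wins on obj 1
```

Because the clipping is nonlinear, TLO can express preferences (e.g., "satisfy safety first, then optimize speed") that no fixed linear weighting reproduces, which is why it appears alongside linear scalarization in the cited frameworks.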