2016
DOI: 10.1145/3008665.3008670

Multi-objective decision-theoretic planning

Abstract: Decision making is hard. It often requires reasoning about uncertain environments, partial observability and action spaces that are too large to enumerate. In such complex decision-making tasks, decision-theoretic agents, which can reason about their environments on the basis of mathematical models and produce policies that optimize the utility for their users, can often assist us. In most research on decision-theoretic agents, the desirability of actions and their effects is codified in a scalar reward function. Howe…

Cited by 13 publications (24 citation statements) · References 61 publications
Citation types: 0 supporting, 24 mentioning, 0 contrasting · Years published: 2017–2022
“…With this approach, we plan to learn a coverage set containing an optimal strategy for every possible preference profile the decision makers might have [37]. We aim to design suitable quality metrics [36,40,43] tailored to the use case of epidemiological preventive strategy learning, to support the entire spectrum of epidemiological models and thus to prevent method overfitting [43].…”
Section: Discussion (mentioning)
confidence: 99%
“…The OLS algorithm starts with an empty set X, denoting the CCS of value vectors, as shown in line 1 of Algorithm 1. The algorithm repeatedly executes steps 2 to 9 until no improving value vectors are found, as evaluated by the maximal possible improvement Δ [42,44]. In the first two iterations of the while loop, the algorithm selects the first two corner points as the extrema of the weight simplex, i.e.…”
Section: Optimistic Linear Support (mentioning)
confidence: 99%
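The excerpt describes the outer loop of Optimistic Linear Support (OLS) in prose. A minimal two-objective sketch follows, under assumed names: `solve_scalarized` is a hypothetical stand-in for any single-objective solver that returns the value vector of the best policy for a weight vector w, and `eps` is an assumed improvement threshold. For brevity, a plain stack replaces the Δ-prioritized queue of corner weights that OLS proper uses.

```python
# Minimal two-objective sketch of the OLS outer loop.
# Assumptions: solve_scalarized(w) returns the value vector of the best
# policy for weight vector w; eps is an improvement threshold. OLS proper
# pops corner weights in order of maximal possible improvement Δ; a plain
# stack is used here for brevity.
import numpy as np

def ols(solve_scalarized, eps=1e-6):
    ccs = []                              # X: the CCS under construction, starts empty
    queue = [np.array([1.0, 0.0]),        # extrema of the weight simplex,
             np.array([0.0, 1.0])]        # visited in the first two iterations
    while queue:                          # stop when no corner weight yields an improvement
        w = queue.pop()
        v = np.asarray(solve_scalarized(w), dtype=float)
        best = max((float(w @ u) for u in ccs), default=-np.inf)
        if float(w @ v) > best + eps:     # v improves on the current set at w: keep it
            for u in ccs:
                # candidate corner weight where the scalarized values of v and u cross
                denom = (v[0] - u[0]) - (v[1] - u[1])
                if abs(denom) > eps:      # skip parallel (never-crossing) value lines
                    w1 = (u[1] - v[1]) / denom
                    if eps < w1 < 1.0 - eps:   # only interior weights are candidates
                        queue.append(np.array([w1, 1.0 - w1]))
            ccs.append(v)
    return ccs
```

As in the excerpt, the first two weights processed are the extrema of the weight simplex; each value vector that improves on the current set at its corner weight is kept, and the intersections it creates with existing vectors become new candidate corner weights.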
“…However, this set is typically too large, and may be prohibitively expensive to retrieve. Furthermore, as Vamplew et al. [2009] have shown, if stochastic policies are allowed, a much smaller solution set suffices to construct a Pareto front: we can use stochastic mixtures of the policies in the deterministic stationary convex coverage set (CCS), which is much easier to compute and allows for algorithms that exploit the properties of the CCS to retrieve the optimal policies, such as outer-loop methods [Roijers, 2016], as discussed further in Section 7.2.3. Moreover, in practical applications, much more might be known about the utility function of the user, due to domain knowledge.…”
Section: The Utility-based Approach (mentioning)
confidence: 99%
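The point about stochastic mixtures can be made concrete with a small sketch (hypothetical value vectors, not from the source): if each episode runs deterministic CCS policy A with probability p and policy B otherwise, the expected multi-objective value is the convex combination of their value vectors, so the mixtures trace the Pareto segment between them.

```python
# Hypothetical value vectors of two deterministic stationary CCS policies.
import numpy as np

v_a = np.array([10.0, 2.0])   # policy A: strong on objective 1
v_b = np.array([3.0, 8.0])    # policy B: strong on objective 2

def mixture_value(p):
    """Expected value of running A with probability p and B otherwise."""
    return p * v_a + (1.0 - p) * v_b

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, mixture_value(p))  # points on the Pareto segment between v_b and v_a
```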
“…So if you are wondering whether a certain property holds, it is prudent to consult the POMDP literature as well. Secondly, it means that methods originally invented for POMDPs can often be adapted for use in MOMDPs [Roijers, 2016]. While doing so, it is key to note that the number of objectives in a MOMDP corresponds to the number of states in a POMDP (i.e., the dimensionality of the belief- and α-vectors) [Roijers et al., 2015a].…”
Section: Partially Observable MDPs (mentioning)
confidence: 99%
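A worked pair of equations may clarify the correspondence the excerpt points at (the notation here is assumed, not taken from the source): linear scalarization in a MOMDP and belief evaluation in a POMDP both maximize an inner product over a probability simplex.

```latex
% Sketch of the MOMDP/POMDP correspondence; notation assumed, not from
% the source. Both optimal value functions are maxima of inner products
% taken over a probability simplex.
\[
  V^{*}_{\mathrm{MOMDP}}(\mathbf{w}) \;=\; \max_{\pi}\, \mathbf{w} \cdot \mathbf{V}^{\pi},
  \qquad \mathbf{w} \in \Delta^{d} \ \text{(weights over $d$ objectives)},
\]
\[
  V^{*}_{\mathrm{POMDP}}(\mathbf{b}) \;=\; \max_{\alpha \in \Gamma}\, \mathbf{b} \cdot \boldsymbol{\alpha},
  \qquad \mathbf{b} \in \Delta^{|S|} \ \text{(beliefs over $|S|$ states)}.
\]
```

On this reading, a MOMDP value vector \(\mathbf{V}^{\pi}\) plays the role of a POMDP α-vector, with the \(d\) objectives standing in for the \(|S|\) states, which is the sense in which the excerpt says POMDP methods can often be adapted to MOMDPs.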