2016
DOI: 10.1145/3008665.3008670

Multi-objective decision-theoretic planning

Abstract: Decision making is hard. It often requires reasoning about uncertain environments, partial observability and action spaces that are too large to enumerate. In such complex decision-making tasks, decision-theoretic agents, which can reason about their environments on the basis of mathematical models and produce policies that optimize the utility for their users, can often assist us. In most research on decision-theoretic agents, the desirability of actions and their effects is codified in a scalar reward function. Howe…

Cited by 13 publications (24 citation statements) · References 61 publications
Citation types: 0 supporting, 24 mentioning, 0 contrasting · Years published: 2017–2022
“…With this approach, we plan to learn a coverage set containing an optimal strategy for every possible preference profile the decision makers might have [37]. We aim to design suitable quality metrics [36,40,43] tailored to the use case of epidemiological preventive strategy learning, to support the entire spectrum of epidemiological models and thus to prevent method overfitting [43].…”
Section: Discussion (mentioning)
confidence: 99%
“…The OLS algorithm starts with an empty set X, denoting the CCS of value vectors, as shown in line 1 of Algorithm 1. The algorithm repeatedly executes steps 2 to 9 until no improving value vectors are found, as evaluated by the maximal possible improvement Δ [42,44]. In the first two iterations of the while loop, the algorithm selects the first two corner points as the extrema of the weight simplex, i.e.…”
Section: Optimistic Linear Support (mentioning)
confidence: 99%
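The excerpt describes the outer loop of Optimistic Linear Support (OLS) in prose. A minimal two-objective sketch follows, under assumed names: `solve_scalarized` is a hypothetical stand-in for any single-objective solver that returns the value vector of the best policy for a weight vector w, and `eps` is an assumed improvement threshold. For brevity, a plain stack replaces the Δ-prioritized queue of corner weights that OLS proper uses.

```python
# Minimal two-objective sketch of the OLS outer loop.
# Assumptions: solve_scalarized(w) returns the value vector of the best
# policy for weight vector w; eps is an improvement threshold. OLS proper
# pops corner weights in order of maximal possible improvement Δ; a plain
# stack is used here for brevity.
import numpy as np

def ols(solve_scalarized, eps=1e-6):
    ccs = []                              # X: the CCS under construction, starts empty
    queue = [np.array([1.0, 0.0]),        # extrema of the weight simplex,
             np.array([0.0, 1.0])]        # visited in the first two iterations
    while queue:                          # stop when no corner weight yields an improvement
        w = queue.pop()
        v = np.asarray(solve_scalarized(w), dtype=float)
        best = max((float(w @ u) for u in ccs), default=-np.inf)
        if float(w @ v) > best + eps:     # v improves on the current set at w: keep it
            for u in ccs:
                # candidate corner weight where the scalarized values of v and u cross
                denom = (v[0] - u[0]) - (v[1] - u[1])
                if abs(denom) > eps:      # skip parallel (never-crossing) value lines
                    w1 = (u[1] - v[1]) / denom
                    if eps < w1 < 1.0 - eps:   # only interior weights are candidates
                        queue.append(np.array([w1, 1.0 - w1]))
            ccs.append(v)
    return ccs
```

As in the excerpt, the first two weights processed are the extrema of the weight simplex; each value vector that improves on the current set at its corner weight is kept, and the intersections it creates with existing vectors become new candidate corner weights.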
“…However, this set is typically too large, and may be prohibitively expensive to retrieve. Furthermore, as Vamplew et al. [2009] have shown, if stochastic policies are allowed, a much smaller solution set suffices to construct a Pareto front: we can use stochastic mixtures of the policies in the deterministic stationary convex coverage set (CCS), which is much easier to compute and allows for algorithms that exploit the properties of the CCS to retrieve the optimal policies, such as outer-loop methods [Roijers, 2016], as discussed further in Section 7.2.3. Moreover, in practical applications, much more might be known about the utility function of the user, due to domain knowledge.…”
Section: The Utility-based Approach (mentioning)
confidence: 99%
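The point about stochastic mixtures can be made concrete with a small sketch (hypothetical value vectors, not from the source): if each episode runs deterministic CCS policy A with probability p and policy B otherwise, the expected multi-objective value is the convex combination of their value vectors, so the mixtures trace the Pareto segment between them.

```python
# Hypothetical value vectors of two deterministic stationary CCS policies.
import numpy as np

v_a = np.array([10.0, 2.0])   # policy A: strong on objective 1
v_b = np.array([3.0, 8.0])    # policy B: strong on objective 2

def mixture_value(p):
    """Expected value of running A with probability p and B otherwise."""
    return p * v_a + (1.0 - p) * v_b

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, mixture_value(p))  # points on the Pareto segment between v_b and v_a
```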
“…So if you are wondering whether a certain property holds, it is prudent to consult the POMDP literature as well. Secondly, it means that methods originally invented for POMDPs can often be adapted for use in MOMDPs [Roijers, 2016]. While doing so, it is key to note that the number of objectives in a MOMDP corresponds to the number of states in a POMDP (i.e., the dimensionality of the belief- and α-vectors) [Roijers et al., 2015a].…”
Section: Partially Observable MDPs (mentioning)
confidence: 99%
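A worked pair of equations may clarify the correspondence the excerpt points at (the notation here is assumed, not taken from the source): linear scalarization in a MOMDP and belief evaluation in a POMDP both maximize an inner product over a probability simplex.

```latex
% Sketch of the MOMDP/POMDP correspondence; notation assumed, not from
% the source. Both optimal value functions are maxima of inner products
% taken over a probability simplex.
\[
  V^{*}_{\mathrm{MOMDP}}(\mathbf{w}) \;=\; \max_{\pi}\, \mathbf{w} \cdot \mathbf{V}^{\pi},
  \qquad \mathbf{w} \in \Delta^{d} \ \text{(weights over $d$ objectives)},
\]
\[
  V^{*}_{\mathrm{POMDP}}(\mathbf{b}) \;=\; \max_{\alpha \in \Gamma}\, \mathbf{b} \cdot \boldsymbol{\alpha},
  \qquad \mathbf{b} \in \Delta^{|S|} \ \text{(beliefs over $|S|$ states)}.
\]
```

On this reading, a MOMDP value vector \(\mathbf{V}^{\pi}\) plays the role of a POMDP α-vector, with the \(d\) objectives standing in for the \(|S|\) states, which is the sense in which the excerpt says POMDP methods can often be adapted to MOMDPs.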