Risk Sensitive Stochastic Shortest Path and LogSumExp: From Theory to Practice

Freitas, Elthon Manhas de; Freire, Valdinei; Delgado, Karina Valdivia

doi:10.1007/978-3-030-61380-8_9

Cited by 2 publications

(7 citation statements)

References 25 publications


“…We now define a family of utility functions that can express a static risk attitude of risk-aware HTN planning agents. This family includes linear and exponential functions that are commonly used to express risk-sensitivity [42,43,44]. We denote this family by $U_c$.…”

Section: Static Risk Attitudes (mentioning)

confidence: 99%

Risk Awareness in HTN Planning

Alnazer, Georgievski, Aiello

2022

Preprint


Actual real-world domains are characterised by uncertain situations in which acting and use of resources require embracing risk. Performing actions in such domains always entails costs of consuming some resource, such as time, money, or energy, where the knowledge about these costs can range from totally known to totally unknown and even unknowable probabilities of costs. Think of robotic and marine domains, where actions and their costs are nondeterministic due to the uncertainty of factors, such as obstacles and weather conditions. Choosing which action to perform considering its cost on the available resource requires taking a stance on risk. Thus, these domains call for not only planning under uncertainty but also planning while embracing risk. Taking Hierarchical Task Network (HTN) planning as a widely used planning technique in real-world applications, one can observe that existing approaches do not account for risk. That is, computing most probable or optimal plans using actions with single-valued costs is only enough to express risk neutrality. In this work, we postulate that HTN planning can become risk aware by considering expected utility theory, a representative concept of decision theory that enables choosing actions considering a probability distribution of their costs and a given risk attitude expressed using a utility function. In particular, we introduce a general framework for HTN planning that allows modelling risk and uncertainty using a probability distribution of action costs upon which we define risk-aware HTN planning as an approach that accounts for the different risk attitudes and allows computing plans that …


“…Intuitively, the certainty equivalent is the reward that an agent would prefer to receive with certainty rather than taking a chance on an uncertain outcome. If $U(V^{\pi}(s)) < \infty$ and there exists the inverse function $U^{-1} : \mathbb{R} \to \mathbb{R}^{+}$, the certainty equivalent $V^{\pi}(s)$ of a policy $\pi$ is defined by (Bäuerle and Rieder, 2014; Freitas et al, 2020):…”

Section: Risk and Certainty Equivalent (mentioning)

confidence: 99%
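The notion of a certainty equivalent in the statement above can be illustrated with a minimal sketch. It assumes an exponential utility over costs, U(x) = exp(λx), with inverse U⁻¹(y) = ln(y)/λ. That utility form, the function name `certainty_equivalent`, and the two-outcome lottery are assumptions made for illustration and are not taken from the cited works.

```python
import math

def certainty_equivalent(costs, probs, lam):
    """Return U^{-1}(E[U(X)]) for the assumed utility U(x) = exp(lam * x).

    costs: possible cost outcomes; probs: their probabilities (summing to 1);
    lam: nonzero risk factor (hypothetical parameter name).
    """
    expected_utility = sum(p * math.exp(lam * c) for c, p in zip(costs, probs))
    return math.log(expected_utility) / lam

# Hypothetical lottery: cost 0 or cost 10, each with probability 0.5.
costs, probs = [0.0, 10.0], [0.5, 0.5]
mean_cost = sum(p * c for c, p in zip(costs, probs))
ce = certainty_equivalent(costs, probs, lam=0.1)
print(mean_cost, ce)  # the certainty equivalent exceeds the mean cost when lam > 0
```

Under this assumed utility, a positive λ weights high costs more heavily, so the certainty equivalent lies above the expected cost; as λ approaches zero it approaches the expected cost.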

“…The integration of the EEU criterion with the LogSumExp has demonstrated promising results (Freitas et al, 2020), and its application on EEU is simple. Next, we present how LogSumExp can be employed in Value Iteration with EEU (Freitas et al, 2020).…”

Section: LogSumExp (mentioning)

confidence: 99%
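As a rough illustration of how LogSumExp can enter a value-iteration backup under an exponential criterion, the sketch below works with W(s) = ln V(s) instead of V(s). It assumes the recursion V(s) = min_a exp(λ c(a)) Σ_s' P(s'|s,a) V(s') with V(goal) = 1; this recursion, the toy problem, and all names (`logsumexp`, `log_space_backup`, `toy_log_value`) are assumptions for illustration, not the algorithm of Freitas et al.

```python
import math

def logsumexp(xs):
    """Stable log(sum(exp(x) for x in xs)) using the max-shift identity."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_space_backup(lam, cost, transitions, W):
    """W(s) = min_a [lam * c(a) + LSE_{s'}(log P(s'|s,a) + W(s'))]."""
    best = math.inf
    for action, outcomes in transitions.items():
        terms = [math.log(p) + W[s2] for s2, p in outcomes.items() if p > 0]
        best = min(best, lam * cost[action] + logsumexp(terms))
    return best

def toy_log_value(lam, sweeps=500):
    """Hypothetical one-state problem: 'go' costs 1 and reaches the goal w.p. 0.5."""
    cost = {"go": 1.0}
    transitions = {"go": {"goal": 0.5, "s": 0.5}}
    W = {"s": 0.0, "goal": 0.0}  # W(goal) = ln 1 = 0 stays fixed
    for _ in range(sweeps):
        W["s"] = log_space_backup(lam, cost, transitions, W)
    return W

print(toy_log_value(0.5)["s"])
print(logsumexp([1000.0, 1000.0]))  # finite, whereas math.exp(1000.0) overflows
```

Working in log space keeps every intermediate quantity on the scale of λ times the accumulated cost, which is the reason the max-shift form avoids the overflow that direct exponentiation would hit. In this toy problem the iteration only converges when exp(λ)/2 < 1, so the choice λ = 0.5 is deliberate.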

“…• Scalability and Computational Complexity: When exponential risk criteria are introduced to Reinforcement Learning, the computational complexity further increases and as the values grow exponentially it becomes a hard task to prevent numeric overflow (Bäuerle and Rieder, 2014;Shen et al, 2014;Freitas et al, 2020).…”

Section: Introduction (mentioning)

confidence: 99%

“…Numerical overflow has been a focus of some studies, with efforts to address the issue through mathematical techniques such as LogSumExp (Naylor et al, 2001;Freitas et al, 2020). However, the application of this technique to Reinforcement Learning algorithms has not been extensively explored.…”

Section: Introduction (mentioning)

confidence: 99%


Risk Sensitivity with exponential functions in reinforcement learning: an empirical analysis

Pereira Neto


Reinforcement Learning has proven highly successful in solving sequential decision problems in complex environments, with a focus on maximizing the expected accumulated reward. Although Reinforcement Learning has shown its value, real-world scenarios often involve inherent risks that go beyond expected outcomes, where, in the same situation, different agents may be willing to take on different levels of risk. In such cases, Risk-Sensitive Reinforcement Learning emerges as a solution, incorporating risk criteria into the decision-making process. Among these criteria, exponential-based methods have been extensively studied and applied. However, the behavior of exponential criteria when integrated with learning parameters and approximations, particularly in combination with Deep Reinforcement Learning, remains relatively unexplored. This lack of knowledge can directly affect the applicability of these methods in real-world scenarios. In this dissertation, we present a framework that facilitates the comparison of exponential risk criteria, such as Exponential Expected Utility, Exponential Temporal Difference Transformation, and Temporal Difference Transformation with Soft Indicator, considering Reinforcement Learning algorithms such as Q-Learning and Deep Q-Learning. We formally demonstrate that Exponential Expected Utility and Exponential Temporal Difference Transformation converge to the same value. We also conduct experiments to explore the relationship of each exponential risk criterion with the learning rate parameter, the risk factor, and the sampling algorithms. The results reveal that Exponential Expected Utility exhibits superior stability. Additionally, this dissertation empirically analyzes numerical overflow problems. A truncation technique for handling this problem is analyzed. Furthermore, we propose applying the LogSumExp technique to mitigate this problem in algorithms that use Exponential Expected Utility.

