2018
DOI: 10.1109/access.2018.2854283
A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes

Cited by 48 publications (23 citation statements)
References 27 publications
“…The Q-learning Markov decision process (MDP) algorithm was used under the constraint to achieve the minimum computation latency, communication latency, and network latency by allocating data packets to different processors of virtual machines. Q-learning MDP is a mathematical framework for modeling decision-making and observations by collecting feedback from past experience in a dynamic environment [43]. The proposed approach requires a Q-learning MDP to account for the dynamic behavior of the IoT-fog-cloud system [23, 43].…”
Section: Methods (mentioning)
confidence: 99%
“…Q-learning MDP is a mathematical framework for modeling decision-making and observations by collecting feedback from past experience in a dynamic environment [43]. The proposed approach requires a Q-learning MDP to account for the dynamic behavior of the IoT-fog-cloud system [23, 43]. The IoT-fog-cloud system was unable to predict the transition probabilities and rewards because of dynamically changing incoming data packet requests at fog nodes.…”
Section: Methods (mentioning)
confidence: 99%
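The excerpt above describes a model-free setting: transition probabilities and rewards cannot be predicted, so values are learned purely from observed feedback. As a minimal sketch of that idea, the Python snippet below implements tabular Q-learning; the env object, its reset/step/actions interface, and the latency-based reward are hypothetical placeholders rather than the cited IoT-fog-cloud system.

```python
# Sketch of tabular Q-learning: the model-free update the excerpt refers to,
# where transition probabilities and rewards are never modelled explicitly.
# `env` and its reset/step/actions interface are hypothetical placeholders
# (e.g. the reward could be the negative end-to-end latency of a packet).
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = defaultdict(float)  # unseen (state, action) pairs default to 0

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy choice over the actions available in this state
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: q[(state, a)])

            next_state, reward, done = env.step(state, action)

            # Model-free update: only the sampled reward and next state are
            # needed, never the transition probabilities themselves.
            best_next = 0.0 if done else max(
                q[(next_state, a)] for a in env.actions(next_state)
            )
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```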
“…Optimizing this function in the process of mapping an unknown environment, where the objective model and the time needed to build it are unknown, is still under research. Though the process of predicting the future impact of an action is computationally expensive, there are recent advancements by using spectral techniques [163] and deep learning [164].…”
Section: Ongoing Developments (mentioning)
confidence: 99%
“…The conceptual backbone of the model is that the representation of environmental dynamics is organized as a hierarchy of time scales (Kiebel, Daunizeau, and Friston 2008). Such modelling approaches have been proposed in cognitive control in the context of hierarchical reinforcement learning (HRL) (e.g., Botvinick and Weinstein 2014; Holroyd and McClure 2015) and are naturally also an increasingly relevant topic in artificial intelligence research (e.g., Bacon and Precup 2018; Pang et al. 2019; Le, Vien, and Chung 2018; Mnih et al. 2015). In general, HRL models are based on the idea that action sequences can be chunked and represented as a new temporally extended state (see also Maisto, Donnarumma, and Pezzulo 2015 for a probabilistic modelling alternative).…”
Section: Uncertainty and a Hierarchy of Time Scales (mentioning)
confidence: 99%
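The chunking idea in the excerpt (action sequences represented as temporally extended states) is the core of the options view of hierarchical RL. Below is a minimal, illustrative Python sketch under the assumption of a generic env.step interface; it is not the cited authors' model, only one way to show how a fixed sub-sequence of primitive actions can be executed as a single higher-level action with its own duration and discounted return.

```python
# Sketch of the "chunking" idea behind hierarchical RL (the options framework):
# a fixed sub-sequence of primitive actions is executed as one temporally
# extended action. The env.step interface is a hypothetical placeholder.

class Option:
    def __init__(self, name, primitive_actions):
        self.name = name
        self.primitive_actions = primitive_actions  # the chunked sub-sequence

    def execute(self, env, state, gamma=0.95):
        """Run the whole chunk; return the terminal state, the discounted
        return it accumulated, and its duration k (its own time scale)."""
        total, discount, k = 0.0, 1.0, 0
        for k, action in enumerate(self.primitive_actions, start=1):
            state, reward, done = env.step(state, action)
            total += discount * reward
            discount *= gamma
            if done:
                break
        return state, total, k

# A higher-level policy then selects among Option objects instead of primitive
# actions; its value updates use the k-step discounted return, so longer
# options naturally operate on a slower time scale.
```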