2020
DOI: 10.1109/tetc.2018.2805718

Green Resource Allocation Based on Deep Reinforcement Learning in Content-Centric IoT

Cited by 182 publications (85 citation statements)
References 56 publications
“…Secondly, the dueling DQN approach is also integrated in the design with the intuition that it is not always necessary to estimate the reward of taking some action. The state-action Q-value in dueling DQN is decomposed into one value function, representing the reward in the current state, and the advantage function, which measures the relative importance of a certain action compared with other actions. Such designs optimize cache hit rate [70], cache expiration time [74], interference alignment [76]-[78], Quality of Experience [79], [81], energy efficiency [84], resource allocation [85]-[87], traffic latency, or redundancy [89], [91].…”
Section: A. Wireless Proactive Caching
confidence: 99%
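
To make the decomposition described in this excerpt concrete, here is a minimal sketch of a dueling Q-network head (PyTorch; the layer sizes and names are illustrative assumptions, not the cited paper's actual architecture). The Q-value is recomposed from a state-value stream and a mean-centered advantage stream; the centering keeps the two streams identifiable.

# Minimal dueling DQN head sketch (PyTorch). Layer sizes are illustrative
# assumptions, not values taken from the cited work.
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # Value stream: scalar V(s), the reward expected in the current state.
        self.value = nn.Linear(hidden, 1)
        # Advantage stream: A(s, a), the relative importance of each action.
        self.advantage = nn.Linear(hidden, num_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.shared(state)
        v = self.value(h)        # shape: (batch, 1)
        a = self.advantage(h)    # shape: (batch, num_actions)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)
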
“…The TE-aware exploration leverages the shortest path algorithm and NUM-based solution as the baseline during exploration. The PER method is conventionally used in DQL, e.g., [79] and [109], while the authors in [136] integrate the PER method with the actor-critic framework for the first time. The proposed scheme assigns different priorities to transitions in the experience replay.…”
Section: A. Traffic Engineering and Routing
confidence: 99%
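
As a rough illustration of the prioritized experience replay (PER) idea referenced above, a minimal sketch follows (plain Python; the buffer layout, priority exponent, and class name are assumptions for illustration, not the scheme of [136]). Transitions are stored with priorities derived from their TD errors and sampled with probability proportional to those priorities.

# Minimal prioritized experience replay sketch. Hyperparameters (capacity,
# alpha) are illustrative assumptions, not values from the cited works.
import random

class PrioritizedReplayBuffer:
    def __init__(self, capacity: int = 10000, alpha: float = 0.6):
        self.capacity = capacity
        self.alpha = alpha       # how strongly priorities skew the sampling
        self.buffer = []         # stored transitions
        self.priorities = []     # one priority per stored transition

    def add(self, transition, td_error: float):
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size: int):
        # Sample indices with probability proportional to priority.
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        return [self.buffer[i] for i in idx], idx
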
“…for each decision epoch k = 0, 1, 2, ... do
    k ← k + 1
    for each action a_k ∈ A_{s_k} do
        Determine the post-decision global system state s̃_k ← f(s_k, a_k) by (5)
        Determine the post-decision local system state vector {s̃_{n,k}}_{n=1}^{N} ← {f_n(s_{n,k}, a_k)}_{n=1}^{N} by (15)
        for each IoT device n ∈ N do
            Encode s̃_k in the input vector x_{s̃_{n,k}}
            Determine the feature φ_{s̃_{n,k}}(s̃_k) by (20), (21), (22)
            Determine the local reward function g̃_n(s̃_k) by (17)
        end for
        Determine the value of the RHS of (25)
    end for
    Select the action according to (25)
end for…”
Section: Algorithm 1: Determine the Optimal Action
confidence: 99%
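
Read literally, the quoted Algorithm 1 enumerates the candidate actions, rolls each one forward to its post-decision global and per-device states, and scores it via the right-hand side of (25). The sketch below mirrors only that loop structure; every helper (global_transition, local_transition, feature, local_reward, rhs_of_eq25) is a hypothetical placeholder for the paper's equations (5), (15), (17), (20)-(22), and (25).

# Hedged sketch of the action-selection loop in the quoted Algorithm 1.
# All helper functions are hypothetical stand-ins for the cited paper's
# equations; only the loop structure mirrors the quoted pseudocode.
def select_action(s_k, local_states, actions, devices,
                  global_transition, local_transition,
                  feature, local_reward, rhs_of_eq25):
    best_action, best_value = None, float("-inf")
    for a_k in actions:                                   # each candidate action
        s_post = global_transition(s_k, a_k)              # post-decision global state, eq. (5)
        local_post = [local_transition(s_n, a_k)          # post-decision local states, eq. (15)
                      for s_n in local_states]
        features, rewards = [], []
        for n in devices:                                 # device indices 0..N-1
            features.append(feature(local_post[n], s_post))  # eqs. (20)-(22)
            rewards.append(local_reward(n, s_post))          # eq. (17)
        value = rhs_of_eq25(features, rewards)            # RHS of eq. (25)
        if value > best_value:
            best_action, best_value = a_k, value
    return best_action                                    # action maximizing eq. (25)
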
“…The detailed procedures and equations to update the value functions and weights are given in Appendix B and summarized in Algorithm 2 below. Note that at the k-th decision epoch, θ_{n,k} is used in place of θ_n in (25) to derive the optimal action π*(s_k). In the following discussion, we add a subscript k to the notations described in Section III.B to represent the parameter values at the k-th decision epoch.…”
Section: Per-Node Value Function and Weight Update
confidence: 99%