2023
DOI: 10.3390/s23084166

Distributed DRL-Based Computation Offloading Scheme for Improving QoE in Edge Computing Environments

Abstract: Various edge collaboration schemes that rely on reinforcement learning (RL) have been proposed to improve the quality of experience (QoE). Deep RL (DRL) maximizes cumulative rewards through large-scale exploration and exploitation. However, existing DRL schemes rely on fully connected layers and therefore do not capture the temporal structure of the state. Moreover, they learn the offloading policy without regard to the importance of individual experiences, and they learn insufficiently because each agent collects only limited experience in distributed environments.…
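The "importance of experience" point can be made concrete with prioritized experience replay, in which transitions are replayed in proportion to their temporal-difference (TD) error. The buffer below is a generic proportional-priority sketch in the spirit of Schaul et al.'s prioritized experience replay; it illustrates the idea only and is not the prioritization mechanism of the cited paper, so the class name, parameters, and defaults are assumptions.

import numpy as np


class PrioritizedReplayBuffer:
    """Proportional prioritized replay: P(i) ∝ priority_i^alpha (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[: len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority is |TD error| + eps, so surprising experiences are sampled more often.
        self.priorities[idx] = np.abs(td_errors) + eps

In a training loop, the weights returned by sample() would scale each transition's loss, and update_priorities() would be called with the new TD errors after every gradient step.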

Cited by 4 publications (2 citation statements)
References: 41 publications
“…Initialize replay memory D with capacity N
Initialize Q-network with random weights θ
Initialize target Q-network with weights θ_target = θ
Initialize offloading environment
for episode = 1, M do
    Initialize state s
    for t = 1, T_max do
        Choose action a from state s using ε-greedy policy
        Execute action a and observe reward r and next state s'
        Store transition (s, a, r, s') in replay memory D
        Sample random mini-batch of transitions (s_j, a_j, r_j, s'_j) from D
        Compute target Q-values:
            if s'_j is the terminal state then target = r_j
            else target = r_j + γ * max_a' Q_target(s'_j, a'; θ_target)
        Update Q-network parameters θ by minimizing the loss:
            loss = 1/N * Σ_j (target − Q(s_j, a_j; θ))^2
            θ = θ − α * ∇_θ(loss)
        Every C steps, update target Q-network: θ_target = θ
        if s' is the terminal state then break
        else s = s'
    end for
    Every episode, evaluate performance and monitor convergence
end for
M denotes the total number of episodes. T_max represents the maximum number of steps per episode.…”
Section: Algorithm 1: DQN Algorithm (mentioning)
confidence: 99%
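For readers who want to run the quoted Algorithm 1, the following is a minimal PyTorch sketch of the same DQN loop. The offloading environment is abstracted behind an env object with reset() and step(action) returning (next_state, reward, done); that interface, the network sizes, and all hyperparameter defaults are illustrative assumptions, not values from the cited paper.

import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim


class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s):
        return self.net(s)


def train_dqn(env, state_dim, n_actions, M=500, T_max=200,
              N=10_000, batch_size=32, gamma=0.99, alpha=1e-3,
              eps=0.1, C=100):
    D = deque(maxlen=N)                        # replay memory D with capacity N
    q = QNetwork(state_dim, n_actions)         # Q-network with random weights θ
    q_target = QNetwork(state_dim, n_actions)  # target Q-network, θ_target = θ
    q_target.load_state_dict(q.state_dict())
    opt = optim.Adam(q.parameters(), lr=alpha)
    step = 0

    for episode in range(M):
        s = env.reset()
        for t in range(T_max):
            # ε-greedy action selection
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    a = int(q(torch.as_tensor(s, dtype=torch.float32)).argmax())
            s_next, r, done = env.step(a)      # execute action, observe r and s'
            D.append((s, a, r, s_next, done))  # store transition in D

            if len(D) >= batch_size:
                batch = random.sample(D, batch_size)
                ss, aa, rr, ss2, dd = zip(*batch)
                ss = torch.as_tensor(ss, dtype=torch.float32)
                aa = torch.as_tensor(aa, dtype=torch.int64)
                rr = torch.as_tensor(rr, dtype=torch.float32)
                ss2 = torch.as_tensor(ss2, dtype=torch.float32)
                dd = torch.as_tensor(dd, dtype=torch.float32)
                with torch.no_grad():
                    # target = r if s' is terminal, else r + γ * max_a' Q_target(s', a')
                    target = rr + gamma * (1 - dd) * q_target(ss2).max(dim=1).values
                pred = q(ss).gather(1, aa.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(pred, target)
                opt.zero_grad()
                loss.backward()
                opt.step()

            step += 1
            if step % C == 0:                  # every C steps, sync target network
                q_target.load_state_dict(q.state_dict())
            if done:
                break
            s = s_next
        # per episode: evaluate performance and monitor convergence here
    return q

The (1 - done) mask reproduces the pseudocode's branch between the terminal target r_j and the bootstrapped target r_j + γ * max_a' Q_target(s'_j, a').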
“…To address these obstacles, mobile edge computing (MEC) has emerged as a promising paradigm to enhance the capabilities of wireless networks by bringing computation closer to the network edge. By deploying computational resources, storage, and networking infrastructure at the edge of the network, MEC aims to alleviate the burden on centralized cloud servers and reduce latency for time-sensitive applications [2], [3].…”
Section: Introduction (mentioning)
confidence: 99%