2012
DOI: 10.1016/j.procs.2012.09.110

Self-Regulating Action Exploration in Reinforcement Learning

Cited by 10 publications (3 citation statements)
References 15 publications
“…In the future, we could also try out different strategies to implement our agents, such as applying the UCB1 [43] or self-regulated action exploration [44] strategies as the new action selection policy.…”
Section: Discussion
confidence: 99%
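The statement above mentions UCB1 as a candidate action-selection policy. A minimal sketch of UCB1 selection, framed as a multi-armed bandit (the function name and bandit framing are mine, not from the cited works; the bonus term is the standard sqrt(2 ln t / n) form):

```python
import math

def ucb1_select(counts, values, total):
    """Pick an action by the UCB1 rule.

    counts[a]: times action a was tried, values[a]: its mean reward,
    total: total number of pulls so far.
    """
    # Try every untried action once before applying the UCB rule.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    # Choose the action maximizing mean reward + exploration bonus.
    return max(
        range(len(counts)),
        key=lambda a: values[a] + math.sqrt(2 * math.log(total) / counts[a]),
    )
```

The bonus shrinks as an action is tried more often, so exploration regulates itself without a hand-tuned ε.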
“…However, unlike the related works [3]-[6], [9], that use a constant value of ε, we decrease it gradually over time from ε_start to ε_stop. This is the so-called decayed ε-greedy algorithm [15], which avoids performance losses due to random actions once the environment is explored "enough". This leads to higher throughput when the environment is stationary, but if the channel occupancy pattern changes after some time, the algorithm takes a lot of time to discover a new optimal channel allocation scheme.…”
Section: B. Exploration Strategy
confidence: 99%
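The decayed ε-greedy scheme quoted above can be sketched as follows. The linear schedule and the default values are assumptions: the excerpt only says ε decreases gradually from ε_start to ε_stop, not how.

```python
import random

def decayed_epsilon(step, eps_start=1.0, eps_stop=0.05, decay_steps=1000):
    """Anneal epsilon linearly from eps_start to eps_stop over decay_steps,
    then hold it at eps_stop."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_stop - eps_start)

def select_action(q_values, step, rng=random):
    """Decayed epsilon-greedy: explore with probability epsilon(step),
    otherwise exploit the current greedy action."""
    if rng.random() < decayed_epsilon(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Holding ε at ε_stop > 0 keeps a little residual exploration, which is exactly the mitigation the excerpt hints at for non-stationary channel occupancy.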
“…Hence, a tradeoff between exploitation (doing optimum action) and exploration (doing other actions to find better policy) is encountered [21]. In many references such as [21]-[24], the issue of exploration is solved by various ways, which is aimed to improve the performance and convergence. In this paper, the simplest way, ε-soft on-policy method is used which updates action on the basis of the experience gained from executing policy [19].…”
Section: Reinforcement Learning
confidence: 99%
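The ε-soft policy mentioned in this last statement gives every action at least probability ε/|A|, with the remaining 1−ε placed on the greedy action. A hypothetical sketch (function name mine):

```python
def epsilon_soft_probs(q_values, epsilon=0.1):
    """Action probabilities under an epsilon-soft policy:
    each of the |A| actions gets epsilon/|A|, and the greedy
    action additionally receives the remaining 1 - epsilon."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs
```

Because every action keeps nonzero probability, on-policy methods built on ε-soft policies continue to sample the whole action set while still favoring the current best estimate.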