2012
DOI: 10.1016/j.procs.2012.09.110

Self-Regulating Action Exploration in Reinforcement Learning

Cited by 10 publications (3 citation statements)
References 15 publications
“…In the future, we could also try out different strategies to implement our agents, such as applying the UCB1 [43] or self-regulated action exploration [44] strategies as the new action selection policy.…”
Section: Discussion
confidence: 99%
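The statement above mentions UCB1 as a candidate action-selection policy. A minimal sketch of UCB1 selection, framed as a multi-armed bandit (the function name and bandit framing are mine, not from the cited works; the bonus term is the standard sqrt(2 ln t / n) form):

```python
import math

def ucb1_select(counts, values, total):
    """Pick an action by the UCB1 rule.

    counts[a]: times action a was tried, values[a]: its mean reward,
    total: total number of pulls so far.
    """
    # Try every untried action once before applying the UCB rule.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    # Choose the action maximizing mean reward + exploration bonus.
    return max(
        range(len(counts)),
        key=lambda a: values[a] + math.sqrt(2 * math.log(total) / counts[a]),
    )
```

The bonus shrinks as an action is tried more often, so exploration regulates itself without a hand-tuned ε.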
“…However, unlike the related works [3]-[6], [9], that use a constant value of ε, we decrease it gradually over time from ε_start to ε_stop. This is the so-called decayed ε-greedy algorithm [15], which avoids performance losses due to random actions once the environment is explored "enough". This leads to higher throughput when the environment is stationary, but if the channel occupancy pattern changes after some time, the algorithm takes a lot of time to discover a new optimal channel allocation scheme.…”
Section: B. Exploration Strategy
confidence: 99%
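The decayed ε-greedy scheme quoted above can be sketched as follows. The linear schedule and the default values are assumptions: the excerpt only says ε decreases gradually from ε_start to ε_stop, not how.

```python
import random

def decayed_epsilon(step, eps_start=1.0, eps_stop=0.05, decay_steps=1000):
    """Anneal epsilon linearly from eps_start to eps_stop over decay_steps,
    then hold it at eps_stop."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_stop - eps_start)

def select_action(q_values, step, rng=random):
    """Decayed epsilon-greedy: explore with probability epsilon(step),
    otherwise exploit the current greedy action."""
    if rng.random() < decayed_epsilon(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Holding ε at ε_stop > 0 keeps a little residual exploration, which is exactly the mitigation the excerpt hints at for non-stationary channel occupancy.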
“…Hence, a tradeoff between exploitation (doing optimum action) and exploration (doing other actions to find better policy) is encountered [21]. In many references such as [21]-[24], the issue of exploration is solved by various ways, which is aimed to improve the performance and convergence. In this paper, the simplest way, ε-soft on-policy method is used which updates action on the basis of the experience gained from executing policy [19].…”
Section: Reinforcement Learning
confidence: 99%
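The ε-soft policy mentioned in this last statement gives every action at least probability ε/|A|, with the remaining 1−ε placed on the greedy action. A hypothetical sketch (function name mine):

```python
def epsilon_soft_probs(q_values, epsilon=0.1):
    """Action probabilities under an epsilon-soft policy:
    each of the |A| actions gets epsilon/|A|, and the greedy
    action additionally receives the remaining 1 - epsilon."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs
```

Because every action keeps nonzero probability, on-policy methods built on ε-soft policies continue to sample the whole action set while still favoring the current best estimate.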