2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology
DOI: 10.1109/wi-iat.2012.154
Knowledge-Based Exploration for Reinforcement Learning in Self-Organizing Neural Networks

Cited by 10 publications (8 citation statements) | References 18 publications
“…At state s, the estimated Q-value is used as the teaching signal to learn the association of state s and action choice a. Use Exploration Strategy [18] to select an action choice a 6: else if Exploitation then 7:…”
Section: B. Incorporating Temporal Difference Methods
confidence: 99%
“…Moving constantly, the Blue agent can move into the safe areas to evade the Red agent but does not remain in them. As in [18], the pursuit strategy of the Red agent is deterministic while the Blue agent learns the evasive strategies. The feedback signal to the Blue agent indicates the effectiveness of the evasive maneuvers.…”
Section: B. Pursuit-Evasion Problem Domain
confidence: 99%
“…In [2], the knowledge-based exploration strategy is incorporated in TD-FALCON to make better use of the previously learned knowledge. Specifically, after finding the node most similar to the current state-action pair, the reward can be represented as R = {L_J, U_J}, where J denotes the most similar node and L_J = w^{c3}_J, U_J = 1 − w^{c3}_J denote the lower and upper bounds of the Q-values associated with node J, respectively.…”
Section: Reinforcement Learning Model
confidence: 99%
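The bound computation quoted above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes (as in FALCON-style complement coding) that the reward-field weight of node J is stored as a pair (w, w̄), with the lower bound taken from the direct component and the upper bound from one minus the complement component. All names are illustrative.

```python
def q_value_bounds(w: float, w_bar: float) -> tuple:
    """Lower/upper Q-value bounds of a node J from its complement-coded
    reward weights: L_J = w, U_J = 1 - w_bar (assumed encoding)."""
    lower = w
    upper = 1.0 - w_bar
    return lower, upper

# An unexplored (uncommitted) node with w = 0, w_bar = 0 yields the
# maximally uncertain interval (0.0, 1.0); learning tightens the bounds.
```

Under this encoding, a tight interval (lower ≈ upper) marks a well-explored state-action pair, while a wide one signals that the estimate is still uncertain.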
“…In [2], the knowledge-based exploration categorizes all the feasible actions into three non-overlapping groups: positive, negative, and unexplored. Subsequently, the agent randomly selects an action from the reduced action space that comprises the positive and unexplored actions.…”
Section: Introduction
confidence: 99%
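The selection rule described above can be sketched as a reduced-action-space sampler. This is an illustrative reconstruction under assumed names and a hypothetical threshold, not the paper's code: actions with a learned Q-value at or above the threshold are treated as positive, those below it as negative, and actions with no learned Q-value as unexplored.

```python
import random

def select_action(feasible, q_values, threshold=0.5):
    """Knowledge-based exploration sketch: partition feasible actions into
    positive (Q >= threshold), negative (Q < threshold), and unexplored
    (no learned Q-value), then sample uniformly from positive + unexplored."""
    positive = [a for a in feasible if a in q_values and q_values[a] >= threshold]
    unexplored = [a for a in feasible if a not in q_values]
    candidates = positive + unexplored  # negative actions are pruned
    # Fall back to the full action space if everything is negative.
    return random.choice(candidates) if candidates else random.choice(feasible)

# Example: 'down' has a low (negative) Q-value, so it is never sampled
# while positive or unexplored alternatives exist.
actions = ["up", "down", "left", "right"]
learned = {"up": 0.9, "down": 0.1}
picks = {select_action(actions, learned) for _ in range(200)}
```

Pruning the negative group shrinks the exploration space, which is the point of the strategy: random exploration is spent only on actions that are either known-good or not yet tried.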