2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology
DOI: 10.1109/wi-iat.2012.154
Knowledge-Based Exploration for Reinforcement Learning in Self-Organizing Neural Networks

Cited by 10 publications (8 citation statements) | References 18 publications
“…At state s, the estimated Q-value is used as the teaching signal to learn the association of state s and action choice a. Use Exploration Strategy [18] to select an action choice a 6: else if Exploitation then 7:…”
Section: B. Incorporating Temporal Difference Methods
confidence: 99%
“…Moving constantly, the Blue agent can move into the safe areas to evade the Red agent but does not remain in them. As in [18], the pursuit strategy of the Red agent is deterministic while the Blue agent learns the evasive strategies. The feedback signal to the Blue agent indicates the effectiveness of the evasive maneuvers.…”
Section: B. Pursuit-Evasion Problem Domain
confidence: 99%
“…In [2], the knowledge-based exploration strategy is incorporated in TD-FALCON to make better use of the previously learned knowledge. Specifically, after finding the node most similar to the current state-action pair, the reward can be represented as R = {L_J, U_J}, where J denotes the most similar node and L_J = w^{c3}_J, U_J = 1 − w^{c3}_J denote the lower and upper bounds of the Q-values associated with node J, respectively.…”
Section: Reinforcement Learning Model
confidence: 99%
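The bound computation quoted above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes (as in FALCON-style complement coding) that the reward-field weight of node J is stored as a pair (w, w̄), with the lower bound taken from the direct component and the upper bound from one minus the complement component. All names are illustrative.

```python
def q_value_bounds(w: float, w_bar: float) -> tuple:
    """Lower/upper Q-value bounds of a node J from its complement-coded
    reward weights: L_J = w, U_J = 1 - w_bar (assumed encoding)."""
    lower = w
    upper = 1.0 - w_bar
    return lower, upper

# An unexplored (uncommitted) node with w = 0, w_bar = 0 yields the
# maximally uncertain interval (0.0, 1.0); learning tightens the bounds.
```

Under this encoding, a tight interval (lower ≈ upper) marks a well-explored state-action pair, while a wide one signals that the estimate is still uncertain.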
“…In [2], the knowledge-based exploration categorizes all the feasible actions into three non-overlapping groups: positive, negative, and unexplored. Subsequently, the agent randomly selects an action from the reduced action space that comprises the positive and unexplored actions.…”
Section: Introduction
confidence: 99%
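The selection rule described above can be sketched as a reduced-action-space sampler. This is an illustrative reconstruction under assumed names and a hypothetical threshold, not the paper's code: actions with a learned Q-value at or above the threshold are treated as positive, those below it as negative, and actions with no learned Q-value as unexplored.

```python
import random

def select_action(feasible, q_values, threshold=0.5):
    """Knowledge-based exploration sketch: partition feasible actions into
    positive (Q >= threshold), negative (Q < threshold), and unexplored
    (no learned Q-value), then sample uniformly from positive + unexplored."""
    positive = [a for a in feasible if a in q_values and q_values[a] >= threshold]
    unexplored = [a for a in feasible if a not in q_values]
    candidates = positive + unexplored  # negative actions are pruned
    # Fall back to the full action space if everything is negative.
    return random.choice(candidates) if candidates else random.choice(feasible)

# Example: 'down' has a low (negative) Q-value, so it is never sampled
# while positive or unexplored alternatives exist.
actions = ["up", "down", "left", "right"]
learned = {"up": 0.9, "down": 0.1}
picks = {select_action(actions, learned) for _ in range(200)}
```

Pruning the negative group shrinks the exploration space, which is the point of the strategy: random exploration is spent only on actions that are either known-good or not yet tried.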