2009
DOI: 10.1016/j.fss.2008.11.026
Reinforcement distribution in fuzzy Q-learning

Cited by 46 publications (25 citation statements)
References 15 publications
“…1. Q-learning has been used for learning fuzzy systems [21]. Since the state and actions of Q-learning algorithm can be set by fuzzy variables, Q-learning can take advantage of fuzziness.…”
Section: Proposed Methods (mentioning)
confidence: 99%
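The statement above describes letting both states and actions be fuzzy. A common way to realize this (a minimal sketch, not the cited paper's exact algorithm) is the rule-based fuzzy Q-learning scheme: each fuzzy rule keeps q-values for a few candidate actions, and the rule firing strengths weight both the blended action and the temporal-difference update. The class name, triangular membership functions, and scalar one-dimensional state below are illustrative assumptions.

```python
# Illustrative fuzzy Q-learning sketch: each rule (fuzzy region of the state
# space) keeps q-values for discrete candidate actions; firing strengths
# weight both the composed action and the TD update.
import random

def tri(x, a, b, c):
    """Triangular membership with peak at b and support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

class FuzzyQLearner:
    def __init__(self, rule_centers, actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.centers = rule_centers          # peaks of overlapping triangles
        self.actions = actions               # discrete candidate actions (numeric)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.q = [[0.0] * len(actions) for _ in rule_centers]

    def strengths(self, x):
        """Normalized firing strength of every rule for a scalar state x
        (assumed to lie roughly within [centers[0], centers[-1]])."""
        w = []
        for i, c in enumerate(self.centers):
            a = self.centers[i - 1] if i > 0 else c - 1.0
            d = self.centers[i + 1] if i < len(self.centers) - 1 else c + 1.0
            w.append(tri(x, a, c, d))
        s = sum(w) or 1.0
        return [v / s for v in w]

    def act(self, x):
        """Pick one action per rule (epsilon-greedy), blend by firing strength."""
        w = self.strengths(x)
        chosen = []
        for qi in self.q:
            if random.random() < self.eps:
                chosen.append(random.randrange(len(self.actions)))
            else:
                chosen.append(max(range(len(self.actions)), key=qi.__getitem__))
        a = sum(wi * self.actions[ci] for wi, ci in zip(w, chosen))
        return a, w, chosen

    def update(self, w, chosen, reward, x_next):
        """TD update distributed over rules in proportion to firing strength."""
        w_next = self.strengths(x_next)
        v_next = sum(wn * max(qi) for wn, qi in zip(w_next, self.q))
        q_now = sum(wi * self.q[i][ci] for i, (wi, ci) in enumerate(zip(w, chosen)))
        delta = reward + self.gamma * v_next - q_now
        for i, (wi, ci) in enumerate(zip(w, chosen)):
            self.q[i][ci] += self.alpha * wi * delta

# Hypothetical usage with three rules over a unit-interval state
agent = FuzzyQLearner(rule_centers=[0.0, 0.5, 1.0], actions=[-1.0, 0.0, 1.0])
a, w, chosen = agent.act(0.3)
agent.update(w, chosen, reward=1.0, x_next=0.4)
```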
“…The partial overlapping of nearby linguistic variables of a fuzzified input gives fuzzy Q-learning an improved flexibility in addition to robustness and smoothness [19]. In the context of this paper, small cells (RL agents) tend to learn optimal CREO values (actions) for each of the fuzzy rules through iterative interaction with their environment, which includes dynamic radio conditions, temporal and spatial fluctuations in users' traffic, and backhaul capacity variations.…”
Section: Proposed Fuzzy Q-learning-based User-centric (mentioning)
confidence: 99%
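The flexibility and smoothness attributed above to overlapping linguistic terms comes from the fact that, in a strict triangular fuzzy partition, adjacent memberships always sum to one, so the rule-blended output interpolates smoothly as the input drifts between terms. The term breakpoints and the "load" input below are illustrative assumptions, not taken from the cited work.

```python
# Overlapping linguistic terms: with a strict triangular partition the two
# active memberships sum to 1, giving smooth interpolation between rules.
def triangular_partition(x, breakpoints):
    """Membership of x in each term of a strict triangular fuzzy partition."""
    mu = [0.0] * len(breakpoints)
    if x <= breakpoints[0]:
        mu[0] = 1.0
        return mu
    if x >= breakpoints[-1]:
        mu[-1] = 1.0
        return mu
    for i in range(len(breakpoints) - 1):
        lo, hi = breakpoints[i], breakpoints[i + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            mu[i], mu[i + 1] = 1.0 - t, t   # only two adjacent terms overlap
            return mu
    return mu

# e.g. a hypothetical "load" input with terms low / medium / high
for load in (0.1, 0.35, 0.5, 0.8):
    print(load, triangular_partition(load, [0.0, 0.5, 1.0]))
```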
“…By using the standard Q-learning method for the decision making of an agent, a table must be set up to store each pair of state and action. The Q-value can be calculated by [20], [29] (1)…”
Section: A. The Maximum Mapping Value Function of the Proposed Approach (mentioning)
confidence: 99%
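The equation "(1)" referenced in the statement above is not reproduced in the excerpt. The tabular update it most likely refers to is the standard Q-learning rule, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)], sketched here with a dictionary standing in for the state-action table; the function and variable names are illustrative.

```python
# Standard tabular Q-learning update over a dictionary "table" of (state, action) pairs.
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions) if actions else 0.0
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)                  # unseen pairs default to 0.0
q_update(Q, s=0, a=1, r=1.0, s_next=2, actions=[0, 1])
```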
“…For example, the reward function based on the weighted sum is complicated, which need to calculate the weights of the factors [9]. Some reward values obtained by the methods based on fuzzy rules are discrete and the number of them is limited, which are not suitable to the complex system, because the reward values are continuous and unlimited in realtime applications [20]. To deal with the problems above, an adaptive reward value function is proposed, which is introduced as follows: (7) where is the reward value for the th agent; is the number of influence factors (the influence factors in this paper are the indexes for water resource optimal allocation, such as water quality and economic benefit index); is the th influence factor; is an action (decision), and is a benefit function used to calculate the benefit of the influence factor under the certain action , which is determined by the actual application.…”
Section: The Adaptive Multifactor Reward Value Function (mentioning)
confidence: 99%
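The statement above describes a per-agent reward assembled from benefit functions of several influence factors under the chosen action (its equation (7) is not reproduced in the excerpt). A minimal sketch of that idea follows, assuming the per-factor benefits are simply aggregated by summation; the factor names and benefit functions are hypothetical placeholders, not the paper's formulation.

```python
# Hedged sketch of a multifactor reward: the agent's reward aggregates the
# benefit of each influence factor evaluated under the chosen action.
def multifactor_reward(action, benefit_fns):
    """Continuous reward: aggregate per-factor benefits under `action` (sum assumed)."""
    return sum(fn(action) for fn in benefit_fns)

# e.g. two illustrative influence factors for a water-allocation decision `action`
benefit_fns = [
    lambda a: 1.0 - abs(a - 0.6),      # water-quality benefit (illustrative)
    lambda a: 0.5 * a,                 # economic-benefit index (illustrative)
]
print(multifactor_reward(0.4, benefit_fns))
```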