2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2017.7952670
Balancing exploration and exploitation in reinforcement learning using a value of information criterion

Cited by 27 publications (16 citation statements). References 15 publications.
“…In our previous work, we showed that these desires can be realized by leveraging information that the states carry about the actions [35]. This idea of utilizing information in decision-making was developed into a rigorous theory by Stratonovich [4,5].…”
Section: Methods | Citation type: mentioning | Confidence: 99%
“…The optimization of (3.1), under the constraint (3.2), can be performed by first converting the criterion into an unconstrained problem using the theory of Lagrange multipliers. Differentiating the unconstrained criterion and solving for the policy probabilities leads to an alternating, expectation-maximization-type update [35]. These alternating updates yield a soft-max-based action-selection process [1] for exploring the policy search space.…”
Section: Value of Information: Random Exploration Case | Citation type: mentioning | Confidence: 99%
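The alternating update this excerpt describes can be sketched concretely. Below is a minimal Python illustration, assuming a table of state-action values Q, a state distribution p(s), and an inverse-temperature Lagrange multiplier beta; the function name, initialization, and stopping rule are assumptions for illustration, not the authors' implementation from [35].

import numpy as np

def voi_policy(Q, p_s, beta, n_iters=100, tol=1e-8):
    """Alternating, EM-style update for a soft-max policy under an
    information constraint. Q is an (n_states, n_actions) array of
    state-action values, p_s the state distribution, and beta the
    Lagrange multiplier trading expected return against I(S; A)."""
    n_states, n_actions = Q.shape
    p_a = np.full(n_actions, 1.0 / n_actions)  # initial action marginal p(a)
    for _ in range(n_iters):
        # Expectation-like step: p(a|s) proportional to p(a) * exp(beta * Q(s, a)).
        logits = np.log(p_a)[None, :] + beta * Q
        logits -= logits.max(axis=1, keepdims=True)  # subtract max for stability
        p_a_given_s = np.exp(logits)
        p_a_given_s /= p_a_given_s.sum(axis=1, keepdims=True)
        # Maximization-like step: refit the marginal p(a) = sum_s p(s) p(a|s).
        p_a_new = p_s @ p_a_given_s
        if np.max(np.abs(p_a_new - p_a)) < tol:
            return p_a_given_s, p_a_new
        p_a = p_a_new
    return p_a_given_s, p_a

Each pass sharpens the soft-max policy around high-value actions while the refitted marginal keeps the state-action mutual information in check; beta plays the role of the exploration temperature mentioned in the excerpt.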
“…We have previously shown that the value of information can be applied to multi-state, multi-action decision-making problems that can be solved using reinforcement learning [17, 19, 37]. Here, we simplify this criterion from the multi-state case to that of the single-state so that it is suitable for addressing the multi-armed bandit problem.…”
Section: Methods | Citation type: mentioning | Confidence: 99%
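In the single-state case this excerpt describes, the mutual information between states and actions vanishes, so the policy reduces to an action distribution alone and the soft-max acts directly on the empirical mean rewards. A minimal sketch follows; the two-armed Bernoulli setup, the beta value, and the incremental-mean update are illustrative assumptions, not the exact criterion from [17, 19, 37].

import numpy as np

def boltzmann_pull(q_est, beta, rng):
    """Soft-max (Boltzmann) action selection over empirical mean rewards."""
    prefs = beta * q_est
    prefs -= prefs.max()  # subtract the max for numerical stability
    p = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(q_est.size, p=p), p

# Illustrative two-armed Bernoulli bandit (all values assumed).
rng = np.random.default_rng(0)
true_means = np.array([0.4, 0.6])
q_est = np.zeros(2)
counts = np.zeros(2)
for _ in range(1000):
    a, _ = boltzmann_pull(q_est, beta=5.0, rng=rng)
    reward = float(rng.random() < true_means[a])
    counts[a] += 1
    q_est[a] += (reward - q_est[a]) / counts[a]  # incremental mean update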
“…The balance between exploration and exploitation is addressed in [7], which relies on Stratonovich's value of information and consists of two steps. The first generates a baseline of agent performance by measuring the achievable return of a policy when there is no information regarding the states; the second offsets these costs with a term that evaluates the average penalties when the state-action information is bounded above by a prescribed amount.…”
Section: Related Work | Citation type: mentioning | Confidence: 99%
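The two steps described in this excerpt can be written compactly. The following is a sketch under assumed notation (Q for state-action values, I(S;A) for the state-action mutual information, R for the prescribed information bound), not the exact formulation from [7]:

\[
V(R) \;=\; \max_{p(a \mid s) \,:\, I(S;A) \le R} \mathbb{E}_{p(s)\,p(a \mid s)}\!\left[ Q(s,a) \right] \;-\; \max_{p(a)} \mathbb{E}_{p(s)\,p(a)}\!\left[ Q(s,a) \right].
\]

The second term is the state-agnostic baseline: the best average return achievable by any fixed action distribution. The first term is the best return attainable when at most R units of information about the state may inform the action, so V(R) quantifies the value of that information.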