2021
DOI: 10.1016/j.compeleceng.2021.107537

Hardware implementation of the upper confidence-bound algorithm for reinforcement learning

Cited by 4 publications (1 citation statement)
References 16 publications
“…As for S, it determines the next point to query by selecting the most promising candidate. Normally, three acquisition functions are widely used, which are the maximum probability of improvement (MPI) [35], expected improvement (EI) [36], and upper confidence bound (UCB) [37]. The disadvantage of MPI is that it only chooses the points with highly confident to query, hence there is little improvement of the model.…”
Section: Forward Layer
Citation type: mentioning
confidence: 99%
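
The three acquisition functions named in the citation statement above can be compared with a minimal sketch. It assumes a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate point, a maximization objective, and the common textbook forms of MPI, EI, and UCB. The function names, the `xi` and `kappa` parameters, and the two example candidates are illustrative assumptions; they are not taken from the citing paper or from the cited hardware implementation.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, f_best, xi=0.0):
    """MPI score: probability that the candidate beats f_best by at least xi."""
    gain = mu - f_best - xi
    if sigma <= 0.0:
        return 1.0 if gain > 0.0 else 0.0
    return norm_cdf(gain / sigma)


def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI score: expected amount by which the candidate beats f_best."""
    gain = mu - f_best - xi
    if sigma <= 0.0:
        return max(gain, 0.0)
    z = gain / sigma
    return gain * norm_cdf(z) + sigma * norm_pdf(z)


def upper_confidence_bound(mu, sigma, kappa=2.0):
    """UCB score: posterior mean plus kappa standard deviations."""
    return mu + kappa * sigma


# Hypothetical candidates as (posterior mean, posterior std) pairs.
candidates = {
    "near_best": (1.05, 0.01),  # slightly above the incumbent, almost certain
    "uncertain": (0.80, 0.50),  # lower mean, but large uncertainty
}
f_best = 1.0  # best objective value observed so far (assumed)


def pick(score):
    """Return the name of the candidate that maximizes score(mu, sigma)."""
    return max(candidates, key=lambda name: score(*candidates[name]))


pi_choice = pick(lambda m, s: probability_of_improvement(m, s, f_best))
ei_choice = pick(lambda m, s: expected_improvement(m, s, f_best))
ucb_choice = pick(upper_confidence_bound)
```

With these assumed numbers, MPI selects the low-variance candidate next to the incumbent, while EI and UCB both select the uncertain one. This is consistent with the quoted remark that MPI tends to query only points it is already confident about.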