2015
DOI: 10.1007/s11633-015-0893-y

Feature selection and feature learning for high-dimensional batch reinforcement learning: A survey

Abstract: Tremendous amounts of data are being generated and saved in many complex engineering and social systems every day. It is both important and feasible to use this big data to make better decisions through machine learning techniques. In this paper, we focus on batch reinforcement learning (RL) algorithms for discounted Markov decision processes (MDPs) with large discrete or continuous state spaces, aiming to learn the best possible policy given a fixed amount of training data. The batch RL algorithms with handcrafted …

Cited by 40 publications (22 citation statements)
References: 64 publications
“…Regarding the embedded methods, the most prominent instance is the regularized least-squares policy [45,46,47]. This and similar methods are tied to a specific learning algorithm and, therefore, breach the restriction of an applicability independent from the control algorithm.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In [22,23] feature selection is used to obtain lower-dimensional BS representation. However, most of the algorithms are tied to specific Reinforcement Learning algorithms such as LSPI, and cannot be generalized to other policy models.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Forming informative features can increase the speed and quality of learning [11]. Using a feature vector instead of a set of pixels allows the intelligent agent to focus only on the significant characteristics of the input data and, consequently, to learn faster.…”
Section: Task reduction by means of pattern recognition
Citation type: unclassified