Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation
DOI: 10.1145/2739480.2754783

High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning

Abstract: SZ-Tetris, a restricted version of Tetris, is a difficult reinforcement learning task. Previous research showed that, similarly to the original Tetris, value function-based methods such as temporal difference learning do not work well for SZ-Tetris. The best performance in this game was achieved by employing direct policy search techniques, in particular the cross-entropy method in combination with handcrafted features. Nonetheless, a simple hand-coded heuristic player scores even higher. Here we show that it…
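
As background for the abstract's terminology: the cross-entropy method is a direct policy search technique that keeps a Gaussian distribution over policy weights, samples candidate weight vectors, scores each by playing the game, and refits the distribution to the best-scoring samples. The sketch below is a generic, minimal version of that loop, not code from the paper; the evaluate_policy callable (e.g. the average SZ-Tetris score of a feature-weight vector over a few games) is an assumed placeholder.

```python
import numpy as np

def cross_entropy_method(evaluate_policy, dim, n_iters=100,
                         pop_size=100, elite_frac=0.1, init_std=1.0):
    """Generic cross-entropy method for direct policy search.

    evaluate_policy: callable mapping a weight vector to a scalar score
                     (e.g. average game score) -- an assumed placeholder,
                     not taken from the paper.
    """
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    n_elite = max(1, int(pop_size * elite_frac))

    for _ in range(n_iters):
        # Sample candidate weight vectors from the current Gaussian.
        samples = mean + std * np.random.randn(pop_size, dim)
        scores = np.array([evaluate_policy(w) for w in samples])
        # Keep the best-scoring fraction and refit mean/std to it.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean
```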

Cited by 13 publications (8 citation statements)
References 20 publications
“…In this context, it is worth emphasizing the role of the (systematic) n-tuple networks in the overall result. Although they do not scale up to state spaces with large dimensionality [13], they have undeniable advantages for state spaces of moderate size and dimensionality: they provide nonlinear transformations and computational efficiency.…”
Section: Discussion
confidence: 99%
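
The computational-efficiency point can be illustrated with a minimal n-tuple value function: each tuple reads a few fixed board cells, turns them into a lookup-table index, and the state value is a sum of table entries. Everything below (board encoding, binary cells, learning rate) is an illustrative assumption, not the systematic n-tuple configuration used in the cited work.

```python
import numpy as np

class NTupleNetwork:
    """Minimal n-tuple value function over a flat board of discrete cells."""

    def __init__(self, tuples, n_cell_values=2):
        self.tuples = tuples          # list of index sequences into the board
        self.n_values = n_cell_values
        # One lookup table (weight vector) per tuple.
        self.luts = [np.zeros(n_cell_values ** len(t)) for t in tuples]

    def _index(self, board, t):
        # Interpret the cells covered by tuple t as a base-n number.
        idx = 0
        for pos in t:
            idx = idx * self.n_values + board[pos]
        return idx

    def value(self, board):
        # Evaluation is a handful of table lookups -- no matrix products.
        return sum(lut[self._index(board, t)]
                   for lut, t in zip(self.luts, self.tuples))

    def update(self, board, delta, lr=0.01):
        # TD-style update: nudge every active weight toward the target.
        for lut, t in zip(self.luts, self.tuples):
            lut[self._index(board, t)] += lr * delta
```

The scaling limitation mentioned in the quote is also visible here: each lookup table has n_cell_values ** len(tuple) entries, and covering a high-dimensional state space requires many such tuples.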
“…The roots of reinforcement learning [14] in games can be traced back to Samuel's famous work on Checkers [25], but the approach was not popularized until Tesauro's TD-Gammon, a master-level program for Backgammon [34] obtained by temporal difference learning. Various reinforcement learning algorithms have been applied with success to other games such as Othello [6], [15], [32], Connect 4 [37], Tetris [27], [13], or Atari video games [4]. Recently, AlphaGo used reinforcement learning, among other techniques, to determine the weights of a deep artificial neural network that beat a professional Go player [28].…”
Section: Related Work
confidence: 99%
“…We conclude that the part propagating the other player's final reward back to that player's previous state is vitally important. If we analyze the no-FARL agent, we find that it has only 0.9% active weights, while the well-working TD-FARL agent has 8% active weights. This happens because the other player (the one who loses the game, since the current player created a winning state) never has the negative reward propagated back to its previous states.…”
Section: Connect Four
confidence: 98%
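
A heavily hedged sketch of what the quoted terminal update could look like in a zero-sum two-player self-play setting: when the current player wins, the losing player never gets to move again, so its last afterstate must be updated explicitly with the (negative) game outcome. The function name, reward convention, and value-function interface (matching the n-tuple sketch above) are assumptions for illustration, not the cited TD-FARL implementation.

```python
def terminal_update(V, other_last_state, winner_reward, lr=0.01):
    """Terminal step at the end of a two-player self-play game.

    V                : value function exposing V.value(s) and V.update(s, delta, lr)
    other_last_state : the losing player's most recent afterstate
    winner_reward    : reward received by the winning player (e.g. +1)
    """
    # Without this step the loser's last afterstate never sees the
    # outcome of the game, leaving most of its weights untouched.
    target = -winner_reward                     # zero-sum: loser gets -R
    delta = target - V.value(other_last_state)
    V.update(other_last_state, delta, lr)
```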
“…policies do not adapt while interacting with the application. Most evolutionary methods take this form, with neuroevolutionary algorithms such as CoSyNE [8], NEAT [24], or CMA-ES (for weight optimization) [10] being specific examples.…”
Section: Reinforcement Learning
confidence: 99%
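
To make "policies do not adapt while interacting with the application" concrete: in this family of methods a fixed network architecture is parameterized by a flat weight vector, the weights stay frozen for the whole rollout, and only a black-box optimizer (CMA-ES, or the cross-entropy loop sketched earlier) changes them between evaluations. The network shape and the minimal environment interface below are assumptions for illustration, not any of the cited systems.

```python
import numpy as np

def make_policy(weights, n_inputs, n_hidden, n_actions):
    """Build a fixed feed-forward policy from a flat weight vector.

    The weights are set once per candidate; nothing is learned during
    the rollout, which is what "non-adaptive" means above.
    """
    split = n_inputs * n_hidden
    w1 = weights[:split].reshape(n_inputs, n_hidden)
    w2 = weights[split:].reshape(n_hidden, n_actions)

    def policy(observation):
        hidden = np.tanh(observation @ w1)
        return int(np.argmax(hidden @ w2))   # greedy action choice
    return policy

def episode_return(weights, env, n_inputs, n_hidden, n_actions):
    """Black-box objective: run one episode with frozen weights.

    Assumes a minimal environment with env.reset() -> obs and
    env.step(action) -> (obs, reward, done); adapt to the real API.
    """
    policy = make_policy(weights, n_inputs, n_hidden, n_actions)
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```

A weight vector scored this way can be handed directly to an evolutionary optimizer, e.g. the cross_entropy_method sketch above with evaluate_policy=lambda w: episode_return(w, env, n_inputs, n_hidden, n_actions).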