Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation
DOI: 10.1145/2739480.2754783

High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning

Abstract: SZ-Tetris, a restricted version of Tetris, is a difficult reinforcement learning task. Previous research showed that, similarly to the original Tetris, value function-based methods such as temporal difference learning do not work well for SZ-Tetris. The best performance in this game was achieved by employing direct policy search techniques, in particular the cross-entropy method in combination with handcrafted features. Nonetheless, a simple hand-coded heuristic player scores even higher. Here we show that it…
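
As background for the abstract's terminology: the cross-entropy method is a direct policy search technique that keeps a Gaussian distribution over policy weights, samples candidate weight vectors, scores each by playing the game, and refits the distribution to the best-scoring samples. The sketch below is a generic, minimal version of that loop, not code from the paper; the evaluate_policy callable (e.g. the average SZ-Tetris score of a feature-weight vector over a few games) is an assumed placeholder.

```python
import numpy as np

def cross_entropy_method(evaluate_policy, dim, n_iters=100,
                         pop_size=100, elite_frac=0.1, init_std=1.0):
    """Generic cross-entropy method for direct policy search.

    evaluate_policy: callable mapping a weight vector to a scalar score
                     (e.g. average game score) -- an assumed placeholder,
                     not taken from the paper.
    """
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    n_elite = max(1, int(pop_size * elite_frac))

    for _ in range(n_iters):
        # Sample candidate weight vectors from the current Gaussian.
        samples = mean + std * np.random.randn(pop_size, dim)
        scores = np.array([evaluate_policy(w) for w in samples])
        # Keep the best-scoring fraction and refit mean/std to it.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean
```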

Cited by 13 publications (8 citation statements)
References 20 publications
“…In this context, it is worth emphasizing the role of the (systematic) n-tuple networks in the overall result. Although they do not scale up to state spaces with large dimensionality [13], they have undeniable advantages for state spaces of moderate size and dimensionality: they provide nonlinear transformations and computational efficiency.…”
Section: Discussion
confidence: 99%
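
The computational-efficiency point can be illustrated with a minimal n-tuple value function: each tuple reads a few fixed board cells, turns them into a lookup-table index, and the state value is a sum of table entries. Everything below (board encoding, binary cells, learning rate) is an illustrative assumption, not the systematic n-tuple configuration used in the cited work.

```python
import numpy as np

class NTupleNetwork:
    """Minimal n-tuple value function over a flat board of discrete cells."""

    def __init__(self, tuples, n_cell_values=2):
        self.tuples = tuples          # list of index sequences into the board
        self.n_values = n_cell_values
        # One lookup table (weight vector) per tuple.
        self.luts = [np.zeros(n_cell_values ** len(t)) for t in tuples]

    def _index(self, board, t):
        # Interpret the cells covered by tuple t as a base-n number.
        idx = 0
        for pos in t:
            idx = idx * self.n_values + board[pos]
        return idx

    def value(self, board):
        # Evaluation is a handful of table lookups -- no matrix products.
        return sum(lut[self._index(board, t)]
                   for lut, t in zip(self.luts, self.tuples))

    def update(self, board, delta, lr=0.01):
        # TD-style update: nudge every active weight toward the target.
        for lut, t in zip(self.luts, self.tuples):
            lut[self._index(board, t)] += lr * delta
```

The scaling limitation mentioned in the quote is also visible here: each lookup table has n_cell_values ** len(tuple) entries, and covering a high-dimensional state space requires many such tuples.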
“…The roots of reinforcement learning [14] in games can be traced back to Samuel's famous work on Checkers [25], but the approach was not popularized until Tesauro's TD-Gammon, a master-level program for Backgammon [34] obtained by temporal difference learning. Various reinforcement learning algorithms have been applied with success to other games such as Othello [6], [15], [32], Connect 4 [37], Tetris [27], [13], or Atari video games [4]. Recently, AlphaGo used reinforcement learning, among other techniques, to determine the weights of a deep artificial neural network that beat a professional Go player [28].…”
Section: Related Work
confidence: 99%
“…We conclude that the part propagating the other player's final reward back to that player's previous state is vitally important. If we analyze the no-FARL agent, we find that it has only 0.9% active weights, while the well-working TD-FARL agent has 8% active weights. This happens because the other player (the one who loses the game, since the current player created a winning state) never has the negative reward propagated back to its previous states.…”
Section: Connect Four
confidence: 98%
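
A heavily hedged sketch of what the quoted terminal update could look like in a zero-sum two-player self-play setting: when the current player wins, the losing player never gets to move again, so its last afterstate must be updated explicitly with the (negative) game outcome. The function name, reward convention, and value-function interface (matching the n-tuple sketch above) are assumptions for illustration, not the cited TD-FARL implementation.

```python
def terminal_update(V, other_last_state, winner_reward, lr=0.01):
    """Terminal step at the end of a two-player self-play game.

    V                : value function exposing V.value(s) and V.update(s, delta, lr)
    other_last_state : the losing player's most recent afterstate
    winner_reward    : reward received by the winning player (e.g. +1)
    """
    # Without this step the loser's last afterstate never sees the
    # outcome of the game, leaving most of its weights untouched.
    target = -winner_reward                     # zero-sum: loser gets -R
    delta = target - V.value(other_last_state)
    V.update(other_last_state, delta, lr)
```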
“…policies do not adapt while interacting with the application. Most evolutionary methods take this form, with neuroevolutionary algorithms such as CoSyNE [8], NEAT [24], or CMA-ES (for weight optimization) [10] being specific examples.…”
Section: Reinforcement Learning
confidence: 99%
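
To make "policies do not adapt while interacting with the application" concrete: in this family of methods a fixed network architecture is parameterized by a flat weight vector, the weights stay frozen for the whole rollout, and only a black-box optimizer (CMA-ES, or the cross-entropy loop sketched earlier) changes them between evaluations. The network shape and the minimal environment interface below are assumptions for illustration, not any of the cited systems.

```python
import numpy as np

def make_policy(weights, n_inputs, n_hidden, n_actions):
    """Build a fixed feed-forward policy from a flat weight vector.

    The weights are set once per candidate; nothing is learned during
    the rollout, which is what "non-adaptive" means above.
    """
    split = n_inputs * n_hidden
    w1 = weights[:split].reshape(n_inputs, n_hidden)
    w2 = weights[split:].reshape(n_hidden, n_actions)

    def policy(observation):
        hidden = np.tanh(observation @ w1)
        return int(np.argmax(hidden @ w2))   # greedy action choice
    return policy

def episode_return(weights, env, n_inputs, n_hidden, n_actions):
    """Black-box objective: run one episode with frozen weights.

    Assumes a minimal environment with env.reset() -> obs and
    env.step(action) -> (obs, reward, done); adapt to the real API.
    """
    policy = make_policy(weights, n_inputs, n_hidden, n_actions)
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```

A weight vector scored this way can be handed directly to an evolutionary optimizer, e.g. the cross_entropy_method sketch above with evaluate_policy=lambda w: episode_return(w, env, n_inputs, n_hidden, n_actions).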