In this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players. ALE presents significant research challenges for reinforcement learning, model learning, model-based planning, imitation learning, transfer learning, and intrinsic motivation. Most importantly, it provides a rigorous testbed for evaluating and comparing approaches to these problems. We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning. In doing so, we also propose an evaluation methodology made possible by ALE, reporting empirical results on over 55 different games. All of the software, including the benchmark agents, is publicly available.
Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker, the quintessential game of imperfect information, is a long-standing challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect-information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated, with statistical significance, professional poker players in heads-up no-limit Texas hold'em. The approach is theoretically sound and is shown to produce strategies that are more difficult to exploit than prior approaches.
Poker is a family of games that exhibit imperfect information, where players do not have full knowledge of past events. Whereas many perfect information games have been solved (e.g., Connect Four and checkers), no nontrivial imperfect information game played competitively by humans has previously been solved. Here, we announce that heads-up limit Texas hold'em is now essentially weakly solved. Furthermore, this computation formally proves the common wisdom that the dealer in the game holds a substantial advantage. This result was enabled by a new algorithm, CFR+, which is capable of solving extensive-form games orders of magnitude larger than previously possible.
Games have been intertwined with the earliest developments in computation, game theory, and artificial intelligence (AI). At the very conception of computing, Babbage had detailed plans for an "automaton" capable of playing tic-tac-toe and dreamt of his Analytical Engine playing chess (1). Both Turing (2) and Shannon (3), on paper and in hardware respectively, developed programs to play chess as a means of validating early ideas in computation and AI. For more than a half century, games have continued to act as testbeds for new ideas and the resulting successes have marked important milestones in the progress of AI. Examples include the checkers-playing computer program Chinook becoming the first to win a world championship title against humans (4), Deep Blue defeating Kasparov in chess (5), and Watson defeating Jennings and Rutter on Jeopardy! (6). However, defeating top human players is not the same as "solving" a game, that is, computing a game-theoretically optimal solution that is incapable of losing against any opponent in a fair game. Notable milestones in the advancement of AI have been achieved through solving games such as Connect Four (7) and checkers (8).
Every nontrivial game played competitively by humans that has been solved to date is a perfect-information game (9).
In perfect-information games, all players are informed of everything that has occurred in the game before making a decision. Chess, checkers, and backgammon are examples of perfect-information games. In imperfect-information games, players do
The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of different problem settings and it has been receiving increasing attention from the scientific community, leading to some high-profile success stories such as the much publicized Deep Q-Networks (DQN). In this article we take a big picture look at how the ALE is being used by the research community. We show how diverse the evaluation methodologies in the ALE have become with time, and highlight some key concerns when evaluating agents in the ALE. We use this discussion to present some methodological best practices and provide new benchmark results using these best practices. To further the progress in the field, we introduce a new version of the ALE that supports multiple game modes and provides a form of stochasticity we call sticky actions. We conclude this big picture look by revisiting challenges posed when the ALE was introduced, summarizing the state-of-the-art in various problems and highlighting problems that remain open.
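To make the idea of sticky actions concrete, here is a minimal sketch of the mechanism as it is commonly described: at each step, with some probability the environment executes the previously executed action instead of the one the agent just chose. The wrapper class `StickyActionEnv`, the stand-in `EchoEnv`, the `reset`/`step` interface, the initial action `0`, and the default probability of 0.25 are all assumptions made for illustration; this is not the ALE API.

```python
import random


class EchoEnv:
    """Hypothetical stand-in environment: step() just returns the action it ran."""

    def reset(self):
        return None

    def step(self, action):
        return action


class StickyActionEnv:
    """Illustrative wrapper that adds sticky-action stochasticity to an environment.

    With probability `repeat_prob`, the previously executed action is run
    again in place of the action the agent requested.
    """

    def __init__(self, env, repeat_prob=0.25, seed=None):
        self.env = env
        self.repeat_prob = repeat_prob  # assumed default, not taken from the text
        self.rng = random.Random(seed)
        self.prev_action = 0  # assumed initial "no-op" action

    def reset(self):
        self.prev_action = 0
        return self.env.reset()

    def step(self, action):
        # Decide whether the previous action "sticks" for this step.
        if self.rng.random() < self.repeat_prob:
            action = self.prev_action
        # Remember the action that was actually executed, not the requested one.
        self.prev_action = action
        return self.env.step(action)


if __name__ == "__main__":
    env = StickyActionEnv(EchoEnv(), repeat_prob=0.25, seed=0)
    env.reset()
    requested = [1, 2, 3, 4, 5]
    executed = [env.step(a) for a in requested]
    print("requested:", requested)
    print("executed: ", executed)
```

Because the agent cannot know in advance whether its chosen action will be executed, a policy that simply memorizes a fixed action sequence no longer replays deterministically, which is the kind of stochasticity the passage refers to.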
In apprenticeship learning, the goal is to learn a policy in a Markov decision process that is at least as good as a policy demonstrated by an expert. The difficulty arises in that the MDP's true reward function is assumed to be unknown. We show how to frame apprenticeship learning as a linear programming problem, and show that using an off-the-shelf LP solver to solve this problem results in a substantial improvement in running time over existing methods (up to two orders of magnitude faster in our experiments). Additionally, our approach produces stationary policies, while all existing methods for apprenticeship learning output policies that are "mixed", i.e., randomized combinations of stationary policies. The technique used is general enough to convert any mixed policy to a stationary policy.
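The abstract does not spell out how a mixed policy is turned into a stationary one. One standard construction for discounted MDPs, given here as an assumption rather than as the authors' exact method, goes through the state-action occupancy measure. The symbols $x$, $\alpha$, $\gamma$, $\lambda_i$, and $\pi_i$ are illustrative notation, not taken from the text above.

```latex
% Discounted occupancy measure of a policy \pi, with initial state
% distribution \alpha and discount factor \gamma \in [0, 1):
x^{\pi}(s, a) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\,
    \mathbf{1}\{ s_t = s,\ a_t = a \} \,\middle|\, s_0 \sim \alpha,\ \pi \right]

% A mixed policy selects stationary policy \pi_i with probability \lambda_i
% at time 0 and follows it thereafter, so its occupancy measure is a
% convex combination:
x^{\mathrm{mix}}(s, a) = \sum_{i} \lambda_i \, x^{\pi_i}(s, a)

% Normalizing per state yields a single stationary policy
% (defined wherever the denominator is positive):
\hat{\pi}(a \mid s) = \frac{x^{\mathrm{mix}}(s, a)}{\sum_{a'} x^{\mathrm{mix}}(s, a')}

% This policy reproduces the same occupancy measure,
x^{\hat{\pi}}(s, a) = x^{\mathrm{mix}}(s, a),
% and therefore attains the same expected discounted return
% \sum_{s, a} x(s, a)\, R(s, a) for every reward function R.
```

Since the expected return is linear in the occupancy measure, matching the occupancy measure is enough to match the mixed policy's value even when the reward function is unknown.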
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay with two to five players and imperfect information. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques for such theory of mind reasoning will not only be crucial for success in Hanabi, but also in broader collaborative efforts, especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
Footnote 6: One such equilibrium occurs when players do not intentionally communicate information to other players, and ignore what other players tell them (historically called a pooling equilibrium in pure signalling games [15], or a babbling equilibrium in later work using cheap talk [16]). In this case, there is no incentive for a player to start communicating because they will be ignored, and there is no incentive to pay attention to other players because they are not communicating.
Footnote 7: In pure signalling games where actions are purely communicative, policies are often referred to as communication protocols. Though Hanabi is not such a pure signalling game, when we want to emphasize the communication properties of an agent's policy we will still refer to its communication protocol.
Footnote 8: We use the word convention to refer to the parts of a communication protocol or policy that interrelate. Technically, these can be thought of as constraints on the policy to enact the convention.