Thore Graepel scite author profile

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

show abstract

Mastering the game of Go without human knowledge

Silver

Schrittwieser

Simonyan

et al. 2017

Nature

6,937

4,167

View full text Add to dashboard Cite

*These authors contributed equally to this work.A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from selfplay. Here, we introduce an algorithm based solely on reinforcement learning, without human data, guidance, or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo.Much progress towards artificial intelligence has been made using supervised learning systems that are trained to replicate the decisions of human experts [1][2][3][4] . However, expert data is often expensive, unreliable, or simply unavailable. Even when reliable data is available it may impose a ceiling on the performance of systems trained in this manner 5 . In contrast, reinforcement learning systems are trained from their own experience, in principle allowing them to exceed human capabilities, and to operate in domains where human expertise is lacking. Recently, there has been rapid progress towards this goal, using deep neural networks trained by reinforcement learning. initially by supervised learning to accurately predict human expert moves, and was subsequently refined by policy-gradient reinforcement learning. The value network was trained to predict the winner of games played by the policy network against itself. Once trained, these networks were combined with a Monte-Carlo Tree Search (MCTS) [13][14][15] to provide a lookahead search, using the policy network to narrow down the search to high-probability moves, and using the value network (in conjunction with Monte-Carlo rollouts using a fast rollout policy) to evaluate positions in the tree. A subsequent version, which we refer to as AlphaGo Lee, used a similar approach (see Methods), and defeated Lee Sedol, the winner of 18 international titles, in March 2016.Our program, AlphaGo Zero, differs from AlphaGo Fan and AlphaGo Lee 12 in several important aspects. First and foremost, it is trained solely by self-play reinforcement learning, starting from random play, without any supervision or use of human data. Second, it only uses the black and white stones from the board as input features. Third, it uses a single neural network, rather than separate policy and value networks. Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and samp...

show abstract

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Silver

Hubert

Schrittwieser

et al. 2018

Science

2,416

1,596

View full text Add to dashboard Cite

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

show abstract

Private traits and attributes are predictable from digital records of human behavior

Kosinski

Stillwell

Graepel

2013

Proc. Natl. Acad. Sci. U.S.A.

2,126

1,401

View full text Add to dashboard Cite

We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/ linear regression to predict individual psychodemographic profiles from Likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait "Openness," prediction accuracy is close to the test-retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy.social networks | computational social science | machine learning | big data | data mining | psychological assessment

show abstract

Mastering Atari, Go, chess and shogi by planning with a learned model

Schrittwieser

Antonoglou

Hubert

et al. 2020

Nature

945

694

View full text Add to dashboard Cite

Human-level performance in 3D multiplayer games with population-based reinforcement learning

Jaderberg

Czarnecki

Dunning

et al. 2019

Science

527

474

View full text Add to dashboard Cite

Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments (30, 40, 45,46,56) and two-player turn-based games (47,58,66). However, the realworld contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. In this work, we demonstrate for the first time that an agent can achieve human-level in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag (28), using only pixels and game points as input. These results were achieved by a novel two-tier optimisation process in which a population of independent RL agents are trained concurrently from thousands of parallel matches with agents playing in teams together and against each other on randomly generated environments. Each agent in the population learns its own internal reward signal to complement the sparse delayed reward from winning, and selects actions using a novel temporally hierarchical representation that enables the agent to reason at multiple timescales. During game-play, these agents display humanlike behaviours such as navigating, following, and defending based on a rich learned representation that is shown to encode high-level game knowledge. In an extensive tournament-style evaluation the trained agents exceeded the winrate of strong human players both as teammates and opponents, and proved far stronger than existing state-of-the-art agents. These results demonstrate a 1 arXiv:1807.01281v1 [cs.LG] 3 Jul 2018 significant jump in the capabilities of artificial agents, bringing us closer to the goal of human-level intelligence.We demonstrate how intelligent behaviour can emerge from training sophisticated new learning agents within complex multi-agent environments. End-to-end reinforcement learning methods (45, 46) have so far not succeeded in training agents in multi-agent games that combine team and competitive play due to the high complexity of the learning problem (7, 43) that arises from the concurrent adaptation of other learning agents in the environment. We approach this challenge by studying team-based multiplayer 3D first-person video games, a genre which is particularly immersive for humans (16) and has even been shown to improve a wide range of cognitive abilities (21). We focus specifically on a modified version (5) of Quake III Arena (28), the canonical multiplayer 3D first-person video game, whose game mechanics served as the basis for many subsequent games, and which has a thriving professional scene (1). The task we consider is the game mode Capture the Flag (CTF) on per game randomly generated maps of both indoor and outdoor theme ( Figure 1 (a,b)). Two opposing teams consisting of multiple individual players compete to capture each other's flags by strategically navigating, tagging, and evading opponents. The team with the greatest number of flag captures after five minutes wins. CTF is play...

show abstract

ML Confidential: Machine Learning on Encrypted Data

2013

View full text Add to dashboard Cite

Abstract. We demonstrate that, by using a recently proposed leveled homomorphic encryption scheme, it is possible to delegate the execution of a machine learning algorithm to a computing service while retaining confidentiality of the training and test data. Since the computational complexity of the homomorphic encryption scheme depends primarily on the number of levels of multiplications to be carried out on the encrypted data, we define a new class of machine learning algorithms in which the algorithm's predictions, viewed as functions of the input data, can be expressed as polynomials of bounded degree. We propose confidential algorithms for binary classification based on polynomial approximations to least-squares solutions obtained by a small number of gradient descent steps. We present experimental validation of the confidential machine learning pipeline and discuss the trade-offs regarding computational complexity, prediction accuracy and cryptographic security.

show abstract

Personality and patterns of Facebook usage

et al. 2012

View full text Add to dashboard Cite

We show how users' activity on Facebook relates to their personality, as measured by the standard Five Factor Model. Our dataset consists of the personality profiles and Facebook profile data of 180,000 users. We examine correlations between users' personality and the properties of their Facebook profiles such as the size and density of their friendship network, number uploaded photos, number of events attended, number of group memberships, and number of times user has been tagged in photos. Our results show significant relationships between personality traits and various features of Facebook profiles. We then show how multivariate regression allows prediction of the personality traits of an individual user given their Facebook profile. The best accuracy of such predictions is achieved for Extraversion and Neuroticism, the lowest accuracy is obtained for Agreeableness, with Openness and Conscientiousness lying in the middle.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Thore Graepel

Mastering the game of Go with deep neural networks and tree search

Mastering the game of Go without human knowledge

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Private traits and attributes are predictable from digital records of human behavior

Mastering Atari, Go, chess and shogi by planning with a learned model

Human-level performance in 3D multiplayer games with population-based reinforcement learning

ML Confidential: Machine Learning on Encrypted Data

Personality and patterns of Facebook usage

Contact Info

Product

Resources

About