Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset

Zhang, Ruohan; Walshe, Calen; Liu, Zhuode; Guan, Lin; Muller, Karl; Whritner, Jake A.; Zhang, Luxin; Hayhoe, Mary; Ballard, Dana H.

doi:10.48550/arxiv.1903.06754

Cited by 9 publications

(22 citation statements)

References 38 publications

“…Average return represents the long-term rewards of two policies, while behavior matching characterizes the behavioural similarity of two policies. We note that similar metrics are also used for attention-guided learning in recent work (Zhang et al, 2019).…”

Section: Evaluations (mentioning)

confidence: 99%

Self-Supervised Discovering of Interpretable Features for Reinforcement Learning

Shi, Huang, Song et al. 2020

Preprint

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks. However, the agent's decision-making process is generally not transparent. The lack of interpretability hinders the applicability of RL in safety-critical scenarios. In this paper, we propose a self-supervised interpretable framework, which employs a self-supervised interpretable network (SSINet) to discover and locate fine-grained causal features that constitute most evidence for the agent's decisions. We verify and evaluate our method on several Atari 2600 games as well as Duckietown. The results show that our method renders causal explanations and empirical evidences about how the agent makes decisions and why the agent performs well or badly. Moreover, our method is a flexible explanatory module that can be applied to most vision-based RL agents. Overall, our method provides valuable insight into interpretable vision-based RL.

“…In addition, if we gather BC data from human players, the recorded actions are subject to delays from human-reflexes: if something surprising happens in an image, average humans react to this with a split-second delay. This action was supposed to be associated with the surprising event, but instead it will be recorded few frames later, associated with possibly a wrong observation and leading to state-action mismatch [14].…”

Section: B Challenges Of End-to-end Control Of Video Games (mentioning)

confidence: 99%

“…[15], and "how the state-action mismatch from human reflexes affects the results?" [14]. The former sheds light on if we should gather data from only few experts, or should we use data from many different players.…”

Section: Research Questions and Experimental Setup (mentioning)

confidence: 99%

Benchmarking End-to-End Behavioural Cloning on Video Games

Kanervisto, Pussinen, Hautamäki 2020

Preprint

Behavioural cloning, where a computer is taught to perform a task based on demonstrations, has been successfully applied to various video games and robotics tasks, with and without reinforcement learning. This also includes end-to-end approaches, where a computer plays a video game like humans do: by looking at the image displayed on the screen, and sending keystrokes to the game. As a general approach to playing video games, this has many inviting properties: no need for specialized modifications to the game, no lengthy training sessions and the ability to re-use the same tools across different games. However, related work includes game-specific engineering to achieve the results. We take a step towards a general approach and study the general applicability of behavioural cloning on twelve video games, including six modern video games (published after 2010), by using human demonstrations as training data. Our results show that these agents cannot match humans in raw performance but can learn human-like behaviour. We also demonstrate how the quality of the data matters, and how recording data from humans is subject to a state-action mismatch, due to human reflexes.

“…Learning an attention mechanism that limits these choices could potentially result in much better performance. Humans have a remarkable ability to selectively pay attention to certain parts of the visual input (Judd et al, 2009;Borji et al, 2012), gathering relevant information, and sequentially combining their observations to build representations across different timescales (Hayhoe and Ballard, 2005;Zhang et al, 2019), which plays an important role in guiding further perception and action (Nobre and Stokes, 2019;Badman et al, 2020). In this paper, we explore ideas for endowing reinforcement learning (RL) agents with these type of capabilities.…”

Section: Introduction (mentioning)

confidence: 99%

The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning

Nica, Khetarpal, Precup 2022

Preprint

Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to having many choices. Hierarchical reinforcement learning methods aim to solve the first problem, by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we model "affordances" through an attention mechanism that limits the available choices of temporally extended options. We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. We identify and empirically illustrate the settings in which the paradox of choice arises, i.e. when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.
