Improving Reinforcement Learning with Human Assistance: An Argument for Human Subject Studies with HIPPO Gym

Taylor, Matthew E.; Nissen, Nicholas N.; Wang, Yuan; Navidi, Néda

doi:10.48550/arxiv.2102.02639

Cited by 1 publication

(1 citation statement)

References 30 publications

“…This study showed that an information-gain objective decreasing uncertainty was well suited to propose effective questions. There has been a recent effort to provide testing environments for human-subject experimentation in reinforcement learning (Taylor et al, 2021) or active querying (Bıyık et al, 2022b). However, existing work has focused on simple feedback types and user interfaces, and more extensive human studies as possible future research directions have been highlighted.…”

Section: Related Work (mentioning)

confidence: 99%

VISITOR: Visual Interactive State Sequence Exploration for Reinforcement Learning

Metz, Bykovets, Joos, et al. 2023

Computer Graphics Forum

Understanding the behavior of deep reinforcement learning agents is a crucial requirement throughout their development. Existing work has addressed the identification of observable behavioral patterns in state sequences or analysis of isolated internal representations; however, the overall decision‐making of deep‐learning RL agents remains opaque. To tackle this, we present VISITOR, a visual analytics system enabling the analysis of entire state sequences, the diagnosis of singular predictions, and the comparison between agents. A sequence embedding view enables the multiscale analysis of state sequences, utilizing custom embedding techniques for a stable spatialization of the observations and internal states. We provide multiple layers: (1) a state space embedding, highlighting different groups of states inside the state‐action sequences, (2) a trajectory view, emphasizing decision points, (3) a network activation mapping, visualizing the relationship between observations and network activations, (4) a transition embedding, enabling the analysis of state‐to‐state transitions. The embedding view is accompanied by an interactive reward view that captures the temporal development of metrics, which can be linked directly to states in the embedding. Lastly, a model list allows for the quick comparison of models across multiple metrics. Annotations can be exported to communicate results to different audiences. Our two‐stage evaluation with eight experts confirms the effectiveness in identifying states of interest, comparing the quality of policies, and reasoning about the internal decision‐making processes.
