Fig. 1. Exploratory cohort selection in high-dimensional datasets can lead to selection bias (unintended side effects in variable distributions) that may go unnoticed by the user. Our selection bias tracking system and detailed cohort comparison visualizations, deployed in a medical temporal event sequence visual analytics tool, include (a) a cohort provenance tree to keep track of created cohorts and indicate when selection bias may have occurred, (b-d) a suite of high-dimensional cohort comparison visualizations that employ hierarchical aggregation to display the differences between two cohorts in detail, and (e) data-type dependent comparisons of individual variable distributions.

Abstract: The collection of large, complex datasets has become common across a wide variety of domains. Visual analytics tools increasingly play a key role in exploring and answering complex questions about these large datasets. However, many visualizations are not designed to concurrently visualize the large number of dimensions present in complex datasets (e.g., tens of thousands of distinct codes in an electronic health record system). This fact, combined with the ability of many visual analytics systems to enable rapid, ad-hoc specification of groups, or cohorts, of individuals based on a small subset of visualized dimensions, leads to the possibility of introducing selection bias: when the user creates a cohort based on a specified set of dimensions, differences across many other unseen dimensions may also be introduced. These unintended side effects may result in the cohort no longer being representative of the larger population intended to be studied, which can negatively affect the validity of subsequent analyses. We present techniques for selection bias tracking and visualization that can be incorporated into high-dimensional exploratory visual analytics systems, with a focus on medical data with existing data hierarchies. 
These techniques include: (1) tree-based cohort provenance and visualization, including a user-specified baseline cohort against which all other cohorts are compared, and a visual encoding of cohort "drift", which indicates where selection bias may have occurred, and (2) a set of visualizations, including a novel icicle-plot-based visualization, for comparing in detail the per-dimension differences between the baseline and a user-specified focus cohort. These techniques are integrated into a medical temporal event sequence visual analytics tool. We present example use cases and report findings from domain expert user interviews.
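To make the idea of a provenance tree with per-dimension drift concrete, the following is a minimal sketch and not the system's implementation. It assumes categorical variables stored as plain dictionaries and uses the Hellinger distance as the drift measure; the distance choice, the `Cohort` class, and the toy patient records are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


def distribution(rows, dim):
    """Relative frequency of each value of `dim` (empty dict for an empty cohort)."""
    counts = Counter(r[dim] for r in rows)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def hellinger(p, q):
    """Hellinger distance between two categorical distributions, in [0, 1]."""
    keys = set(p) | set(q)
    return sqrt(sum((sqrt(p.get(k, 0.0)) - sqrt(q.get(k, 0.0))) ** 2 for k in keys) / 2)


@dataclass(eq=False)
class Cohort:
    """Hypothetical provenance-tree node: a cohort plus links to parent and children."""
    name: str
    rows: list
    parent: "Cohort | None" = None
    children: list = field(default_factory=list)

    def filter(self, name, predicate):
        """Create a child cohort and record it in the provenance tree."""
        child = Cohort(name, [r for r in self.rows if predicate(r)], parent=self)
        self.children.append(child)
        return child

    def drift(self, baseline, dims):
        """Per-dimension distance between this cohort and the chosen baseline."""
        return {
            d: hellinger(distribution(baseline.rows, d), distribution(self.rows, d))
            for d in dims
        }


# Toy data (invented for illustration): selecting on `diabetes` also shifts `age_group`.
patients = [
    {"sex": "F", "age_group": "old", "diabetes": True},
    {"sex": "M", "age_group": "old", "diabetes": True},
    {"sex": "F", "age_group": "young", "diabetes": False},
    {"sex": "M", "age_group": "young", "diabetes": False},
]
baseline = Cohort("all patients", patients)
diabetic = baseline.filter("diabetic", lambda r: r["diabetes"])
drift = diabetic.drift(baseline, ["sex", "age_group"])
```

In this toy example the user filters only on `diabetes`, yet the `age_group` distribution moves away from the baseline while `sex` does not. A nonzero drift on a dimension the user never selected on is the kind of unintended side effect the provenance tree is meant to surface.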