2023
DOI: 10.1145/3542921

What Did My AI Learn? How Data Scientists Make Sense of Model Behavior

Abstract: Data scientists require rich mental models of how AI systems behave to effectively train, debug, and work with them. Despite the prevalence of AI analysis tools, there is no general theory describing how people make sense of what their models have learned. We frame this process as a form of sensemaking and derive a framework describing how data scientists develop mental models of AI behavior. To evaluate the framework, we show how existing AI analysis tools fit into this sensemaking process and use it to desig…

Cited by 20 publications (13 citation statements)
References 73 publications
“…Detecting key phrases in demonstrations. While key phrase extraction in general may require domain knowledge [8,42,65], for text transformation we can leverage the signal present in the relationships between input and output, i.e., in which parts of the input are perturbed or retained. For example, "today" is retained in the output of both "Took a photo today. "…”
Section: Identifying Patterns With Key Phrase Clustering
confidence: 99%
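The signal described in the citation above — which parts of the input are retained versus perturbed in the output — can be sketched in a few lines. This is an illustrative toy, not the cited system's implementation; the function name and tokenization are assumptions.

```python
def retained_tokens(pairs):
    """Return tokens that survive every input->output transformation.

    Tokens retained across all demonstration pairs are candidate key
    phrases, since the transformation leaves them untouched.
    """
    common = None
    for inp, out in pairs:
        # naive whitespace tokenization; real systems would normalize more
        shared = set(inp.lower().split()) & set(out.lower().split())
        common = shared if common is None else common & shared
    return common or set()

demos = [
    ("Took a photo today.", "Photo taken today."),
    ("Went hiking today.", "Hiked today."),
]
print(retained_tokens(demos))  # only "today." is retained in every pair
```

The intersection across pairs is what makes the signal domain-independent: no external knowledge is needed, only the input/output relationship itself.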
“…To detect and mitigate these important issues, the ML community uses more fine-grained evaluation approaches, often termed behavioral evaluation [10,47]. Inspired by requirements engineering in software engineering, behavioral evaluation focuses on defining and testing the capabilities of an ML system, its expected behavior on a specification of requirements [45,60].…”
Section: Behavioral Evaluation Of Machine Learning
confidence: 99%
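Behavioral evaluation, as the citation above describes, tests an ML system against a specification of expected capabilities rather than a single aggregate metric. A minimal sketch, with a hypothetical stand-in `model` in place of a real classifier:

```python
def model(text):
    # hypothetical stand-in for a trained sentiment classifier
    return "negative" if "terrible" in text else "positive"

# capability spec: (input, expected behavior) pairs for one capability,
# e.g. handling strongly negative vocabulary
cases = [
    ("The food was terrible.", "negative"),
    ("The food was great.", "positive"),
]

failures = [(t, exp, model(t)) for t, exp in cases if model(t) != exp]
print(f"{len(failures)} failures out of {len(cases)} cases")
```

Each capability gets its own case list, so a failure localizes the problem to a specific expected behavior instead of a drop in overall accuracy.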
“…There are numerous ML evaluation systems for discovering, validating, and tracking model behaviors [10,47]. The tools use techniques such as visualizations and data transformations to discover behaviors like fairness concerns and edge cases.…”
Section: Model Evaluation Approaches
confidence: 99%
“…Unlike existing work, our study proposes an interactive workflow of exploring concepts for the purpose of inspecting systematic errors and spurious concept associations behind them. Similar to [11], our human-in-the-loop workflow aims to promote the sensemaking of practitioners specifically in the problem of systematic errors where they can iteratively work on subsetting, contrasting patterns in instances, and hypothesizing spurious associations.…”
Section: Understanding Model With Concept Interpretability
confidence: 99%