2021
DOI: 10.1145/3479569

Discovering and Validating AI Errors With Crowdsourced Failure Reports

Abstract: AI systems can fail to learn important behaviors, leading to real-world issues like safety concerns and biases. Discovering these systematic failures often requires significant developer attention, from hypothesizing potential edge cases to collecting evidence and validating patterns. To scale and streamline this process, we introduce crowdsourced failure reports, end-user descriptions of how or why a model failed, and show how developers can use them to detect AI errors. We also design and implement Deblinder…
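The abstract describes failure reports as end-user descriptions of how or why a model failed, which developers then aggregate to find systematic errors. As an illustration only, a minimal sketch of what such a report and a simple grouping heuristic might look like; the field names and the tag-counting rule are assumptions, not the paper's Deblinder implementation:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FailureReport:
    """One end-user report of a suspected model failure (illustrative fields only)."""
    input_id: str          # identifier of the input the model mishandled
    model_output: str      # what the model predicted
    expected_output: str   # what the user believes it should have produced
    description: str       # free-text account of how or why the model failed
    tags: tuple            # user-chosen labels, e.g. ("night", "blurry")

def candidate_failure_patterns(reports, min_reports=3):
    """Group reports by tag and surface tags reported often enough to suggest
    a systematic failure rather than a one-off mistake."""
    counts = Counter(tag for r in reports for tag in r.tags)
    return [(tag, n) for tag, n in counts.most_common() if n >= min_reports]
```

Under this sketch, three independent reports sharing a "night" tag would surface "night" as a candidate systematic failure for the developer to validate against held-out data.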


Cited by 39 publications (27 citation statements)
References 63 publications
“…For instance, designers might develop auditing interfaces that automatically surface potentially important instances for everyday auditors to examine further. As a foundation for such interfaces, it may be possible to build upon emerging algorithmic techniques for crowd-in-the-loop detection of "unknown unknowns" in ML models (e.g., [6,56,59,64,87,101]), which automatically surface cases that are more likely to be mislabelled and/or misclassified. These methods focus on surfacing regions of a model's error space in which the model is highly confident yet incorrect [56].…”
Section: Design For Algorithmic Guidancementioning
confidence: 99%
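The methods cited in this statement surface "unknown unknowns": cases the model misclassifies while being highly confident in its wrong prediction. A generic sketch of that idea, assuming predicted class probabilities and crowd-verified reference labels are available; the threshold and data layout are assumptions, not the procedure of any specific cited method:

```python
import numpy as np

def high_confidence_errors(probs, predictions, labels, confidence_threshold=0.9):
    """Surface instances the model misclassifies with high confidence.

    probs:       (n_samples, n_classes) predicted class probabilities
    predictions: (n_samples,) predicted class indices
    labels:      (n_samples,) reference labels (e.g. from crowd verification)
    """
    # Confidence the model assigned to its own prediction.
    confidence = probs[np.arange(len(predictions)), predictions]
    # Wrong prediction AND high confidence = candidate "unknown unknown".
    mask = (predictions != labels) & (confidence >= confidence_threshold)
    idx = np.where(mask)[0]
    # Most confidently wrong cases first, for human review.
    return idx[np.argsort(-confidence[idx])]
```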
“…Adding data [16,45,51,121]; Relabeling data [76]; Reweighting data [12,64,137]; Collecting expert labels [98]; Passive observation [69,84,118]…”
Section: Active Data Collectionmentioning
confidence: 99%
“…Recently, active learning has been studied alongside model transparency, specifically using explanations to assist experts with choosing which points to add to D [51]. Cabrera et al [16] propose an extensive visual analytics system that allows experts to verify and produce examples of crowd-sourced errors, which can be thought of as additional data. Passive observation.…”
Section: Observation To Datasetmentioning
confidence: 99%
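The active-learning step described here, choosing which points experts should label and add to the dataset D, is often instantiated with uncertainty sampling. A generic sketch under that assumption, not the explanation-assisted method of [51] or the crowd-verification system of [16]:

```python
import numpy as np

def select_points_to_label(probs, budget=10):
    """Pick the unlabeled points the model is least sure about, measured by the
    margin between its top two class probabilities; these are candidates for an
    expert to label and add to the training set D."""
    sorted_probs = np.sort(probs, axis=1)
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]
    # Smallest margins first: the most ambiguous points get labeled.
    return np.argsort(margin)[:budget]
```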
“…While data labeling represents the most common use of crowdsourcing for training and evaluating machine learning models, human intelligence can be tapped in a much wider and more creative variety of ways. For example, the crowd might verify output from machine learning models, identify and categorize blind spots (Attenberg et al., 2011; Vandenhof, 2019) and other failure modes (Cabrera et al., 2021), and suggest useful features for a machine learning classifier (Cheng and Bernstein, 2015).…”
Section: Motivation and Backgroundmentioning
confidence: 99%