ACM Conference on Fairness, Accountability, and Transparency, 2022
DOI: 10.1145/3531146.3533233

Evaluation Gaps in Machine Learning Practice

Cited by 20 publications (13 citation statements)
References 97 publications
“…Example metrics that participants proposed included error rates for determinations that a piece of evidence was inconclusive and ranges for the possible values that false positive software outputs could take on (Section 5.2.3). Together, these findings echo gaps between evaluation design and real-world use contexts highlighted in prior work (e.g., [42,72,81,92]), and, importantly, demonstrate the valuable insights that public defenders develop through their everyday encounters with CFS in the U.S. criminal legal system, further motivating growing efforts in HCI to engage downstream stakeholders in designing performance evaluations of AI systems (e.g., [28,55,76,81]).…”
Section: Contextualize Design Of Performance Evaluations In Real Worl... (supporting)
confidence: 68%
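To ground the metrics those participants proposed, below is a minimal sketch of how an error rate for inconclusive determinations and a range of false positive output values might be computed. Everything here is a hypothetical illustration: the reference determinations, software outputs, and score values are synthetic placeholders, not data or code from the cited study.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
labels = np.array(["match", "non-match", "inconclusive"])

# Hypothetical validation data: reference determinations from a ground-truth
# study, the software's determinations, and a numeric score the software
# reports alongside each determination. All values are synthetic.
reference = rng.choice(labels, size=500)
software = rng.choice(labels, size=500)
scores = rng.uniform(0.0, 1.0, size=500)

# Metric 1: error rate for "inconclusive" determinations -- among cases the
# software called inconclusive, the fraction the reference standard says
# were actually conclusive (a match or a non-match).
called_inconclusive = software == "inconclusive"
inconclusive_error_rate = np.mean(reference[called_inconclusive] != "inconclusive")

# Metric 2: range of values that false positive outputs take on -- the scores
# reported on cases the software called a match but the reference standard
# calls a non-match.
false_positive = (software == "match") & (reference == "non-match")
fp_low, fp_high = scores[false_positive].min(), scores[false_positive].max()

print(f"Inconclusive error rate: {inconclusive_error_rate:.1%}")
print(f"False positive scores fall in [{fp_low:.2f}, {fp_high:.2f}]")
```

In a real evaluation, the reference column would come from a validation study with known ground truth, and the score would be whatever numeric output the forensic software reports; the second metric matters because a bare false positive rate hides how extreme the erroneous outputs themselves can be.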
“…Metcalf et al [136], drawing from fields adjacent to ML, found that algorithmic impact assessments, intended to highlight risks of AI system deployment, can instead be co-opted by firms developing such systems to further their interests. Meanwhile, recent work has also explored and critiqued evaluation practices of AI systems more broadly, highlighting the implications of decontextualization when evaluating AI systems [103,135] and raising attention to the risks of corporate capture when private actors evaluate their own systems [219]. Our aim is to support improvements in the design and development of RAI tools through an analysis of existing evaluation practices for RAI tools.…”
Section: Evaluating RAI Tools (mentioning)
confidence: 99%
“…In natural language processing (NLP), for instance, researchers have surveyed existing NLP model evaluation methods, finding no standardised evaluation practices [79,220]. Relatedly, efforts are underway to develop standards for evaluation of ML applications [88] and models [103]. Our focus complements these efforts, by attending to the evaluation of interventions in ML production, rather than the evaluation of the outputs of ML production, such as trained models or new AI systems.…”
Section: Evaluation Goals and Approaches Outside of HCI (mentioning)
confidence: 99%
“…can vary substantially across disciplines [34,77,98]. In practice, these properties lend themselves to communication breakdowns and ineffective collaboration around AI fairness [27,41,56,67,68]. Passi and Barocas found that misalignments around problem formulation between data scientists and business teams can contribute to fundamental fairness issues from the early problem formulation phases of a project [67].…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…For instance, abstraction has been highlighted as an important skill for collaborating and communicating in software engineering and data analysis in cross-functional teams [4,55,64], although with the risk of losing the nuance of particular contexts [cf. 41,77]. However, in the context of collaboration on AI fairness work, these abstractions that were intended to facilitate conversations across roles often resulted in other team members not fully understanding and appreciating the labor hidden behind the efforts individuals invested in enabling the collaboration in AI fairness (Section 4.3).…”
Section: Making Invisible Labor Visible and Valuable (mentioning)
confidence: 99%