Proceedings of the 2017 ACM on Web Science Conference
DOI: 10.1145/3091478.3098871

The Limits of Abstract Evaluation Metrics

Cited by 35 publications (27 citation statements: 1 supporting, 26 mentioning, 0 contrasting)
References 3 publications

“…Second, effectively identifying existing biases and other harmful blind spots along a data analysis pipeline further requires better auditing and evaluation frameworks, as well as metrics based on the semantics of the problem, rather than allowing them to be abstract or generic (Wagstaff, 2012). Users' perceptions and assessments of performance may also significantly diverge from those suggested by statistical metrics (Lee and Baykal, 2017; Olteanu et al., 2017a). In other words, it is often unclear what is being evaluated (section 8): e.g., is the performance or outcome of interest directly observable or measurable?…”
Section: A Trending Skepticism Toward Easy Answers
Citation type: mentioning; confidence: 99%

“…In social media research, the number of posts has been used as a proxy metric for interest in a topic (Chen et al., 2010); yet, while this number may reflect production patterns, it may not reflect how much content on the topic users read (as seen in section 5.1). In the context of detecting hate speech online, Olteanu et al. (2017a) found that even when a given performance metric is fixed (e.g., precision), user perceptions of the output quality may vary based on various user characteristics. Finally, in some cases, metrics may themselves be designed using a statistical model, subject to the same biases presented in section 7.3 (Diaz, 2016).…”
Section: Issues With the Evaluation and Interpretation of Findings
Citation type: mentioning; confidence: 99%
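
A minimal Python sketch of the gap this excerpt describes (the data and both "systems" below are invented for illustration, not taken from any of the cited papers): two classifiers can score identically on an abstract metric such as precision while making very different kinds of mistakes, leaving room for user-perceived quality to diverge.

# Hypothetical example: two hate-speech classifiers with identical precision.
def precision(predictions, labels):
    # Fraction of items flagged as positive that are truly positive.
    true_pos = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    flagged = sum(predictions)
    return true_pos / flagged if flagged else 0.0

labels   = [1, 1, 1, 0, 0, 0]  # ground truth: 1 = hateful content
system_a = [1, 1, 0, 1, 0, 0]  # false positive on a borderline post
system_b = [1, 0, 1, 0, 1, 0]  # false positive on a clearly innocuous post

# Both systems are equivalent under the abstract metric ...
assert precision(system_a, labels) == precision(system_b, labels) == 2 / 3
# ... yet users judging the actual outputs may rate them very differently,
# depending on which items were misclassified and on the users themselves.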

“…Also, some metric choices do not reflect the true performance of the proposed methods. Olteanu et al. (2017) argue for evaluation metrics that are directly proportional to user perception of correctness, and thus more human-centered.…”
Section: Varying Preprocessing Steps And…
Citation type: mentioning; confidence: 99%
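
One way to read "directly proportional to user perception of correctness" is as a score in which each error is weighted by how severe users judge it to be, rather than counted uniformly. The sketch below is a hypothetical construction along those lines, not a metric proposed in any of the papers above; the severity weights are invented.

def perception_weighted_score(predictions, labels, severities):
    # Like accuracy, but each mistake costs its user-rated severity in [0, 1]
    # instead of a flat 1, so the score tracks perceived correctness.
    penalty = sum(s for p, y, s in zip(predictions, labels, severities) if p != y)
    return (len(labels) - penalty) / len(labels)

labels      = [1, 0, 1, 0]
predictions = [1, 1, 1, 1]
severities  = [0.0, 0.9, 0.0, 0.2]  # users rate the second error as far worse

print(perception_weighted_score(predictions, labels, severities))  # 0.725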

“…With advances in computing, networking, and sensing technologies, cyber-physical systems have been deployed in various safety-critical settings such as aerospace, energy, transportation, and healthcare. The increasing complexity and connectivity of these systems, the tight coupling between their cyber and physical components, and the inevitable involvement of human operators in their supervision and control have introduced significant challenges in ensuring system reliability [11] and safety while maintaining the expected performance. Cyber-physical systems continuously interact with the physical world and human operators in real time.…”
Section: A. Cyber-Physical Systems
Citation type: mentioning; confidence: 99%