Proceedings of the 2017 ACM on Web Science Conference
DOI: 10.1145/3091478.3098871

The Limits of Abstract Evaluation Metrics

Cited by 35 publications (27 citation statements: 1 supporting, 26 mentioning, 0 contrasting)
References 3 publications

“…Second, effectively identifying existing biases and other harmful blind spots along a data analysis pipeline further requires better auditing and evaluation frameworks, as well as metrics based on the semantics of the problem, rather than allowing them to be abstract or generic (Wagstaff, 2012). Users' perceptions and assessments of performance may also significantly diverge from those suggested by statistical metrics (Lee and Baykal, 2017; Olteanu et al., 2017a). In other words, it is often unclear what is being evaluated (section 8): e.g., is the performance or outcome of interest directly observable or measurable?…”
Section: A Trending Skepticism Toward Easy Answers
Citation type: mentioning; confidence: 99%

“…In social media research, the number of posts has been used as a proxy metric for interest in a topic (Chen et al., 2010); yet, while this number may reflect production patterns, it may not reflect how much content on the topic users read (as seen in section 5.1). In the context of detecting hate speech online, Olteanu et al. (2017a) found that even when a given performance metric is fixed (e.g., precision), user perceptions of the output quality may vary based on various user characteristics. Finally, in some cases, metrics may themselves be designed using a statistical model, subject to the same biases presented in section 7.3 (Diaz, 2016).…”
Section: Issues With the Evaluation and Interpretation of Findings
Citation type: mentioning; confidence: 99%
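
A minimal Python sketch of the gap this excerpt describes (the data and both "systems" below are invented for illustration, not taken from any of the cited papers): two classifiers can score identically on an abstract metric such as precision while making very different kinds of mistakes, leaving room for user-perceived quality to diverge.

# Hypothetical example: two hate-speech classifiers with identical precision.
def precision(predictions, labels):
    # Fraction of items flagged as positive that are truly positive.
    true_pos = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    flagged = sum(predictions)
    return true_pos / flagged if flagged else 0.0

labels   = [1, 1, 1, 0, 0, 0]  # ground truth: 1 = hateful content
system_a = [1, 1, 0, 1, 0, 0]  # false positive on a borderline post
system_b = [1, 0, 1, 0, 1, 0]  # false positive on a clearly innocuous post

# Both systems are equivalent under the abstract metric ...
assert precision(system_a, labels) == precision(system_b, labels) == 2 / 3
# ... yet users judging the actual outputs may rate them very differently,
# depending on which items were misclassified and on the users themselves.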

“…Also, some metric choices do not reflect the true performance of the proposed methods. Olteanu et al. (2017) argue for evaluation metrics that are directly proportional to user perception of correctness, and thus more human-centered.…”
Section: Varying Preprocessing Steps And…
Citation type: mentioning; confidence: 99%
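
One way to read "directly proportional to user perception of correctness" is as a score in which each error is weighted by how severe users judge it to be, rather than counted uniformly. The sketch below is a hypothetical construction along those lines, not a metric proposed in any of the papers above; the severity weights are invented.

def perception_weighted_score(predictions, labels, severities):
    # Like accuracy, but each mistake costs its user-rated severity in [0, 1]
    # instead of a flat 1, so the score tracks perceived correctness.
    penalty = sum(s for p, y, s in zip(predictions, labels, severities) if p != y)
    return (len(labels) - penalty) / len(labels)

labels      = [1, 0, 1, 0]
predictions = [1, 1, 1, 1]
severities  = [0.0, 0.9, 0.0, 0.2]  # users rate the second error as far worse

print(perception_weighted_score(predictions, labels, severities))  # 0.725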

“…With advances in computing, networking, and sensing technologies, cyber-physical systems have been deployed in various safety-critical settings such as aerospace, energy, transportation, and healthcare. The increasing complexity and connectivity of these systems, the tight coupling between their cyber and physical components, and the inevitable involvement of human operators in their supervision and control have introduced significant challenges in ensuring system reliability [11] and safety while maintaining the expected performance. Cyber-physical systems continuously interact with the physical world and human operators in real time.…”
Section: A. Cyber-Physical Systems
Citation type: mentioning; confidence: 99%