Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.95
Do Explanations Help Users Detect Errors in Open-Domain QA? An Evaluation of Spoken vs. Visual Explanations

Abstract: While research on explaining predictions of open-domain QA systems (ODQA) is gaining momentum, most works do not evaluate whether these explanations improve user trust. Furthermore, many users interact with ODQA using voice-assistants, yet prior works exclusively focus on visual displays, risking (as we also show) incorrectly extrapolating the effectiveness of explanations across modalities. To better understand the effectiveness of ODQA explanation strategies in the wild, we conduct user studies that measure…

Cited by 6 publications (3 citation statements)
References: 28 publications
“…On the other hand, multiple studies do find evidence of synergistic human-computer systems. For instance, in the study with the highest ρ ratio in our study, [41] demonstrate how algorithms can improve human decision-making in open-domain question answering tasks. In their experiment, the condition in which humans work alone achieves an accuracy of 57% and the condition in which the algorithm works alone achieves an accuracy of 50%.…”
Section: Study 1: Analysis Of Recent Studies That Evaluate Human-comp... (mentioning)
confidence: 68%
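The quoted comparison invites a quick arithmetic reading. The citing paper's exact definition of ρ is not reproduced in the excerpt; a common way to express such a human–AI synergy ratio (an assumed formulation, not necessarily the cited authors' own) is

\[
\rho = \frac{\mathrm{acc}_{\mathrm{human+AI}}}{\max\left(\mathrm{acc}_{\mathrm{human}},\ \mathrm{acc}_{\mathrm{AI}}\right)}
\]

Under this reading, with human-alone accuracy of 0.57 and algorithm-alone accuracy of 0.50, any joint human–AI condition scoring above 0.57 would give ρ > 1, i.e. the team outperforms either party working alone.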
“…Specifically, we need to know what users do with the model output across multiple interactions (e.g., verify, fact check, revise, accept). For example, González et al (2021) investigate the connection between explanations (D2) and user trust in the context of question answering systems. In their study users are presented with explanations in different modalities and either accept (trust) or reject (don't trust) candidate answers.…”
Section: Trustworthiness and User Trust (mentioning)
confidence: 99%
“…Other work that addresses human-in-the-loop evaluation of interpretability for deep neural models (a) includes Gonzalez and Søgaard (2020) and González et al (2021), but both evaluate interpretability methods with lay people and on non-critical tasks, ignoring (b) and (c). Attempts to evaluate interpretability methods for experts performing critical tasks, have, to the best of our knowledge, been limited to automatic evaluation or evaluation against gold-standard human rationales.…”
Section: Related Work (mentioning)
confidence: 99%