Calibration of Machine Reading Systems at Scale

Dhuliawala, Shehzaad; Adolphs, Leonard; Das, Raj; Sachan, Mrinmaya

doi:10.48550/arxiv.2203.10623

Cited by 1 publication

(1 citation statement)

References 18 publications


“…A common method AI systems use to convey their uncertainty to the user is by its confidence (Benz and Rodriguez, 2023; Liu et al., 2023). For the system's confidence to reflect the probability of the system being correct, the confidence needs to be calibrated, which is a long-standing task (Guo et al., 2017; Dhuliawala et al., 2022). This can be any metric, such as quality estimation (Specia et al., 2010; Zouhar et al., 2021) that makes it easier for the user to decide on the AI system's correctness.…”

Section: Related Work (mentioning; confidence: 99%)
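The snippet above defines calibration as confidence that reflects the probability of being correct. A common way to quantify how far a model is from this ideal is expected calibration error (ECE), the metric popularized by Guo et al. (2017): predictions are grouped into confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The sketch below is a minimal pure-Python version under stated assumptions. It uses equal-width bins with floor-based bin assignment, and the function name `expected_calibration_error` and the `n_bins` parameter are illustrative, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # hypothetical default; equal-width bins over [0, 1]
) -> float:
    """Weighted mean of |accuracy - mean confidence| over confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Floor-based bin index; a confidence of exactly 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

For example, a reader that answers four questions at 0.9 confidence but gets only three right lands in a single bin with accuracy 0.75, so its ECE is about 0.15. That is a case of being overconfident.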

A Diachronic Perspective on User Trust in AI under Uncertainty

Dhuliawala, Zouhar, El-Assady, et al. (2023)

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing


In a human-AI collaboration, users build a mental model of the AI system based on its reliability and how it presents its decisions, e.g., its presentation of system confidence and an explanation of the output. Modern NLP systems are often uncalibrated, resulting in confidently incorrect predictions that undermine user trust. In order to build trustworthy AI, we must understand how user trust is developed and how it can be regained after potential trust-eroding events. We study the evolution of user trust in response to these trust-eroding events using a betting game. We find that even a few incorrect instances with inaccurate confidence estimates damage user trust and performance, with very slow recovery. We also show that this degradation in trust reduces the success of human-AI collaboration and that different types of miscalibration (unconfidently correct and confidently incorrect) have different negative effects on user trust. Our findings highlight the importance of calibration in user-facing AI applications and shed light on what aspects help users decide whether to trust the AI system.
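The abstract distinguishes two kinds of miscalibration at the level of individual predictions: confidently incorrect and unconfidently correct. The sketch below shows one way such instances could be flagged in a log of predictions. It is an illustration under assumptions, not the study's procedure. The `high` and `low` cut-offs (0.7 and 0.3) are hypothetical values chosen for the example, and the names `Miscalibration`, `classify`, and `tally` are also hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Tuple


class Miscalibration(Enum):
    CONFIDENTLY_INCORRECT = "confidently incorrect"
    UNCONFIDENTLY_CORRECT = "unconfidently correct"
    OTHER = "neither category"


def classify(confidence: float, correct: bool,
             high: float = 0.7, low: float = 0.3) -> Miscalibration:
    """Label one prediction; `high`/`low` are hypothetical cut-offs, not from the paper."""
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("expected 0 <= low < high <= 1")
    if not correct and confidence >= high:
        return Miscalibration.CONFIDENTLY_INCORRECT
    if correct and confidence <= low:
        return Miscalibration.UNCONFIDENTLY_CORRECT
    return Miscalibration.OTHER


def tally(predictions: Iterable[Tuple[float, bool]],
          high: float = 0.7, low: float = 0.3) -> Counter:
    """Count how many (confidence, correct) pairs fall into each category."""
    return Counter(classify(c, ok, high, low) for c, ok in predictions)
```

For instance, `tally([(0.95, False), (0.2, True), (0.8, True)])` counts one prediction in each category. Per-instance labels like these do not replace an aggregate metric such as ECE. They only make the two error types named in the abstract concrete.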
