Findings of the Association for Computational Linguistics: ACL 2022
DOI: 10.18653/v1/2022.findings-acl.133

Calibration of Machine Reading Systems at Scale

Abstract: In typical machine learning systems, an estimate of the probability of the prediction is used to assess the system's confidence in the prediction. This confidence measure is usually uncalibrated; i.e., the system's confidence in the prediction does not match the true probability of the predicted output. In this paper, we present an investigation into calibrating open setting machine reading systems such as open-domain question answering and claim verification systems. We show that calibrating such complex syste…
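
To make the notion of an uncalibrated confidence concrete, the sketch below computes expected calibration error (ECE), a commonly used measure of the gap between stated confidence and observed accuracy. The abstract as shown here does not name the metric the paper uses, so ECE, the function name `expected_calibration_error`, and the equal-width binning with `n_bins` are all assumptions made for illustration, not the authors' method.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted mean over bins of |accuracy - mean confidence|.

    confidences: predicted probabilities in [0, 1], one per prediction.
    correct: booleans, True where the prediction was right.
    n_bins: number of equal-width confidence bins (an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group predictions into equal-width bins by confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, is_correct in members if is_correct) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of predictions.
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# A system that reports 0.9 confidence but is right 3 times out of 4
# is overconfident by about 0.15 on these predictions.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A score of 0 means that, within every bin, the average confidence equals the observed accuracy. Larger values indicate over- or under-confidence.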

Cited by 2 publications (1 citation statement)
References: 27 publications
“…While selective question answering offers a way of measuring calibration, the scale of confidence values are not considered since abstention can be effective as long as correct predictions have higher confidence than wrong ones, regardless of the absolute scales. A recent work (Dhuliawala et al, 2022) also explored calibration for retriever-reader style ODQA setup, mainly focusing on how to combine information from the retriever and reader components. While in this paper we only focused on span-extraction QA, Jiang et al (2021) focused on generation-style QA.…”
Section: Related Work (mentioning)
confidence: 99%