2021
DOI: 10.48550/arxiv.2104.08731
Preprint

Can NLI Models Verify QA Systems' Predictions?

Abstract: To build robust question answering systems, we need the ability to verify whether answers to questions are truly correct, not just "good enough" in the context of imperfect QA datasets. We explore the use of natural language inference (NLI) as a way to achieve this goal, as NLI inherently requires the premise (document context) to contain all necessary information to support the hypothesis (proposed answer to the question). We leverage large pretrained models and recent prior datasets to construct powerful que…

Cited by 4 publications (3 citation statements)
References 52 publications (81 reference statements)
“…Fact Duration Following suit with the QA evaluations above, we also evaluate fact duration prediction on SituatedQA. To generate fact-duration pairs, we use the annotated previous answer as of 2021, converting the question/answer pair into a statement using an existing T5-based conversion model (Chen et al., 2021a). We then use the distance between the start dates of the 2021 answer and the previous answer as the fact's duration, d.…”
Section: Evaluation Datasets
confidence: 99%
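The duration computation described in this citation statement can be sketched in a few lines: the fact's duration d is the distance between the start dates of the current (as-of-2021) answer and the previous answer. The dates below are illustrative, not drawn from SituatedQA.

```python
from datetime import date

def fact_duration_days(current_start: date, previous_start: date) -> int:
    """Fact duration d: distance between the start date of the
    current (as-of-2021) answer and that of the previous answer."""
    return (current_start - previous_start).days

# Illustrative dates only (not from the dataset).
d = fact_duration_days(date(2021, 1, 20), date(2017, 1, 20))
```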
“…TimeQA (Chen et al., 2021b) is one such work that curates a dataset of 70 different temporally-dependent relations from Wikidata and uses handcrafted templates to convert them into decontextualized QA pairs, where the question specifies a time period. To convert this dataset into fact-duration pairs (f, d), we first convert their QA pairs into factual statements by removing the date and using a QA-to-statement conversion model (Chen et al., 2021a). We then determine the duration of each fact to be the length of time between the start date of one answer to the question and the start date of the next.…”
Section: Distant Supervision Sources
confidence: 99%
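Given a timeline of successive answer start dates for one temporally-dependent question, the duration of each fact is the gap to the next answer's start date. A minimal sketch, with a hypothetical timeline:

```python
from datetime import date

def durations_from_timeline(starts: list[date]) -> list[int]:
    """Duration of each fact = time (in days) between the start date
    of one answer and the start date of the next answer."""
    return [(nxt - cur).days for cur, nxt in zip(starts, starts[1:])]

# Hypothetical start dates of three successive answers.
timeline = [date(2001, 1, 1), date(2005, 1, 1), date(2009, 1, 1)]
durations = durations_from_timeline(timeline)
```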
“…Answer Correctness (AC) QA models often lack the ability to verify the correctness of the predicted answer (Chen et al., 2021). One way to address this issue is to reformulate it as a textual entailment problem (Harabagiu and Hickl, 2006; Richardson et al., 2013; Chen et al., 2021) by viewing the answer context as the premise and the QA pair as the hypothesis. We then use a natural language inference (NLI) system to verify whether the candidate answer proposed by crowd workers satisfies the entailment criterion.…”
Section: Answer
confidence: 99%
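The entailment reformulation can be illustrated with a minimal premise/hypothesis construction. The string template below is a naive stand-in for the trained QA-to-statement conversion model the cited work uses, and the function name and example texts are hypothetical:

```python
def build_nli_example(context: str, question: str, answer: str) -> tuple[str, str]:
    """Recast answer verification as textual entailment:
    premise = answer context, hypothesis = QA pair as a statement."""
    # Naive template; the cited work uses a trained QA-to-statement model.
    hypothesis = f"The answer to the question '{question}' is {answer}."
    return context, hypothesis

premise, hypothesis = build_nli_example(
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "When did Marie Curie win the Nobel Prize in Physics?",
    "1903",
)
# An off-the-shelf NLI model would then score whether the premise
# entails the hypothesis; entailment means the answer is verified.
```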