Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.497
Mitigating False-Negative Contexts in Multi-document Question Answering with Retrieval Marginalization

Abstract: Question Answering (QA) tasks requiring information from multiple documents often rely on a retrieval model to identify relevant information for reasoning. The retrieval model is typically trained to maximize the likelihood of the labeled supporting evidence. However, when retrieving from large text corpora such as Wikipedia, the correct answer can often be obtained from multiple evidence candidates. Moreover, not all such candidates are labeled as positive during annotation, rendering the training signal weak…
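As a rough illustration of the marginalization idea described in the abstract, the sketch below (hypothetical code, not the authors' released implementation; the function and variable names and the PyTorch formulation are assumptions) shows the reader's answer likelihood being summed over all retrieved evidence candidates, weighted by the retriever's distribution, so that unlabeled-but-valid contexts still contribute training signal.

import torch

def marginalized_answer_loss(retrieval_logits, answer_log_likelihoods):
    # retrieval_logits:       (num_candidates,) retriever scores over evidence candidates
    # answer_log_likelihoods: (num_candidates,) log p(answer | question, candidate) from the reader
    # Loss = -log sum_c p(c | question) * p(answer | question, c), i.e. the negative
    # log-likelihood of the answer marginalized over the retrieval distribution.
    log_p_context = torch.log_softmax(retrieval_logits, dim=-1)
    log_marginal = torch.logsumexp(log_p_context + answer_log_likelihoods, dim=-1)
    return -log_marginal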

Cited by 4 publications (7 citation statements). References 19 publications.
“…In retrieval-based tasks, the false-negative passage problem refers to passage labels not being fully annotated. Ni et al. (2021) find that the false-negative passage problem happens very frequently in multi-document question answering tasks, with such cases appearing in more than half of the sampled answerable questions of the IIRC dataset (Ferguson et al., 2020). Research has been done to mitigate this problem by marginalizing over the retrieval process and directly training the retriever with the final objective, e.g., the answer label in multi-document QA (Ni et al., 2021).…”
Section: Retrieval Marginalization (mentioning, confidence: 99%)
“…Ni et al. (2021) find that the false-negative passage problem happens very frequently in multi-document question answering tasks, with such cases appearing in more than half of the sampled answerable questions of the IIRC dataset (Ferguson et al., 2020). Research has been done to mitigate this problem by marginalizing over the retrieval process and directly training the retriever with the final objective, e.g., the answer label in multi-document QA (Ni et al., 2021). However, in knowledge-intensive generation tasks, the marginalization methods do not work as well.…”
Section: Retrieval Marginalization (mentioning, confidence: 99%)
“…The reported numbers are on their dev sets. For IIRC, we consider two settings: the gold setting (IIRC-G), which uses only gold supporting sentences as the reading comprehension context, and the retrieved setting (IIRC-R), which retrieves paragraphs using a retrieval marginalization method (Ni et al., 2021). We evaluate robustness using the DROP contrast set and the DROP BPB contrast set (Geva et al., 2022).…”
Section: Datasets (mentioning, confidence: 99%)
“…Recent studies support this assumption. For instance, Ni, Gardner, and Dasigi (2021) found that over half of 50 answerable questions from the IIRC dataset (Ferguson et al. 2020) had at least one missing piece of evidence. Similarly, Qu et al. (2021) manually reviewed top-retrieved passages not labeled as positives in MSMARCO (Nguyen et al. 2016) and detected a 70% false-negative rate.…”
Section: Introduction (mentioning, confidence: 99%)
“…Qu et al. (2021) use a highly effective but inefficient cross-encoder reranker to identify high-confidence negatives as true negatives, which are then used to train the retrieval model. Ni, Gardner, and Dasigi (2021) leverage answers in the downstream QA task and design several heuristics, such as lexical overlap between a gold answer and a candidate passage, to detect valid contexts. Recently, Zhou et al. (2022) suggest selecting samples that are highly similar to positive samples but not too close to the query.…”
Section: Introduction (mentioning, confidence: 99%)
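One possible reading of the answer-overlap heuristic mentioned in the excerpt above is sketched below; the function name, normalization, and exact matching criterion are assumptions for illustration and may differ from the heuristics actually used by Ni, Gardner, and Dasigi (2021).

def is_likely_valid_context(gold_answer: str, passage: str) -> bool:
    # Hypothetical lexical-overlap check: treat an unlabeled passage as a
    # valid (false-negative) context if it contains the gold answer string
    # after simple whitespace and case normalization.
    normalize = lambda s: " ".join(s.lower().split())
    return normalize(gold_answer) in normalize(passage)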