Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.316
Generation-Augmented Retrieval for Open-Domain Question Answering

Abstract: We propose Generation-Augmented Retrieval (GAR) for answering open-domain questions, which augments a query through text generation of heuristically discovered relevant contexts without external resources as supervision. We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR (Karpukhin et al., 2020). We show that generating diverse …
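As a rough illustration of the idea in the abstract, the sketch below scores documents with a minimal plain-Python BM25 implementation, then rescores after appending a "generated context" to the query. The corpus, query, and augmentation text are invented for illustration; GAR itself produces the context with a trained seq2seq generator (BART), not a hard-coded string.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document in the corpus against the query with BM25."""
    N = len(corpus_tokens)
    avgdl = sum(len(d) for d in corpus_tokens) / N
    df = Counter()                       # document frequency per term
    for doc in corpus_tokens:
        df.update(set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)                # term frequency in this document
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

corpus = [
    "the eiffel tower is in paris france".split(),
    "bm25 is a ranking function used by search engines".split(),
]
query = "where is the eiffel tower".split()
# GAR-style augmentation: append a generated relevant context.
# Hard-coded here purely for illustration.
augmented = query + "the eiffel tower is located in paris".split()
print(bm25_scores(query, corpus))
print(bm25_scores(augmented, corpus))
```

The augmented query shares more (and more discriminative) terms with the relevant document, so its BM25 score rises without touching the index, which is why GAR can pair with off-the-shelf sparse retrieval.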

Cited by 60 publications (58 citation statements)
References 38 publications
“…DeepCT (Dai & Callan, 2019) uses BERT to dynamically generate lexical weights to augment BM25 systems. doc2Query (Nogueira et al., 2019b), docTTTTTQuery (Nogueira et al., 2019a), and GAR (Mao et al., 2021a) use text generation to expand queries or documents to make better use of BM25. The middle block lists the results of strong dense retrieval methods, including DPR (Karpukhin et al., 2020), ANCE (Xiong et al., 2021), RDR (Yang & Seo, 2020), RocketQA (Qu et al., 2021), Joint and Individual Top-k (Sachan et al., 2021b), PAIR (Ren et al., 2021), DPR-PAQ (Oguz et al., 2021), and Condenser (Gao & Callan, 2021b).…”
Section: Results
confidence: 99%
“…To remedy the vocabulary gap between queries and documents, Nogueira and Lin [29,28] employed the seq2seq Transformer model [39] and later T5 [33] to generate document expansions, which brings significant gains for BM25. In the same vein, Mao et al. [27] adopted the seq2seq model BART [20] to generate query expansions, which outperforms RM3 [15], a highly performant lexical query-expansion method.…”
Section: Related Work
confidence: 99%
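The doc2Query/docTTTTTQuery line of work described above expands the *document* side before indexing: predicted queries are appended to the document text so that BM25 can match query vocabulary. A minimal sketch, with a hard-coded stand-in for the trained seq2seq generator (docTTTTTQuery uses T5; the `toy_generator` below is purely illustrative):

```python
def expand_document(doc_text, query_generator, n_queries=3):
    """doc2Query-style expansion: append predicted queries to the
    document text before indexing, closing the vocabulary gap for BM25."""
    predicted = query_generator(doc_text, n_queries)
    return doc_text + " " + " ".join(predicted)

def toy_generator(doc_text, n):
    """Stand-in for a trained seq2seq model; returns canned queries."""
    return ["where is the eiffel tower located"][:n]

doc = "the eiffel tower stands in paris"
print(expand_document(doc, toy_generator))
```

Note the symmetry with GAR: doc2Query enriches documents at indexing time, while GAR enriches queries at search time; both leave the BM25 scoring function itself unchanged.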
“…In another line of work, Mao et al (2021) seek to generate clarification texts for input questions to improve the retrieval quality in open-domain QA (answering factoid questions without a prespecified domain). The most common approach for this problem involves a retriever-reader architecture (Chen et al, 2017), which first retrieves a small subset of documents in the pool using the input question as the query and then analyzes the retrieved documents to extract (or generate) an answer.…”
Section: Question Generation
confidence: 99%
“…The most common approach for this problem involves a retriever-reader architecture (Chen et al, 2017), which first retrieves a small subset of documents in the pool using the input question as the query and then analyzes the retrieved documents to extract (or generate) an answer. To generate augmented texts for the input question in the first retrieval component, Mao et al (2021) fine-tune BART to consume the input question and attempt to produce the answer and the sentence or title of the paragraph containing the answer. This method demonstrates superior performance for both retrieval and end-to-end QA performance.…”
Section: Question Generation
confidence: 99%
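The retriever-reader architecture described in the excerpts above can be sketched end to end: a retriever narrows the pool to a few candidate documents, and a reader extracts an answer from them. The sketch below uses token overlap as a stand-in for both stages (real systems use BM25 or DPR for retrieval and an extractive neural reader); the corpus and question are invented for illustration.

```python
import re

def tokens(text):
    """Lowercase word tokens as a set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, corpus, k=1):
    """Retriever stage: rank documents by token overlap with the
    question (stand-in for BM25 or a dense retriever such as DPR)."""
    return sorted(corpus,
                  key=lambda d: len(tokens(question) & tokens(d)),
                  reverse=True)[:k]

def read(question, docs):
    """Reader stage: return the sentence sharing the most tokens with
    the question (stand-in for an extractive neural reader)."""
    sentences = [s for d in docs for s in d.split(". ")]
    return max(sentences, key=lambda s: len(tokens(question) & tokens(s)))

corpus = [
    "The Eiffel Tower is in Paris. It was completed in 1889.",
    "BM25 is a bag-of-words ranking function used by search engines.",
]
question = "Where is the Eiffel Tower?"
print(read(question, retrieve(question, corpus)))
```

Question-side augmentation such as GAR plugs into the `retrieve` step only: the enriched query improves which documents reach the reader, leaving the reader unchanged.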