2022
DOI: 10.48550/arxiv.2212.08841
Preprint

AugTriever: Unsupervised Dense Retrieval by Scalable Data Augmentation

citations
Cited by 3 publications
(3 citation statements)
references
References 0 publications
“…The latter generates quality but more expensive human-like queries using large language models for DR pre-training (Oguz et al, 2022) or domain adaptation (the third section of Table 1; . Concurrently to our work, Meng et al (2023) explore various approaches to query augmentation, such as span selection and document summarization.…”
Section: A Unified Framework Of Improved Dense
Citation type: mentioning
confidence: 99%
“…This has included InPars (Bonifacio et al, 2022; Jeronymo et al, 2023) and Promptagator (Dai et al, 2022), the latter showcasing significant success on the BEIR benchmark. Augtriever (Meng et al, 2022) introduced methods for synthetic query generation using smaller models, optimizing both time and cost. Peng et al (2023) used soft prompt-tuning to further enhance the quality of generated queries.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Recent research exploits Large Language Models (LLMs) to generate synthetic data pairs, constructing synthetic queries from real passages, often derived from zero-shot or few-shot examples (Bonifacio et al, 2022; Jeronymo et al, 2023; Meng et al, 2022; Penha et al, 2023). Addressing the challenges of complex query information retrieval (IR) tasks through LLM-based synthetic data generation presents distinct difficulties.…”
Section: Introduction
Citation type: mentioning
confidence: 99%