Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
DOI: 10.18653/v1/2022.naacl-main.226
Boosted Dense Retriever

Abstract: We propose DrBoost, a dense retrieval ensemble inspired by boosting. DrBoost is trained in stages: each component model is learned sequentially and specialized by focusing only on retrieval mistakes made by the current ensemble. The final representation is the concatenation of the output vectors of all the component models, making it a drop-in replacement for standard dense retrievers at test time. DrBoost enjoys several advantages compared to standard dense retrieval models. It produces representations which…
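The staged training described in the abstract can be sketched as a simple boosting loop: each round trains a new component only on the queries the current ensemble still retrieves incorrectly, and the test-time embedding is the concatenation of all component outputs. The sketch below is illustrative only, not the authors' implementation: `train_component` stands in for training a real weak encoder (here it is just a random projection), and all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

SUB_DIM = 4  # per-component embedding size (hypothetical)

def encode(W, x):
    """Project raw features into one component's sub-embedding."""
    return x @ W

def train_component(q_feats, mistake_idx):
    """Stand-in for learning one weak encoder on the current mistakes.

    A real DrBoost component would be a trained dense encoder; a random
    projection is used here purely to show the boosting control flow.
    """
    return rng.normal(size=(q_feats.shape[1], SUB_DIM))

def drboost_train(q_feats, p_feats, gold, n_rounds=3):
    components = []
    mistakes = np.arange(len(q_feats))  # round 1 sees every query
    for _ in range(n_rounds):
        components.append(train_component(q_feats, mistakes))
        # Ensemble representation = concatenation of all sub-embeddings,
        # so the result is a drop-in replacement for a single retriever.
        q_emb = np.concatenate([encode(W, q_feats) for W in components], axis=1)
        p_emb = np.concatenate([encode(W, p_feats) for W in components], axis=1)
        preds = (q_emb @ p_emb.T).argmax(axis=1)  # nearest passage by dot product
        mistakes = np.where(preds != gold)[0]     # next round focuses here
        if len(mistakes) == 0:
            break
    return components

# Toy data: query i should retrieve passage i.
q = rng.normal(size=(10, 16))
p = q + 0.1 * rng.normal(size=(10, 16))
comps = drboost_train(q, p, gold=np.arange(10))
final_dim = SUB_DIM * len(comps)  # concatenated test-time embedding size
```

Because the final index stores only the concatenated vectors, downstream search code needs no awareness that the embedding came from an ensemble.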

Cited by 1 publication (3 citation statements). References 34 publications.
“…Currently, each component model in our architecture naively samples a subset of negative edges, whereas DrBoost could intelligently adjust the sampling distributions, potentially leading to improved results. 8 Finally, we plan to further enhance the model by incorporating user feedback through an application hosting this model. User interactions provide multiple avenues for model improvement.…”
Section: Results (confidence: 99%)
“…Recent works have been designed to tackle the question-answering task and require specific questions, answers, and passages to be organized within the dataset. 8,14 While our document retrieval task is related, the concern of sensitive organizational data prevents us from using automatic labeling techniques and, in general, leveraging the learned priors from internet-scale models. 15,16 The alternative is a labor-intensive data annotation effort engaging many subject matter experts.…”
Section: Results (confidence: 99%)