2021
DOI: 10.48550/arxiv.2111.13853
Preprint

Pre-training Methods in Information Retrieval

Abstract: The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list in response to the user's information need. Recently, the resurgence of deep learning has greatly advanced this field and led to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model size, pre-trained models can learn universal language representations from m…

Cited by 11 publications (12 citation statements) | References 200 publications (305 reference statements)
“…Thus, based on the text representation type and the corpus index mode, passage retrieval models can be roughly categorized into two main classes. Sparse retrieval models improve retrieval by obtaining semantics-aware sparse representations and indexing them with an inverted index for efficient retrieval; dense retrieval models convert queries and passages into continuous embedding representations and turn to approximate nearest neighbor (ANN) algorithms for fast retrieval [13].…”
Section: Related Work
Mentioning confidence: 99%
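The dense side of this distinction can be made concrete with a short sketch. The snippet below is a hypothetical illustration, not code from the surveyed paper: it brute-forces inner-product search over dense embeddings with NumPy, using a toy hashed-projection `encode()` as a stand-in for a learned encoder; production systems replace the exhaustive scan with an ANN index such as Faiss or HNSW.

```python
import numpy as np

# Toy stand-in for a learned bi-encoder: a random projection over hashed
# tokens, just to keep the example self-contained.
rng = np.random.default_rng(0)
proj = rng.normal(size=(1000, 128))  # hashed token id -> 128-d vector

def encode(text: str) -> np.ndarray:
    """Hypothetical encoder: sum of hashed-token projections, L2-normalized."""
    vec = np.zeros(128)
    for tok in text.lower().split():
        vec += proj[hash(tok) % 1000]
    return vec / (np.linalg.norm(vec) + 1e-9)

passages = [
    "neural models learn dense representations of text",
    "the inverted index supports efficient sparse retrieval",
    "approximate nearest neighbor search speeds up dense retrieval",
]
index = np.stack([encode(p) for p in passages])  # offline: embed the corpus

query = encode("fast dense retrieval with ANN")  # online: embed the query
scores = index @ query                           # inner-product relevance
top = np.argsort(-scores)                        # exhaustive scan here; real
print([passages[i] for i in top[:2]])            # systems use Faiss / HNSW
```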
“…Continued pretraining of off-the-shelf language models has been investigated in monolingual retrieval [5,13,16]. Specifically, coCondenser [16] continues pretraining the language model with a passage-pair classification task (i.e., determining whether two passages belong to the same document) through contrastive learning on the passage representations for monolingual IR, before fine-tuning it as a DPR model.…”
Section: Background and Related Work
Mentioning confidence: 99%
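The same-document contrastive objective that this statement attributes to coCondenser can be sketched as an in-batch InfoNCE loss. This is a hedged reconstruction rather than the authors' code: `emb_a` and `emb_b` are assumed to be encoder outputs for two passages drawn from the same document, and every cross pair in the batch serves as a negative.

```python
import torch
import torch.nn.functional as F

def same_document_contrastive_loss(emb_a: torch.Tensor,
                                   emb_b: torch.Tensor,
                                   temperature: float = 0.05) -> torch.Tensor:
    """In-batch InfoNCE over same-document passage pairs (a sketch).

    emb_a[i] and emb_b[i] come from the same document (positive pair);
    all cross pairs (i, j != i) act as in-batch negatives.
    """
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.T / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random tensors standing in for encoder outputs.
loss = same_document_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```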
“…Dense retrieval has received increasing interest in recent years from both industrial and academic communities due to its benefits to many IR-related tasks, e.g., web search [9,17,26], question answering [20,23,43], and conversational systems [10,39]. Without loss of generality, dense retrieval usually utilizes a Siamese or bi-encoder architecture to encode queries and documents into low-dimensional representations that abstract their semantic information [18,19,21,38,40,41].…”
Section: Introduction
Mentioning confidence: 99%
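As a minimal sketch of the Siamese/bi-encoder setup described in that statement, the snippet below encodes queries and documents with one shared transformer and takes the [CLS] vector as the text representation. The checkpoint name and the CLS-pooling choice are illustrative assumptions, not details prescribed by the cited works.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# One shared encoder ("Siamese") maps both queries and documents into the
# same embedding space; scoring is a simple dot product. Checkpoint and
# pooling are assumptions for illustration only.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

@torch.no_grad()
def embed(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, padding=True, truncation=True,
                      return_tensors="pt")
    # Use the [CLS] token's final hidden state as the text representation.
    return encoder(**batch).last_hidden_state[:, 0]

q = embed(["what is dense retrieval"])
d = embed(["Dense retrieval encodes text into embeddings.",
           "Sparse retrieval relies on an inverted index."])
scores = q @ d.T  # dot-product relevance between the query and each document
```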