Pre-trained Language Model based Ranking in Baidu Search

Zou, Lixin; Zhang, Shengqiang; Cai, Hengyi; Ma, Dehong; Cheng, Suqi; Wang, Shuaiqiang; Shi, Daiting; Cheng, Zhicong; Yin, Dawei

doi:10.1145/3447548.3467147

Cited by 46 publications

(26 citation statements)

References 44 publications

(48 reference statements)

“…Note that a passage $p$ consists of $T$ sentences $p = \{s_\tau\}_{\tau=1}^{T}$. Following a previous study [52], a desirable re-ranker is a scoring function $f^*(\cdot, \cdot)$ that maximizes the consistency between its predictions (denoted as $\hat{Y}_{q,P} = \{f(q, p_\kappa) \mid p_\kappa \in P\}$) and the ground truth labels (denoted as $Y = \{y_\kappa\}_{\kappa=1}^{\varkappa}$), i.e.,…”

Section: Problem Formulation 3.1 Passage Re-ranking (mentioning)

confidence: 99%

Incorporating Explicit Knowledge in Pre-trained Language Models for Passage Re-ranking

Dong, Liu, Cheng et al. 2022

Preprint

Self Cite

Passage re-ranking aims to obtain a permutation over the candidate passage set from the retrieval stage. Re-rankers have been greatly advanced by Pre-trained Language Models (PLMs) owing to their strong natural language understanding. However, existing PLM-based re-rankers can easily suffer from vocabulary mismatch and a lack of domain-specific knowledge. To alleviate these problems, our work carefully introduces explicit knowledge contained in a knowledge graph. Specifically, we employ an existing knowledge graph, which is incomplete and noisy, and are the first to apply it to the passage re-ranking task. To leverage reliable knowledge, we propose a novel knowledge graph distillation method and obtain a knowledge meta graph as the bridge between query and passage. To align both kinds of embedding in the latent space, we employ a PLM as the text encoder and a graph neural network over the knowledge meta graph as the knowledge encoder. In addition, a novel knowledge injector is designed for dynamic interaction between the text and knowledge encoders. Experimental results demonstrate the effectiveness of our method, especially on queries requiring in-depth domain knowledge.
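The framing of re-ranking as a permutation over candidates, driven by a scoring function f(q, p), can be illustrated with a minimal sketch. The names `rerank` and `overlap_score` are hypothetical, and the term-overlap scorer is only a toy stand-in for a PLM-based scorer; it is not the method of either paper.

```python
from typing import Callable, List, Sequence


def overlap_score(query: str, passage: str) -> float:
    """Toy scorer (assumption): fraction of query terms that occur in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def rerank(
    query: str,
    passages: Sequence[str],
    score: Callable[[str, str], float] = overlap_score,
) -> List[int]:
    """Return a permutation of passage indices, highest score first."""
    scores = [score(query, p) for p in passages]
    # sorted() is stable, so tied passages keep their retrieval-stage order.
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    candidates = ["the cat sat", "baidu search ranking model", "ranking"]
    print(rerank("baidu ranking", candidates))
```

Any function with the same `(query, passage) -> float` signature, such as a fine-tuned cross-encoder, could be passed as `score` without changing the re-ranking step.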

“…To this end, we mimic human judgment and only focus on the sentence of each passage that is the most related to a query [52].…”

Section: Algorithm 1: Meta-graph Construction Algorithm (mentioning)

confidence: 99%

“…In this section, we introduce recent works designing PTMs tailored for IR (Lee et al., 2019b; Chang et al., 2019; Ma et al., 2021b; Ma et al., 2021c; Boualili et al., 2020; Ma et al., 2021d; Zou et al., 2021; Liu et al., 2021d). General pre-trained models like BERT have achieved great success when applied to IR tasks on both the first-stage retrieval and the re-ranking stage.…”

Section: Keyphrase Extraction (mentioning)

confidence: 99%

Pre-training Methods in Information Retrieval

Fan, Xie, Cai et al. 2021

Preprint

The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list in response to a user's information need. Recently, the resurgence of deep learning has greatly advanced this field and led to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model size, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of IR. Since there have been a large number of works dedicated to the

“…Despite the above-mentioned bi-encoder models, interaction-based methods are also widely used in many information retrieval systems [9, 10, 44, 48-52]. As such, another line of research for semantic matching is to model query-document interaction with DNNs [25, 28, 40, 44, 53]. However, they cannot cache the document embeddings offline, and thus are inefficient for retrieval.…”

Section: Related Work 2.1 Semantic Retrieval In Web Search (mentioning)

confidence: 99%

“…They are preferred for ranking stage, which will not be further discussed in this paper. In our search engine, interaction-based methods are exploited to build the PLM-based ranking system [53].…”

Section: Related Work 2.1 Semantic Retrieval In Web Search (mentioning)

confidence: 99%
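The efficiency contrast drawn in these statements (bi-encoders can cache document embeddings offline, interaction-based models cannot) can be sketched as follows. Everything here is a hypothetical illustration: `embed` is a toy hashed bag-of-words encoder and `interaction_score` a toy joint scorer; neither represents ERNIE or the deployed Baidu systems.

```python
import math
from typing import List, Sequence


def embed(text: str, dim: int = 16) -> List[float]:
    """Toy encoder (assumption): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class BiEncoderIndex:
    """Documents are encoded once, offline; queries only need a dot product."""

    def __init__(self, docs: Sequence[str]) -> None:
        self.docs = list(docs)
        self.doc_vecs = [embed(d) for d in self.docs]  # cached ahead of time

    def search(self, query: str, k: int = 2) -> List[int]:
        q = embed(query)
        scores = [sum(a * b for a, b in zip(q, d)) for d in self.doc_vecs]
        order = sorted(range(len(self.docs)), key=lambda i: scores[i], reverse=True)
        return order[:k]


def interaction_score(query: str, doc: str) -> float:
    """Toy joint scorer (assumption): needs query and document together."""
    q_terms = query.lower().split()
    d_terms = set(doc.lower().split())
    return sum(1.0 for t in q_terms if t in d_terms) / max(len(q_terms), 1)


def interaction_rank(query: str, docs: Sequence[str]) -> List[int]:
    """One scorer call per (query, document) pair at query time; nothing is cacheable."""
    scores = [interaction_score(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Under these assumptions, the bi-encoder's per-query cost is one encoder call plus dot products over cached vectors, while the interaction-based scorer is invoked once per candidate, which is why the statements above place it in the ranking stage over a small candidate set.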

Pre-trained Language Model for Web-scale Retrieval in Baidu Search

Liu, Huang, Liu et al. 2021

Preprint

Self Cite

Retrieval is a crucial stage in web search that identifies a small set of query-relevant candidates from a billion-scale corpus. Discovering more semantically related candidates in the retrieval stage is a promising way to expose more high-quality results to end users. However, building and deploying effective retrieval models for semantic matching in a real search engine remains a non-trivial challenge. In this paper, we describe the retrieval system that we developed and deployed in Baidu Search. The system exploits the recent state-of-the-art Chinese pretrained language model, namely Enhanced Representation through kNowledge IntEgration (ERNIE), which provides the system with expressive semantic matching. In particular, we developed an ERNIE-based retrieval model, which is equipped with 1) expressive Transformer-based semantic encoders, and 2) a comprehensive multi-stage training paradigm. More importantly, we present a practical system workflow for deploying the model in web-scale retrieval. The system is now fully deployed in production, where rigorous offline and online experiments were conducted. The results show that the system can perform high-quality candidate retrieval, especially for tail queries with uncommon demands. Overall, the new retrieval system facilitated by a pretrained language model (i.e., ERNIE) can largely improve the usability and applicability of our search engine.
