Multiple outbreaks of dengue serotype 2 in north Queensland, 2003/04

Hanna, Jeffrey N; Ritchie, Scott A.; Richards, Ann R; Taylor, Carmel; Pyke, Alyssa T.; Montgomery, Brian L.; Piispanen, John P; Morgan, Anna K; Humphreys, Jan L

doi:10.1111/j.1467-842x.2006.tb00861.x

Xiongfeng Zheng

Publications

An Empirical Study of Uniform-Architecture Knowledge Distillation in Document Ranking

Qin¹, Liu², Zheng³ et al. 2023

Preprint

Although BERT-based ranking models have been commonly used in commercial search engines, they are usually time-consuming for online ranking tasks. Knowledge distillation, which aims to train a smaller model with performance comparable to that of a larger model, is a common strategy for reducing online inference latency. In this paper, we investigate the effect of different loss functions on uniform-architecture distillation of BERT-based ranking models. Here, "uniform-architecture" means that both the teacher and student models use a cross-encoder architecture, while the student models include small-scale pre-trained language models. Our experimental results reveal that the optimal distillation configuration for ranking tasks differs substantially from that for general natural language processing tasks. Specifically, when the student models use a cross-encoder architecture, a pairwise loss on hard labels is critical for training student models, whereas distillation objectives on intermediate Transformer layers may hurt performance. These findings emphasize the need to carefully design a distillation strategy (for cross-encoder student models) tailored to document ranking with pairwise training samples.

INTRODUCTION

Recent years have witnessed great progress in applying deep learning methods to information retrieval tasks [19]. In particular, on document ranking, pre-trained language models (PLMs), such as BERT [4], have achieved state-of-the-art performance. However, because these pre-trained models often have a large number of parameters, they incur an inevitable computational cost and latency during the inference stage [6]. This problem becomes even more severe when pre-trained models are deployed in latency-sensitive online ranking tasks. To tackle this problem, numerous PLM-based knowledge distillation (KD) methods [14] have been studied. The principle of knowledge distillation can be summarized as
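As a rough illustration of the kind of loss terms the abstract refers to, the Python sketch below combines a pairwise loss on hard labels with a soft-label term that matches the teacher's score margin. It does not reproduce the paper's formulation, which is not given in this excerpt. The logistic form of the pairwise loss, the squared-margin soft term, the weight `alpha`, and all function names are assumptions made for illustration only.

```python
import math


def pairwise_hard_loss(score_pos, score_neg):
    """Logistic pairwise loss on hard labels: log(1 + exp(-(s_pos - s_neg))).

    Assumed form; the relevant document should outscore the irrelevant one.
    """
    margin = score_pos - score_neg
    # Numerically stable evaluation of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def soft_margin_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Squared difference between student and teacher score margins (assumed soft-label term)."""
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return (student_margin - teacher_margin) ** 2


def distillation_loss(student_scores, teacher_scores, alpha=0.5):
    """Weighted sum of the hard-label and soft-label terms for one (pos, neg) pair.

    `alpha` is a hypothetical mixing weight: 1.0 uses only hard labels,
    0.0 uses only the teacher's margin.
    """
    hard = pairwise_hard_loss(*student_scores)
    soft = soft_margin_loss(*student_scores, *teacher_scores)
    return alpha * hard + (1.0 - alpha) * soft


# One training pair: (score of relevant doc, score of irrelevant doc).
student = (1.2, 0.7)
teacher = (3.0, 0.5)
loss = distillation_loss(student, teacher, alpha=0.5)
```

In this sketch, no intermediate-layer objective is included, since the abstract reports that such objectives may hurt performance for cross-encoder students. Any real configuration would need to follow the paper's full text.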
