The World Wide Web Conference 2019
DOI: 10.1145/3308558.3313466

Learning Fast Matching Models from Weak Annotations

Abstract: We propose a novel training scheme for fast matching models in Search Ads, motivated by practical challenges. The first challenge stems from the pursuit of high throughput, which prohibits the deployment of inseparable architectures, and hence greatly limits model accuracy. The second problem arises from the heavy dependency on human-provided labels, which are expensive and time-consuming to collect, yet how to leverage unlabeled search log data is rarely studied. The proposed training framework targets on mit…

Cited by 9 publications
(3 citation statements)
References 27 publications
“…Parameters in baselines are carefully tuned on the validation set to select the most desirable parameter setting. Considering the high imbalance distribution of the annotations, following the previous work [20] we select ROC-AUC score as the measurement, which represents the area under the Receiver Operating Characteristic curve. We release our code to facilitate future research (https://github.com/qwe35/AdsGNN).…”
Section: Baseline Methods (mentioning)
confidence: 99%
“…This strategy confuses the relevance correlations with the click relations and thus may introduce ambiguities from two aspects. Firstly, the arbitrariness and subjectivity of user behavior lead to the misalignment between user clicks and true relevance annotations [20], which may introduce noises into the ground truth and further pollute the training set. Secondly, negative pairs sampled by data synthesizing usually share no common tokens for queries and ads, which may mislead the relevance model to view common terms as critical evidence of relevance.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some other works [3,12] learn embeddings of queries and ads in a shared vector space from search session, ad click, and search link click data using word2vec [28] like algorithms. A few recent works [23] exploit advances in neural information retrieval models [29] such as Deep Crossing [41] in sponsored search. An important line of work [11,13,15,27,37] that is based on the idea of performing query to query transformations, also known as query rewriting.…”
Section: Introductionmentioning
confidence: 99%