Proceedings of the 13th International Conference on Web Search and Data Mining 2020
DOI: 10.1145/3336191.3371768
Extreme Regression for Dynamic Search Advertising

Abstract: This paper introduces a new learning paradigm called eXtreme Regression (XR) whose objective is to accurately predict the numerical degrees of relevance of an extremely large number of labels to a data point. XR can provide elegant solutions to many large-scale ranking and recommendation applications including Dynamic Search Advertising (DSA). XR can learn more accurate models than the recently popular extreme classifiers which incorrectly assume strictly binary-valued label relevances. Traditional regression …
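The abstract's core contrast is that extreme classification predicts only binary label membership, while extreme regression scores every label with a real-valued relevance. The toy sketch below illustrates that idea only; it is not the paper's XReg algorithm, and all names (`W`, `predict_relevance`, the toy sizes) are illustrative assumptions. A naive dense one-regressor-per-label model is used for clarity, where a real extreme-scale system would use label trees or sparse models.

```python
# Toy illustration (NOT the paper's XReg implementation): extreme regression
# assigns a real-valued relevance to every label, rather than a binary
# relevant/irrelevant decision as in extreme classification.
import numpy as np

rng = np.random.default_rng(0)

n_features, n_labels = 20, 1000  # toy sizes; real DSA label spaces hold millions of queries
W = rng.normal(size=(n_labels, n_features)) * 0.1  # one linear regressor per label (assumed)

def predict_relevance(x, top_k=5):
    """Score every label for input x and return the top-k (label, relevance) pairs."""
    scores = W @ x                         # dense scoring for clarity only;
    top = np.argsort(scores)[::-1][:top_k] # extreme-scale systems prune via label trees
    return [(int(label), float(scores[label])) for label in top]

x = rng.normal(size=n_features)
pairs = predict_relevance(x)
```

At DSA scale this dense loop is infeasible; the point of the sketch is only the output type: a ranked list of labels with graded relevance scores rather than a binary label set.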


Cited by 17 publications (15 citation statements)
References 38 publications
“…In meta-learning domains, the characteristics of each problem instance are considered, and the output is an ordered list of algorithms according to their suitability to the given problem [6], [7]. Lastly, in text classification, a label ranking algorithm can be employed to output a ranked list of topics, tags or advertisements for a document or web page (the instance) [8], [9]. Due to this wide applicability, label ranking has recently attracted a lot of attention from the machine learning community [10]–[20].…”
Section: Introduction
Citation type: mentioning, confidence: 99%
“…(1) State-of-the-art extreme classifiers such as AttentionXML [66], Astec [11], DiSMEC [2], Parabel [45] and Bonsai [26]; (2) extreme classifiers which improve performance on few-shot labels, such as DECAF [40], XReg [46] and PFastreXML [20]; (3) dense retrieval methods based on state-of-the-art natural language modelling architectures, such as the Sentence-BERT bi-encoder [48], Fasttext [24] and WarpLDA (topic model) [10] — these algorithms provide strong, scalable baselines against which to compare ZestXML's performance on zero-shot and few-shot labels; (4) leading zero-shot multi-label learners such as 0-BIGRU-LWAN, 0-CNN-LWAN [50] and CoNSE [43] — these baselines do not scale to extreme datasets, hence ZestXML's comparison against them is reported only for EURLex-4.3K in Table ??. The implementations of all the aforementioned algorithms were provided by their authors.…”
Section: Experiments, 5.1 Experiment Settings
Citation type: mentioning, confidence: 99%
“…Additionally, they tend to perform poorly on few-shot labels due to classifier over-fitting issues (see Section 5). Recently, several approaches have been proposed which aim to model the few-shot labels more accurately [40, 46]; these, however, do not address the zero-shot prediction problem.…”
Section: Introduction
Citation type: mentioning, confidence: 99%
“…For instance, the deep learning XMC model X-Transformer [7, 52] achieves state-of-the-art performance on public academic benchmarks [3]. Partition-based methods such as Parabel [33] and XReg [34], as another example, find successful application in dynamic search advertising in Bing. In particular, tree-based partitioning XMC models are a staple of modern search engines and recommender systems because their inference time is sub-linear (i.e., logarithmic) in the enormous output space (e.g., 100 million or more labels).…”
Section: Introduction
Citation type: mentioning, confidence: 99%
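The last excerpt attributes the scalability of partition-based XMC models to label-tree inference that is logarithmic in the label space. The sketch below is a hedged, generic illustration of that mechanism (greedy beam search down a balanced label tree), not the actual Parabel or XReg algorithm; the random per-node score stands in for a learned node classifier, and all names (`Node`, `beam_search`, the fan-out and beam width) are illustrative assumptions.

```python
# Generic sketch of sub-linear label-tree inference, in the spirit of
# partition-based extreme classifiers (Parabel/XReg); details differ from
# the real algorithms. Labels sit at the leaves; beam search scores only a
# small frontier of nodes per level instead of all labels.
import heapq
import math
import random

random.seed(0)

class Node:
    def __init__(self, labels, fanout=2, max_leaf=4):
        self.labels = labels
        self.children = []
        if len(labels) > max_leaf:
            chunk = math.ceil(len(labels) / fanout)
            for i in range(0, len(labels), chunk):
                self.children.append(Node(labels[i:i + chunk], fanout, max_leaf))
        # Toy node score: a random number standing in for a learned classifier.
        self.score = random.random()

def beam_search(root, beam_width=2):
    """Descend the tree, keeping only the top `beam_width` nodes per level."""
    frontier, visited = [root], 0
    while any(n.children for n in frontier):
        # Expand internal nodes; carry leaves along so they stay eligible.
        candidates = [c for n in frontier for c in n.children]
        candidates += [n for n in frontier if not n.children]
        visited += len(candidates)
        frontier = heapq.nlargest(beam_width, candidates, key=lambda n: n.score)
    labels = [label for n in frontier for label in n.labels]
    return labels, visited

tree = Node(list(range(1024)))
labels, visited = beam_search(tree)
# With beam width b, fan-out f and depth d, roughly b*f*d nodes are scored,
# far fewer than the 1024 labels in this toy tree.
```

The logarithmic cost the excerpt mentions comes from the tree depth: doubling the label count adds only one level, so inference over 100 million labels remains tractable.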