2020
DOI: 10.1007/s10994-020-05888-2

Bonsai: diverse and shallow trees for extreme multi-label classification

Abstract: Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees. We show three concrete realizations of this label representation space including: (i) the input space which is spanned by the input features, (ii) the output space spann…
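
The abstract's central step (representing each label as a vector and K-way partitioning the labels in that space to grow a shallow, wide tree) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' released implementation: the input-space label representation (mean feature vector of a label's positive examples), the K-means partitioner, and the hyperparameters K, max_leaf and max_depth are all illustrative choices.

```python
# Minimal sketch: represent each label as a vector and K-way partition the
# labels in that space to build a shallow, wide label tree.
# The representation used here is the "input space" realization: the mean
# feature vector of a label's positive training points. All names and
# hyperparameters (K, max_leaf, max_depth) are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def label_representations(X, Y):
    """X: (n_samples, n_features) features; Y: (n_samples, n_labels) binary label matrix.
    Returns one vector per label: the mean of the features of its positive examples."""
    counts = Y.sum(axis=0).clip(min=1)               # guard against labels with no positives
    return (Y.T @ X) / counts[:, None]

def build_shallow_tree(label_vecs, labels, K=16, max_leaf=4, depth=0, max_depth=2):
    """Recursively K-way partition the label set; cap the depth to keep the tree shallow."""
    if len(labels) <= max_leaf or depth >= max_depth:
        return {"leaf": True, "labels": labels}      # a leaf holds a small group of labels
    km = KMeans(n_clusters=min(K, len(labels)), n_init=10, random_state=0)
    assign = km.fit_predict(label_vecs[labels])
    return {
        "leaf": False,
        "children": [
            build_shallow_tree(label_vecs, labels[assign == c], K, max_leaf, depth + 1, max_depth)
            for c in range(km.n_clusters)
        ],
    }

# Toy data: 200 points, 50 features, 100 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
Y = (rng.random(size=(200, 100)) < 0.05).astype(float)
tree = build_shallow_tree(label_representations(X, Y), np.arange(100))
```

Keeping max_depth small and K large is what makes the resulting tree shallow and wide, the property the paper contrasts with deep binary label trees.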

Cited by 91 publications (88 citation statements)
References 28 publications
“…When a leaf node is visited, the multi-label classifier of that node decides which labels of the node will be assigned to the document. PARABEL, BONSAI: We experiment with PARABEL (Prabhu et al., 2018) and BONSAI (Khandagale et al., 2019), two state-of-the-art PLT-based methods. PARABEL employs binary PLTs (k = 2), while BONSAI uses non-binary PLTs (k > 2), which are shallower and wider.…”
Section: Hierarchical PLT-based Methods
confidence: 99%
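
The contrast drawn in this statement, binary PLTs (k = 2) versus BONSAI's wider k > 2 trees, reduces to tree depth scaling roughly as log_k of the number of labels. A toy calculation with an assumed label count (roughly the size of a large benchmark such as Amazon-670K) illustrates why a larger branching factor yields a shallower tree:

```python
# Why a larger branching factor k gives a shallower label tree:
# depth grows roughly as ceil(log_k(L)) for L labels.
import math

L = 670_091  # assumed label count, roughly the size of Amazon-670K
for k in (2, 100):
    print(f"k={k:>3}: tree depth ≈ {math.ceil(math.log(L, k))}")
# k=  2: tree depth ≈ 20
# k=100: tree depth ≈ 3
```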
“…• We show that hierarchical LMTC approaches based on Probabilistic Label Trees (PLTs) (Prabhu et al., 2018; Khandagale et al., 2019; You et al., 2019) outperform flat neural state-of-the-art methods, i.e., LWAN (Mullenbach et al., 2018), in two out of three datasets (EURLEX57K, AMAZON13K).…”
Section: Introduction
confidence: 93%
“…(1) State-of-the-art extreme classifiers such as AttentionXML [66], Astec [11], DiSMEC [2], Parabel [45] and Bonsai [26]; (2) extreme classifiers which improve performance on few-shot labels, such as DECAF [40], XReg [46] and PFastreXML [20]; (3) dense retrieval methods based on state-of-the-art natural language modelling architectures, such as the Sentence BERT bi-encoder [48], Fasttext [24] and WarpLDA (topic model) [10]; these algorithms provide strong, scalable baselines for comparing ZestXML's performance on zero-shot and few-shot labels; (4) leading zero-shot multi-label learners such as 0-BIGRU-WLAN, 0-CNN-LWAN [50] and CoNSE [43]; these baselines don't scale to extreme datasets, hence ZestXML's comparison against them is reported only for EURLex-4.3K in Table ??. The implementations of all the aforementioned algorithms were provided by their authors.…”
Section: Experiments, 5.1 Experiment Settings
confidence: 99%
“…Tree-based methods (Prabhu and Varma 2014; Jain et al. 2016; Jasinska et al. 2016; Niculescu-Mizil and Abbasnejad 2017; Si et al. 2017; Siblini et al. 2018; Prabhu et al. 2018; Wydmuch et al. 2018; Khandagale et al. 2020) can be seen as transformation methods that aim to divide the initial large-scale problem into multiple small-scale sub-problems by recursively partitioning the feature or label space. Those subsets are connected with the nodes of the trees.…”
Section: Related Work
confidence: 99%