Accelerating Extreme Classification via Adaptive Feature Agglomeration

Jalan, Ankit; Kar, Purushottam

doi:10.24963/ijcai.2019/361

Cited by 18 publications

(8 citation statements)

References 2 publications


“…A straightforward idea would be to take an embedding style approach to make the loss function $\ell(s, \hat{s}) = \lVert s - \hat{s} \rVert_2^2$. This is intuitive and matches prior approaches to XML [28,30]. However, we found that such an approach resulted in no learning and degenerate random-guessing performance.…”

Section: Dense Label Representations (supporting)

confidence: 86%
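As a small illustration of the squared Euclidean loss between a target vector s and a prediction ŝ mentioned in the statement above, here is a minimal sketch. The function name and the example vectors are hypothetical and are not taken from the citing paper.

```python
import numpy as np

def squared_l2_loss(s, s_hat):
    """Squared Euclidean distance between target s and prediction s_hat."""
    diff = np.asarray(s, dtype=float) - np.asarray(s_hat, dtype=float)
    return float(diff @ diff)

# Illustrative vectors (hypothetical values, not from the paper).
s = [1.0, 2.0, 3.0]
s_hat = [1.0, 0.0, 1.0]
loss = squared_l2_loss(s, s_hat)  # 0^2 + 2^2 + 2^2 = 8.0
```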

“…Our use of extreme multi-label classification problems is due to it being out-of-reach of current HRR methods, which we found produced random-guessing performance in all cases. There exists a rich literature of XML methods that tackle the large output space from the perspective of decision trees/ensembles [23][24][25][26][27], label embedding regression [28][29][30][31], naive bayes [32], and linear classifiers [33,34]. There also exist deep learning XML methods that use either a fully-connected output layer [35] and others that use a variety of alternative approaches to dealing with the large output space [36][37][38][39][40][41][42][43].…”

Section: Related Work (mentioning)

confidence: 99%


Learning with Holographic Reduced Representations

Ganesan, Gao, Gandhi et al. 2021

Preprint


Holographic Reduced Representations (HRR) are a method for performing symbolic AI on top of real-valued vectors [1] by associating each vector with an abstract concept, and providing mathematical operations to manipulate vectors as if they were classic symbolic objects. This method has seen little use outside of older symbolic AI work and cognitive science. Our goal is to revisit this approach to understand if it is viable for enabling a hybrid neural-symbolic approach to learning as a differentiable component of a deep learning architecture. HRRs today are not effective in a differentiable solution due to numerical instability, a problem we solve by introducing a projection step that forces the vectors to exist in a well behaved point in space. In doing so we improve the concept retrieval efficacy of HRRs by over 100×. Using multi-label classification we demonstrate how to leverage the symbolic HRR properties to develop an output layer and loss function that is able to learn effectively, and allows us to investigate some of the pros and cons of an HRR neuro-symbolic learning approach.
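The binding and retrieval operations that the abstract refers to can be sketched as follows. Binding by circular convolution and unbinding with the involution (approximate inverse) are the classic HRR operations. The `project` function is an assumption about what the "projection step" does (it rescales every Fourier coefficient to unit magnitude); the abstract itself does not spell this out. The dimension, seed, and function names are illustrative.

```python
import numpy as np

def bind(a, b):
    """HRR binding: circular convolution, computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def approx_inverse(a):
    """Involution a_inv[i] = a[-i mod d], the approximate inverse used to unbind."""
    return np.roll(a[::-1], 1)

def project(a):
    """Assumed projection: force every Fourier coefficient to unit magnitude."""
    f = np.fft.rfft(a)
    return np.fft.irfft(f / np.abs(f), n=len(a))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
d = 256
a, b = rng.normal(0.0, 1.0 / np.sqrt(d), size=(2, d))

# Unbind b from the pair (a bound with b) using the approximate inverse of a.
raw = cosine(bind(approx_inverse(a), bind(a, b)), b)

# Same retrieval after projecting both vectors.
pa, pb = project(a), project(b)
projected = cosine(bind(approx_inverse(pa), bind(pa, pb)), pb)
```

Under this assumption the involution becomes an exact inverse for projected vectors, so `projected` is numerically 1, while `raw` is noticeably lower because retrieval from unprojected vectors is noisy.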


“…Therefore, a central challenge in XMC is to build classifiers which retain the accuracy of one-vs-rest paradigm while being as efficiently trainable as the tree-based methods. Recently, there have been efforts for speeding up the training of existing classifiers by better initialization and exploiting the problem structure (Fang et al 2019; Liang et al 2018; Jalan et al 2019). In a similar vein, a recently proposed tree-based method, Parabel (Prabhu et al 2018), partitions the label space recursively into two child nodes using 2-means clustering.…”

Section: Related Work (mentioning)

confidence: 99%
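The recursive two-way partitioning of the label space described in the statement above can be sketched roughly as follows. This is a plain Lloyd-style 2-means applied recursively, not Parabel's actual implementation; the leaf size, iteration count, label vectors, and function names are all illustrative assumptions.

```python
import numpy as np

def two_means(X, rng, n_iter=20):
    """Plain Lloyd's 2-means; returns a boolean mask for the first cluster."""
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        # Squared distance of every point to both centers, shape (n, 2).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        mask = dist[:, 0] <= dist[:, 1]
        if mask.all() or not mask.any():
            break  # degenerate split; the caller treats this as a leaf
        centers = np.stack([X[mask].mean(axis=0), X[~mask].mean(axis=0)])
    return mask

def partition(X, ids, rng, max_leaf=2):
    """Recursively split label ids into a binary tree of nested lists."""
    if len(ids) <= max_leaf:
        return ids.tolist()
    mask = two_means(X[ids], rng)
    if mask.all() or not mask.any():
        return ids.tolist()  # could not split further
    return [partition(X, ids[mask], rng, max_leaf),
            partition(X, ids[~mask], rng, max_leaf)]

# Eight hypothetical label vectors in 2-D.
X = np.array([[0, 0], [0, 1], [10, 0], [10, 1],
              [0, 10], [0, 11], [10, 10], [10, 11]], dtype=float)
tree = partition(X, np.arange(len(X)), np.random.default_rng(0))
```

Each leaf of `tree` is a list of label ids, and every label ends up in exactly one leaf.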

Bonsai: diverse and shallow trees for extreme multi-label classification

2020


Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees. We show three concrete realizations of this label representation space including: (i) the input space which is spanned by the input features, (ii) the output space spanned by label vectors based on their co-occurrence with other labels, and (iii) the joint space by combining the input and output representations. Furthermore, the constraint-free multi-way partitions learnt iteratively in these spaces lead to shallow trees. By combining the effect of shallow trees and generalized label representation, Bonsai achieves the best of both worlds: fast training which is comparable to state-of-the-art tree-based methods in XMC, and much better prediction accuracy, particularly on tail-labels. On a benchmark Amazon-3M dataset with 3 million labels, Bonsai outperforms a state-of-the-art one-vs-rest method in terms of prediction accuracy, while being approximately 200 times faster to train. The code for Bonsai is available at https://github.com/xmc-aalto/bonsai.
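One plausible way to build the three label representations listed in the abstract is sketched below. The exact construction and normalization used by Bonsai are not given in the abstract, so the aggregation by `Y.T @ ...`, the row normalization, and all names here are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(M):
    """Scale each row to unit Euclidean norm (all-zero rows stay zero)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.where(norms == 0, 1.0, norms)

def label_representations(X, Y):
    """X: (n, d) features, Y: (n, L) binary labels -> three (L, *) matrices."""
    input_space = l2_normalize(Y.T @ X)    # sum of features of tagged instances
    output_space = l2_normalize(Y.T @ Y)   # label co-occurrence counts
    joint_space = l2_normalize(Y.T @ np.hstack([X, Y]))
    return input_space, output_space, joint_space

# Tiny hypothetical dataset: 3 instances, 2 features, 3 labels.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([[1, 1, 0], [0, 1, 0], [1, 0, 1]], dtype=float)
inp, out, joint = label_representations(X, Y)
```

Each row of the returned matrices is one label's vector; these rows are what a clustering step would partition to build the tree.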


“…Apart from the class of methods mentioned above, label-embedding approaches assume that, despite the large number of labels, the label matrix is effectively low rank and therefore project it to a low-dimensional sub-space [19,33,42]. In some of the works, it was argued that the low rank embedding may be insufficient for capturing the label diversity in XMC settings ([7,36]), which has been questioned in the recent work [16].…”

Section: Application To Other Algorithms (mentioning)

confidence: 99%
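The low-rank assumption behind label-embedding approaches, as described in the statement above, can be illustrated with a truncated SVD. This is a generic sketch rather than any specific cited method; the synthetic real-valued matrix stands in for a label matrix (real ones are binary and only approximately low rank), and the names are hypothetical.

```python
import numpy as np

def low_rank_embed(Y, k):
    """Project the (n, L) label matrix onto its top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    V_k = Vt[:k].T            # (L, k) projection matrix
    Z = Y @ V_k               # (n, k) compressed label vectors
    return Z, V_k

rng = np.random.default_rng(0)
# Hypothetical matrix of exact rank 2: 50 instances, 20 labels.
Y = rng.random((50, 2)) @ rng.random((2, 20))
Z, V_k = low_rank_embed(Y, k=2)
Y_rec = Z @ V_k.T             # decode back to the label space
err = np.linalg.norm(Y - Y_rec)
```

Because this `Y` has rank exactly 2, the 2-dimensional embedding reconstructs it up to floating-point error. When the true label matrix is not low rank, `err` grows, which is the concern about label diversity raised in the statement.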

Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels

Qaraei, Schultheis, Gupta et al. 2021

Proceedings of the Web Conference 2021


Extreme Classification (XC) refers to supervised learning where each training/test instance is labeled with a small subset of relevant labels that are chosen from a large set of possible target labels. The framework of XC has been widely employed in web applications such as automatic labeling of web-encyclopedia, prediction of related searches, and recommendation systems. While most state-of-the-art models in XC achieve high overall accuracy by performing well on the frequently occurring labels, they perform poorly on a large number of infrequent (tail) labels. This arises from two statistical challenges, (i) missing labels, as it is virtually impossible to manually assign every relevant label to an instance, and (ii) highly imbalanced data distribution where a large fraction of labels are tail labels. In this work, we consider common loss functions that decompose over labels, and calculate unbiased estimates that compensate for missing labels according to Natarajan et al. [26]. This turns out to be disadvantageous from an optimization perspective, as important properties such as convexity and lower-boundedness are lost. To circumvent this problem, we use the fact that typical loss functions in XC are convex surrogates of the 0-1 loss, and thus propose to switch to convex surrogates of its unbiased version. These surrogates are further adapted to the label imbalance by combining with label-frequency-based rebalancing. We show that the proposed loss functions can be easily incorporated into various different frameworks for extreme classification. This includes (i) linear classifiers, such as DiSMEC, on sparse input data representation, (ii) attention-based deep architecture, AttentionXML, learnt on dense GloVe embeddings, and (iii) XLNet-based transformer model for extreme classification, APLC-XLNet. Our results demonstrate consistent improvements over the respective vanilla baseline models, on the propensity-scored metrics for precision and nDCG.
This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
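To make the abstract's point about unbiased estimates concrete, here is a minimal sketch of an unbiased loss under missing labels. It assumes the common setting in which a relevant label is observed with probability p (its propensity) and an irrelevant label is never observed as relevant; the paper's exact formulation may differ. The logistic base loss, the value of p, and the function names are illustrative choices.

```python
import math

def logistic_loss(y, score):
    """Logistic loss for y in {0, 1} and a real-valued score."""
    z = score if y == 1 else -score
    return math.log1p(math.exp(-z))

def unbiased_loss(y_obs, score, p, loss=logistic_loss):
    """Estimate of loss(y_true, score) from the observed label y_obs.

    Assumes a relevant label is observed with probability p and an
    irrelevant label is always observed as irrelevant.
    """
    if y_obs == 1:
        return loss(1, score) / p + (1.0 - 1.0 / p) * loss(0, score)
    return loss(0, score)

p, score = 0.25, 0.7
# Expectation over the observation noise when the true label is relevant.
expected = p * unbiased_loss(1, score, p) + (1 - p) * unbiased_loss(0, score, p)
true_loss = logistic_loss(1, score)

# The negative weight (1 - 1/p) makes the estimate unbounded from below.
very_negative = unbiased_loss(1, 50.0, p)
```

Here `expected` matches `true_loss`, so the estimate is unbiased, while `very_negative` is far below zero. That loss of lower-boundedness (and of convexity) is the optimization problem the abstract describes and the motivation for the convex surrogates.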

