Data scarcity, robustness and extreme multi-label classification

Babbar, Rohit; Schölkopf, Bernhard

doi:10.1007/s10994-019-05791-5

Cited by 91 publications

(73 citation statements)

References 34 publications

(50 reference statements)


“…Additional evidence that classes can be viewed as long-tailed mixtures of subpopulations comes from extreme multiclass problems. Specifically, these problems often have more than 10,000 fine-grained labels and the number of examples per class is long-tailed [3,4,21,30,49,51]. Observe that fine-grained labels in such problems correspond to subcategories of coarser classes (for example, different species of birds all correspond to the "bird" label in a coarse classification problem).…”

Section: Our Contribution (mentioning)

confidence: 99%

Does learning require memorization? a short tale about a long tail

Feldman

2020

Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing


State-of-the-art results on image recognition tasks are achieved using over-parameterized learning algorithms that (nearly) perfectly fit the training set and are known to fit well even random labels. This tendency to memorize seemingly useless training data labels is not explained by existing theoretical analyses. Memorization of the training data also presents significant privacy risks when the training data contains sensitive personal information and thus it is important to understand whether such memorization is necessary for accurate learning. We provide a simple conceptual explanation and a theoretical model demonstrating that for natural data distributions memorization of labels is necessary for achieving close-to-optimal generalization error. The model is motivated and supported by the results of several recent empirical works. In our model, data is sampled from a mixture of subpopulations and the frequencies of these subpopulations are chosen from some prior. The model allows to quantify the effect of not fitting the training data on the generalization performance of the learned classifier and demonstrates that memorization is necessary whenever frequencies are long-tailed. Image and text data are known to follow such distributions and therefore our results establish a formal link between these empirical phenomena. Our results also have concrete implications for the cost of ensuring differential privacy in learning.

CCS Concepts: Theory of computation → Models of learning; Sample complexity and generalization bounds.

“…This algorithm can be derived as a special case (for k = 2) of the refinement step in the constrained clustering routine proposed in [Banerjee and Ghosh, 2006]. It is notable that [Prabhu et al, 2018] derive essentially the same algorithm for splitting labels into balanced cluster, but they derive their approach starting from a different graph flow-based approach to constrained clustering.…”

Section: A1 Defrag Implementation Details (mentioning)

confidence: 99%

Accelerating Extreme Classification via Adaptive Feature Agglomeration

Jalan

Kar

2019

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence


Extreme classification seeks to assign each data point, the most relevant labels from a universe of a million or more labels. This task is faced with the dual challenge of high precision and scalability, with millisecond level prediction times being a benchmark. We propose DEFRAG, an adaptive feature agglomeration technique to accelerate extreme classification algorithms. Despite past works on feature clustering and selection, DEFRAG distinguishes itself in being able to scale to millions of features, and is especially beneficial when feature sets are sparse, which is typical of recommendation and multi-label datasets. The method comes with provable performance guarantees and performs efficient task-driven agglomeration to reduce feature dimensionalities by an order of magnitude or more. Experiments show that DEFRAG can not only reduce training and prediction times of several leading extreme classification algorithms by as much as 40%, but also be used for feature reconstruction to address the problem of missing features, as well as offer superior coverage on rare labels.


“…One-vs-rest Sometimes also referred to as binary relevance (Zhang et al 2018), these methods learn a classifier per label which distinguishes it from the rest of the labels. In terms of prediction accuracy and label diversity, these methods have been shown to be among the best performing ones for XMC (Babbar and Schölkopf 2017; Yen et al 2017; Babbar and Schölkopf 2019). However, due to their reliance on a distributed training framework, it remains challenging to employ them in resource constrained environments.…”

Section: Related Work (mentioning)

confidence: 99%

Bonsai: diverse and shallow trees for extreme multi-label classification

2020

Self Cite


Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees. We show three concrete realizations of this label representation space including: (i) the input space which is spanned by the input features, (ii) the output space spanned by label vectors based on their co-occurrence with other labels, and (iii) the joint space by combining the input and output representations. Furthermore, the constraint-free multi-way partitions learnt iteratively in these spaces lead to shallow trees. By combining the effect of shallow trees and generalized label representation, Bonsai achieves the best of both worlds: fast training which is comparable to state-of-the-art tree-based methods in XMC, and much better prediction accuracy, particularly on tail-labels. On a benchmark Amazon-3M dataset with 3 million labels, Bonsai outperforms a state-of-the-art one-vs-rest method in terms of prediction accuracy, while being approximately 200 times faster to train. The code for Bonsai is available at https://github.com/xmc-aalto/bonsai.
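The one-vs-rest (binary relevance) scheme described in the citation statement above, one classifier per label that separates it from all other labels, can be illustrated with a minimal sketch. This is not the implementation of any of the cited papers: the perceptron update, the toy data, and the names `train_one_vs_rest` and `predict_top_k` are assumptions chosen only to keep the example small and dependency-free.

```python
from typing import List, Set


def train_one_vs_rest(X: List[List[float]], Y: List[Set[int]],
                      num_labels: int, epochs: int = 20,
                      lr: float = 0.1) -> List[List[float]]:
    """Train one independent linear classifier per label (hypothetical helper).

    Each weight vector has len(X[0]) feature weights plus a trailing bias.
    A plain perceptron update stands in for whatever binary learner
    a real one-vs-rest system would use.
    """
    dim = len(X[0])
    weights = [[0.0] * (dim + 1) for _ in range(num_labels)]
    for label in range(num_labels):
        w = weights[label]
        for _ in range(epochs):
            for x, labels in zip(X, Y):
                # Positive if the example carries this label, negative otherwise.
                target = 1.0 if label in labels else -1.0
                score = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
                if target * score <= 0:  # misclassified or on the boundary
                    for i, xi in enumerate(x):
                        w[i] += lr * target * xi
                    w[-1] += lr * target
    return weights


def predict_top_k(weights: List[List[float]], x: List[float],
                  k: int = 1) -> List[int]:
    """Score x against every label's classifier and return the k best labels."""
    scores = [w[-1] + sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
    return sorted(range(len(weights)), key=lambda l: scores[l], reverse=True)[:k]


# Toy data (assumed for illustration): three examples, three labels.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y = [{0}, {1}, {2}]
weights = train_one_vs_rest(X, Y, num_labels=3)
print(predict_top_k(weights, [1.0, 0.0, 0.0], k=1))
```

Because every label gets its own classifier, the per-label training loops are independent of each other. The same property makes the cost grow linearly with the number of labels, which is consistent with the quoted remark about reliance on distributed training.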
