LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Jiang, Ting; Wang, Deqing; Sun, Leilei; Yang, Huayi; Zhao, Zhengyang; Zhuang, Fuzhen

doi:10.1609/aaai.v35i9.16974

Cited by 69 publications

(51 citation statements)

References 8 publications


“…For Wikipedia-500K, Amazon-670K, and Amazon-3M, we use the same experimental setup (i.e. raw input text, sparse features and train-test split) as existing deep XMC methods [31,33,18,7]. For LF-AmazonTitles-131K, we use the experimental setup provided in the extreme classification repository [5].…”

Section: Results (mentioning)

confidence: 99%

“…[Table 1 header: Method; P@1, P@3, P@5, R@10, R@20, R@100 on Amazon-670K and LF-AmazonTitles-131K] Comparison on XMC benchmarks: Table 1 compares our method with leading XMC methods such as DiSMEC [2], Parabel [25], XR-Linear [32], Bonsai [19], Slice [16], Astec [9], GlaS [13], AttentionXML [31], LightXML [18], XR-Transformer [33], and Overlap-XMC [22]. Most baseline results are obtained from their respective papers when available and otherwise taken from results reported in [31,33] and the extreme classification repository [5].…”

Section: Methods (mentioning)

confidence: 99%
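
The metrics named in the table header above, P@k and R@k, can be illustrated with a minimal sketch. It assumes the usual definitions (fraction of the top-k predictions that are relevant, and fraction of the relevant labels found in the top k). The function names and toy data are hypothetical and are not taken from any of the cited papers.

```python
def top_k_indices(scores, k):
    """Indices of the k highest-scoring labels (ties keep the lower index first)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def precision_at_k(scores, relevant, k):
    """P@k: share of the top-k predicted labels that are truly relevant."""
    hits = sum(1 for i in top_k_indices(scores, k) if i in relevant)
    return hits / k


def recall_at_k(scores, relevant, k):
    """R@k: share of the relevant labels that appear in the top-k predictions."""
    if not relevant:
        return 0.0
    hits = sum(1 for i in top_k_indices(scores, k) if i in relevant)
    return hits / len(relevant)


# Toy example (made-up scores for 5 labels; labels 0, 3 and 4 are relevant).
toy_scores = [0.9, 0.1, 0.8, 0.3, 0.7]
toy_relevant = {0, 3, 4}
p3 = precision_at_k(toy_scores, toy_relevant, 3)
r3 = recall_at_k(toy_scores, toy_relevant, 3)
```

Benchmark numbers are normally averaged over all test instances; the sketch scores a single instance only.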

“…Partition based methods: Many XMC methods such as Parabel [25], Bonsai [19], XR-Linear [32], AttentionXML [31], X-Transformer [28], XR-Transformer [33], LightXML [18] follow this approach where the label space is partitioned into a small number of mutually exclusive clusters, and then an ML model is learned to route a given instance to a few relevant clusters. A popular way to construct clusters is to perform balanced k-means clustering recursively using some pre-defined input features.…”

Section: Related Work (mentioning)

confidence: 99%
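
The recursive balanced k-means construction mentioned in the statement above can be sketched as follows. This is a simplified balanced 2-means split on dense label features, written under assumptions: squared Euclidean distance, a fixed iteration count, and a hypothetical `max_cluster_size` stopping rule. It is not the exact procedure of Parabel, LightXML, or any other cited method.

```python
import random


def _mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def _sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_two_means(ids, feats, iters=10, seed=0):
    """Split ids into two groups whose sizes differ by at most one."""
    rng = random.Random(seed)
    first, second = rng.sample(ids, 2)
    c0, c1 = feats[first], feats[second]
    half = len(ids) // 2
    left, right = ids[:half], ids[half:]
    for _ in range(iters):
        # Rank labels by how much closer they are to c0 than to c1,
        # then cut the ranking in the middle to enforce balance.
        ranked = sorted(
            ids, key=lambda i: _sqdist(feats[i], c0) - _sqdist(feats[i], c1)
        )
        left, right = ranked[:half], ranked[half:]
        c0 = _mean([feats[i] for i in left])
        c1 = _mean([feats[i] for i in right])
    return left, right


def partition_labels(ids, feats, max_cluster_size):
    """Recursively bisect until every cluster has at most max_cluster_size labels."""
    if len(ids) <= max_cluster_size:
        return [ids]
    left, right = balanced_two_means(ids, feats)
    return (partition_labels(left, feats, max_cluster_size)
            + partition_labels(right, feats, max_cluster_size))


# Toy 1-D label features (made up): two well-separated groups of four labels.
toy_feats = [[0.0], [1.0], [2.0], [3.0], [100.0], [101.0], [102.0], [103.0]]
toy_clusters = partition_labels(list(range(8)), toy_feats, max_cluster_size=4)
```

The resulting clusters are mutually exclusive by construction, since each label lands on exactly one side of every split.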

“…More specifically, $\hat{y} = I(x)$ is a sparse real-valued vector with only $K$ ($\ll L$) non-zero entries, and $\hat{y}_l \neq 0$ implies that label $l$ is shortlisted for input $x$ with shortlist relevance score $\hat{y}_l$. As illustrated in Figure 1, many partition-based methods [18,33] formulate their index as a label tree derived by hierarchically partitioning the label space into $C$ clusters and then learn classifier vectors $\hat{W}_C = [\hat{w}_c]_{c=1}^{C}$ ($\hat{w}_c \in \mathbb{R}^D$) for each cluster, which are used to select only a few clusters for a given input. More specifically, given the input $x$, the relevance of cluster $c$ to input $x$ is quantified by cluster relevance scores $\hat{s}_c = \hat{w}_c^T \phi(x)$.…”

Section: ELIAS: End-to-end Learning To Index and Search (mentioning)

confidence: 99%
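
The cluster-routing step described in the statement above (score each cluster with an inner product against the encoded input, keep the best few, shortlist their labels) can be sketched as below. The names `phi_x`, `cluster_weights`, `cluster_to_labels`, and `beam_size`, as well as the toy values, are assumptions made for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def shortlist_labels(phi_x, cluster_weights, cluster_to_labels, beam_size):
    """Return (selected cluster ids, shortlisted label ids) for one input.

    phi_x             -- encoded input, a list of D floats
    cluster_weights   -- one classifier vector of length D per cluster
    cluster_to_labels -- hard partition: list of label ids per cluster
    beam_size         -- number of clusters to keep
    """
    # Cluster relevance score: inner product of the cluster vector and phi(x).
    scores = [dot(w, phi_x) for w in cluster_weights]
    top = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    top = top[:beam_size]
    shortlist = []
    for c in top:
        shortlist.extend(cluster_to_labels[c])
    return top, shortlist


# Toy example (made-up numbers): 3 clusters over 6 labels, D = 2.
toy_phi = [1.0, 0.0]
toy_weights = [[0.2, 0.9], [0.8, 0.1], [-0.5, 0.3]]
toy_partition = [[0, 1], [2, 3], [4, 5]]
toy_top, toy_shortlist = shortlist_labels(toy_phi, toy_weights, toy_partition, 2)
```

Only the shortlisted labels then need to be scored by the label-level classifiers, which is where the savings over scoring all $L$ labels come from.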

“…There are two main formulations of the search index: 1) the partition-based approach [25,31,7,18,32] and 2) the approximate nearest neighbor search (ANNS) based approach [16,9,13,10]. In the partition-based approach, labels are first arranged into a tree-based index by partitioning the label space into mutually exclusive clusters, and then an ML model is learned to route a given instance to a few relevant clusters.…”

Section: Introduction (mentioning)

confidence: 99%


End-to-End Learning to Index and Search in Large Output Spaces

Gupta, Chen, Yu et al. 2022. Preprint.


Extreme multi-label classification (XMC) is a popular framework for solving many real-world problems that require accurate prediction from a very large number of potential output choices. A popular approach for dealing with the large label space is to arrange the labels into a shallow tree-based index and then learn an ML model to efficiently search this index via beam search. Existing methods initialize the tree index by clustering the label space into a few mutually exclusive clusters based on pre-defined features and keep it fixed throughout the training procedure. This approach results in a sub-optimal indexing structure over the label space and limits the search performance to the quality of choices made during the initialization of the index. In this paper, we propose a novel method ELIAS which relaxes the tree-based index to a specialized weighted graph-based index which is learned end-to-end with the final task objective. More specifically, ELIAS models the discrete cluster-to-label assignments in the existing tree-based index as soft learnable parameters that are learned jointly with the rest of the ML model. ELIAS achieves state-of-the-art performance on several large-scale extreme classification benchmarks with millions of labels. In particular, ELIAS can be up to 2.5% better at precision@1 and up to 4% better at recall@100 than existing XMC methods.
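
The contrast the abstract draws between fixed, discrete cluster-to-label assignments and soft learnable ones can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the per-cluster softmax normalisation, the sum over clusters, and the names `logits` and `label_scores` are not taken from the ELIAS paper, whose actual parameterisation and training procedure may differ.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(cluster_scores, assignment):
    """score[l] = sum over clusters c of cluster_scores[c] * assignment[c][l]."""
    num_labels = len(assignment[0])
    return [
        sum(s * row[l] for s, row in zip(cluster_scores, assignment))
        for l in range(num_labels)
    ]


# Hard tree index: each of 4 labels belongs to exactly one of 2 clusters.
hard = [[1, 1, 0, 0],
        [0, 0, 1, 1]]

# Soft index (assumed form): one learnable logit per (cluster, label) edge,
# normalised per cluster so that each row of weights sums to one.
logits = [[2.0, 2.0, 0.0, -2.0],
          [-2.0, -2.0, 0.0, 2.0]]
soft = [softmax(row) for row in logits]

toy_cluster_scores = [0.7, 0.3]
hard_scores = label_scores(toy_cluster_scores, hard)
soft_scores = label_scores(toy_cluster_scores, soft)
```

With the hard matrix a label can only be reached through its single cluster, whereas with soft weights label 2 receives mass from both clusters, and in a real model the logits would be updated by gradient descent together with the rest of the parameters.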


ADAM: An Attentional Data Augmentation Method for Extreme Multi-label Text Classification

Zhang, Liu, Chen et al. 2022. Lecture Notes in Computer Science.


Foundation Models for Information Extraction

Paaß, Giesselbach. 2023. Artificial Intelligence: Foundations, Theory, and Algorithms.


In the chapter we consider Information Extraction approaches that automatically identify structured information in text documents and comprise a set of tasks. The Text Classification task assigns a document to one or more pre-defined content categories or classes. This includes many subtasks such as language identification, sentiment analysis, etc. The Word Sense Disambiguation task attaches a predefined meaning to each word in a document. The Named Entity Recognition task identifies named entities in a document. An entity is any object or concept mentioned in the text, and a named entity is an entity that is referred to by a proper name. The Relation Extraction task aims to identify the relationship between entities extracted from a text. This covers many subtasks such as coreference resolution, entity linking, and event extraction. Most demanding is the joint extraction of entities and relations from a text. Traditionally, relatively small Pre-trained Language Models have been fine-tuned to these tasks and yield high performance, while larger Foundation Models achieve high scores with few-shot prompts, but usually have not been benchmarked.
