2021
DOI: 10.48550/arxiv.2101.03305
Preprint

LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Abstract: Extreme Multi-label text Classification (XMC) is the task of finding the most relevant labels from a large label set. Nowadays, deep learning-based methods have shown significant success in XMC. However, the existing methods (e.g., AttentionXML and X-Transformer) still suffer from 1) combining several models to train and predict for one dataset, and 2) sampling negative labels statically during the training of the label ranking model, which reduces both the efficiency and accuracy of the model. To addres…
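The abstract contrasts static negative sampling (negative labels fixed once before training) with dynamic sampling (negatives re-drawn from the model's current predictions as it trains). The following is a minimal sketch of that idea, assuming a per-document score vector over all labels; the function and variable names are illustrative, not LightXML's actual code.

import torch

def sample_hard_negatives(label_scores, positive_labels, k=5):
    # label_scores: (num_labels,) scores from the current model for one document.
    # positive_labels: set of ground-truth label indices for that document.
    # Dynamic sampling: the k negatives are re-drawn each step from the labels the
    # current model ranks highest, rather than being fixed once before training.
    negatives = []
    for idx in torch.argsort(label_scores, descending=True).tolist():
        if idx not in positive_labels:
            negatives.append(idx)
        if len(negatives) == k:
            break
    return negatives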

Cited by 5 publications (19 citation statements)
References 16 publications
“…Although previous works leverage negative sampling to alleviate the problem (Jiang et al., 2021; Chang et al., 2020), we argue that it is important to initialize the label embedding with the label side information.…”
Section: Rethinking Dense and Sparse XMTC
confidence: 99%
“…XML-CNN (Liu et al., 2017) and SLICE (Jain et al., 2019) employ convolutional neural networks on word embeddings for document representation. More recently, X-Transformer (Chang et al., 2020), LightXML (Jiang et al., 2021) and APLC-XLNet tame large pre-trained Transformer models to encode the input document into a fixed vector. AttentionXML (You et al., 2018) applies a label-word attention mechanism to calculate label-aware document embeddings, but it incurs a computational cost proportional to the document length.…”
Section: Related Work
confidence: 99%
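The label-word attention mentioned in the statement above can be sketched in a few lines; the tensor names and shapes below are assumptions for illustration, not AttentionXML's implementation.

import torch

def label_word_attention(word_repr, label_emb):
    # word_repr: (batch, seq_len, hidden) token representations of a document.
    # label_emb: (num_labels, hidden) one embedding per label.
    # Returns label-aware document embeddings of shape (batch, num_labels, hidden);
    # the cost grows with seq_len, matching the remark about document length.
    scores = torch.einsum("bsh,lh->bls", word_repr, label_emb)   # label-vs-word scores
    weights = torch.softmax(scores, dim=-1)                      # normalize over words
    return torch.einsum("bls,bsh->blh", weights, word_repr)      # per-label weighted sum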
“…This problem is also present for other machine learning techniques that perform self-supervision through contrastive learning in different domains such as computer vision, natural language and graphs [21,52,70]. For example, the well-known word2vec [38] word embedding technique randomly samples words that are not relevant for the context (the other words in the sentence) to distinguish them from the actual word that is part of the context.…”
Section: Negative Sampling For Ranking
confidence: 99%
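To make the negative-sampling idea in the quote above concrete, here is a simplified skip-gram loss with randomly drawn negatives. This is a sketch under assumed names, and it draws negatives uniformly, whereas word2vec actually uses a frequency-smoothed sampling distribution.

import random
import torch
import torch.nn.functional as F

def skipgram_negative_sampling_loss(center_vec, context_vec, vocab_vectors, num_negatives=5):
    # center_vec, context_vec: (dim,) embeddings of a word and one word from its context.
    # vocab_vectors: (vocab_size, dim) output embeddings for the whole vocabulary.
    # Negatives are vocabulary words assumed not to belong to this context.
    loss = -F.logsigmoid(torch.dot(center_vec, context_vec))          # pull the true pair together
    for _ in range(num_negatives):
        neg_vec = vocab_vectors[random.randrange(vocab_vectors.size(0))]
        loss = loss - F.logsigmoid(-torch.dot(center_vec, neg_vec))   # push a random word away
    return loss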