Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020
DOI: 10.1145/3394486.3403151
Correlation Networks for Extreme Multi-label Text Classification

Cited by 35 publications (14 citation statements) · References 16 publications
“…• Star-Transformer sparsifies the fully connected attention in the Transformer to a star-shaped structure. • BERTXML (Xun et al., 2020) […] Evaluation Metrics: Two widely used metrics, precision at top k (P@k) and normalized Discounted Cumulative Gain at top k (nDCG@k), are used to evaluate the model performance.…”
Section: Methods
confidence: 99%
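The star-shaped sparsification mentioned in this statement can be made concrete with an attention mask. Below is a minimal PyTorch sketch, not taken from the cited paper, assuming one global relay node and a local window of one neighbor on each side; the function name and tensor layout are illustrative.

```python
import torch

def star_attention_mask(seq_len: int) -> torch.Tensor:
    """Boolean mask (True = may attend) for a star topology: seq_len satellite
    tokens plus one relay node at index 0. Each satellite attends to itself,
    its immediate neighbors, and the relay; the relay attends to everything."""
    n = seq_len + 1                       # +1 for the relay node
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[0, :] = True                     # relay sees all tokens
    mask[:, 0] = True                     # all tokens see the relay
    idx = torch.arange(1, n)
    mask[idx, idx] = True                 # self-attention
    mask[idx[:-1], idx[1:]] = True        # right neighbor
    mask[idx[1:], idx[:-1]] = True        # left neighbor
    return mask
```

Compared with the dense O(n²) attention pattern, every satellite token here has a constant number of attendable positions, which is the source of the sparsification.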
“…When the label space is large (e.g., 10K), one [CLS] token (e.g., a 100-dimensional vector) may not be informative enough to predict the relevant labels. Therefore, following [58], we put multiple [CLS] tokens [CLS_1], ..., [CLS_C] in the input. To summarize, given a document 𝑑, the layer input 𝑯 is…”
Section: Transformer Layers
confidence: 99%
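A minimal PyTorch sketch of the multiple-[CLS] idea described above, assuming the extra [CLS] slots are learned embeddings prepended to the token embeddings; the module name, initialization scale, and dimensions are illustrative, not taken from [58].

```python
import torch
import torch.nn as nn

class MultiCLSEmbedding(nn.Module):
    """Prepend C learned [CLS] embeddings to the token embeddings so each
    [CLS] slot can summarize the document for a different slice of a large
    label space."""
    def __init__(self, num_cls: int, hidden_dim: int):
        super().__init__()
        self.cls_emb = nn.Parameter(torch.randn(num_cls, hidden_dim) * 0.02)

    def forward(self, token_emb: torch.Tensor) -> torch.Tensor:
        # token_emb: (batch, seq_len, hidden) -> (batch, C + seq_len, hidden)
        batch = token_emb.size(0)
        cls = self.cls_emb.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([cls, token_emb], dim=1)
```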
“…Considering the sparsity of labels, a short ranked list of potentially relevant labels for each testing document is commonly used to represent classification quality. Following previous studies on extreme multi-label text classification [27,58,63], we adopt two rank-based metrics: the precision at top 𝑘 (P@𝑘) and the normalized Discounted Cumulative Gain at top 𝑘 (NDCG@𝑘), where 𝑘 = 1, 3, 5. For a document 𝑑, let 𝒚_𝑑 ∈ {0, 1}^|L| be its ground-truth label vector and rank(𝑖) be the index of the 𝑖-th highest predicted label according to the output probability 𝛑_𝑑.…”
Section: Experiments 4.1 Setup
confidence: 99%
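The two metrics quoted above follow directly from these definitions: P@𝑘 averages the ground-truth values of the 𝑘 highest-scored labels, and NDCG@𝑘 applies a log discount and normalizes by the ideal DCG. A minimal NumPy sketch for a single document, assuming a binary ground-truth vector and real-valued scores (function names are illustrative):

```python
import numpy as np

def precision_at_k(y_true: np.ndarray, scores: np.ndarray, k: int) -> float:
    """P@k: fraction of the top-k predicted labels that are relevant."""
    topk = np.argsort(-scores)[:k]
    return y_true[topk].sum() / k

def ndcg_at_k(y_true: np.ndarray, scores: np.ndarray, k: int) -> float:
    """NDCG@k with the standard log2 discount, normalized by the ideal DCG
    achievable given the number of relevant labels."""
    topk = np.argsort(-scores)[:k]
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = (y_true[topk] * discounts).sum()
    ideal_hits = min(int(y_true.sum()), k)
    idcg = discounts[:ideal_hits].sum()
    return float(dcg / idcg) if idcg > 0 else 0.0
```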
“…propose a multi-label reasoner mechanism that performs multiple rounds of prediction and then ensembles the per-round results or determines a proper label order, which is computationally expensive. CorNet-BertXML (Xun et al., 2020) utilizes BERT (Devlin et al., 2019) to obtain a joint representation of the text and all candidate labels, with extra exponential linear units (ELU) at the prediction layer to exploit label-correlation knowledge. Different from the above works, we exploit extra label co-occurrence prediction tasks to explicitly model the label correlations in a multi-task framework.…”
Section: Label Correlation Learning
confidence: 99%
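A minimal PyTorch sketch of a CorNet-style correlation block as this statement describes it, assuming the reading that raw label logits are squashed by a sigmoid, passed through a low-dimensional bottleneck with an ELU nonlinearity, and added back to the input via a residual connection; the bottleneck size and names are illustrative, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class CorNetBlock(nn.Module):
    """Adjust each label's raw logit using scores of correlated labels:
    sigmoid squash -> low-dim bottleneck -> ELU -> project back -> residual."""
    def __init__(self, num_labels: int, bottleneck_dim: int = 512):
        super().__init__()
        self.down = nn.Linear(num_labels, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, num_labels)
        self.act = nn.ELU()

    def forward(self, raw_logits: torch.Tensor) -> torch.Tensor:
        # Squash logits to (0, 1) before mixing information across labels.
        z = torch.sigmoid(raw_logits)
        correction = self.up(self.act(self.down(z)))
        return raw_logits + correction  # residual keeps the base prediction
```

The residual connection means the block can only refine, never discard, the base classifier's scores, which is what lets the correlation knowledge be layered on top of an existing prediction head.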