2019
DOI: 10.1162/tacl_a_00259

GILE: A Generalized Input-Label Embedding for Text Classification

Abstract: Neural text classification models typically treat output labels as categorical variables which lack description and semantics. This forces their parametrization to be dependent on the label set size, and, hence, they are unable to scale to large label sets and generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels happen often…

Cited by 60 publications (35 citation statements)
References 19 publications
“…There are three types of similarity learning in NLP. The supervised paradigm differs from typical supervised learning in that training examples are cast into pairwise constraints (Yang and Jin, 2006), as in cross-lingual word embedding learning based on word-level alignments (Faruqui and Dyer, 2014) and zero-shot utterance/document classification (Yazdani and Henderson, 2015; Nam et al., 2016; Pappas and Henderson, 2019) based on utterance/document-level annotations. The unsupervised paradigm aims to learn an underlying low-dimensional space where the relationships between most of the observed data are preserved, as in word embedding learning (Collobert et al., 2011; Mikolov et al., 2013; Pennington et al., 2014; Levy and Goldberg, 2014).…”
Section: Plagiarism Detection (mentioning)
confidence: 99%
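The pairwise-constraint formulation of supervised similarity learning quoted above can be made concrete with a short sketch. This is a minimal illustration using a standard contrastive margin loss over made-up data; all names and numbers are assumptions, not code from any of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
emb = rng.normal(size=(4, dim))  # 4 toy item embeddings

# Pairwise constraints: (i, j, label), label=1 means "similar".
pairs = [(0, 1, 1), (0, 2, 0), (1, 3, 0)]
margin = 1.0

def pair_loss(emb, pairs, margin):
    """Pull similar pairs together, push dissimilar ones past a margin."""
    total = 0.0
    for i, j, similar in pairs:
        dist = np.linalg.norm(emb[i] - emb[j])
        if similar:
            total += dist ** 2                     # pull together
        else:
            total += max(0.0, margin - dist) ** 2  # push apart
    return total / len(pairs)

print(f"contrastive loss: {pair_loss(emb, pairs, margin):.4f}")
```

Minimizing such a loss over the embedding parameters is what casts the problem as similarity learning rather than per-class prediction.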
“…EXAM (Du et al., 2018) introduces an interaction mechanism to incorporate word-level matching signals into the text classification task. GILE (Pappas and Henderson, 2019) proposes a joint input-label embedding model for neural text classification. Unfortunately, they cannot work well when there is little difference between the label texts.…”
Section: Related Work (mentioning)
confidence: 99%
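To make the quoted idea of a joint input-label embedding concrete, here is a minimal sketch: the encoded document and the encoded label descriptions are projected into a shared space and compared by a dot product. The dimensions, projection matrices, and data below are illustrative assumptions, not GILE's actual parametrization:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_lab, d_joint = 16, 12, 8

W_in = rng.normal(size=(d_in, d_joint)) * 0.1    # input projection
W_lab = rng.normal(size=(d_lab, d_joint)) * 0.1  # label projection

def score(doc_vec, label_vecs):
    """One compatibility score per label, via the shared joint space."""
    z_doc = doc_vec @ W_in       # (d_joint,)
    z_lab = label_vecs @ W_lab   # (n_labels, d_joint)
    return z_lab @ z_doc         # (n_labels,)

doc = rng.normal(size=d_in)           # encoded document
labels = rng.normal(size=(5, d_lab))  # encoded label descriptions

# Labels enter only through their description vectors, so an unseen
# label can be scored at test time without retraining.
print(score(doc, labels))
```

This also shows why near-identical label descriptions are a problem for this family of models: two labels whose description vectors are close receive nearly identical scores for every input.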
“…However, most of them focus only on document representation and ignore the correlation among labels. Recently, some methods, including DXML, EXAM (Du et al., 2018), SGM (Yang et al., 2018), and GILE (Pappas and Henderson, 2019), have been proposed to capture label correlations by exploiting label structure or label content. Although they obtain promising results in some cases, they still cannot work well when there is little difference between label texts (e.g., the categories Management vs. Management moves in Reuters News), which makes such labels hard to distinguish.…”
Section: Introduction (mentioning)
confidence: 99%
“…Compared to previous joint input-label models, our model is more flexible and not restricted to linear mappings, which have limited expressivity, but uses non-linear mappings modeled similarly to energy-based learning networks (Belanger and McCallum, 2016). Perhaps the most similar embedding model to ours is the one by Pappas and Henderson (2018), except for the linear scaling unit, which is specific to sigmoidal linear units designed for multi-label classification problems and not for structured prediction, as here.…”
Section: Related Work (mentioning)
confidence: 99%
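The contrast drawn in this last statement, non-linear versus linear compatibility, can be sketched roughly: instead of a single linear (or bilinear) map, a small feed-forward scorer is applied to the concatenated input and label encodings, loosely in the spirit of energy-based scoring. The architecture and dimensions below are generic assumptions for illustration, not the model from the quoted paper:

```python
import numpy as np

rng = np.random.default_rng(2)
d, h = 8, 16  # encoding size, hidden size (assumed)

# Two-layer non-linear compatibility function over an (input, label) pair.
W1 = rng.normal(size=(2 * d, h)) * 0.1
b1 = np.zeros(h)
w2 = rng.normal(size=h) * 0.1

def energy(x, y):
    """Non-linear score for one input/label pair (higher = better match)."""
    hid = np.maximum(0.0, np.concatenate([x, y]) @ W1 + b1)  # ReLU layer
    return hid @ w2

x = rng.normal(size=d)  # input encoding
y = rng.normal(size=d)  # label encoding
print(f"compatibility: {energy(x, y):.4f}")
```

Unlike the dot-product scorer in the earlier sketch, this function can represent non-additive interactions between input and label features, which is the extra expressivity the quoted passage refers to.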