2019
DOI: 10.1162/tacl_a_00259

GILE: A Generalized Input-Label Embedding for Text Classification

Abstract: Neural text classification models typically treat output labels as categorical variables which lack description and semantics. This forces their parametrization to be dependent on the label set size, and, hence, they are unable to scale to large label sets and generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels happen often…

Cited by 60 publications (35 citation statements)
References 19 publications
“…There are three types of similarity learning in NLP. The supervised paradigm differs from typical supervised learning in that training examples are cast into pairwise constraints (Yang and Jin, 2006), as in cross-lingual word embedding learning based on word-level alignments (Faruqui and Dyer, 2014) and zero-shot utterance/document classification (Yazdani and Henderson, 2015; Nam et al., 2016; Pappas and Henderson, 2019) based on utterance/document-level annotations. The unsupervised paradigm aims to learn an underlying low-dimensional space where the relationships between most of the observed data are preserved, as in word embedding learning (Collobert et al., 2011; Mikolov et al., 2013; Pennington et al., 2014; Levy and Goldberg, 2014).…”
Section: Plagiarism Detection (mentioning)
confidence: 99%
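The pairwise-constraint formulation of supervised similarity learning quoted above can be made concrete with a short sketch. This is a minimal illustration using a standard contrastive margin loss over made-up data; all names and numbers are assumptions, not code from any of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
emb = rng.normal(size=(4, dim))  # 4 toy item embeddings

# Pairwise constraints: (i, j, label), label=1 means "similar".
pairs = [(0, 1, 1), (0, 2, 0), (1, 3, 0)]
margin = 1.0

def pair_loss(emb, pairs, margin):
    """Pull similar pairs together, push dissimilar ones past a margin."""
    total = 0.0
    for i, j, similar in pairs:
        dist = np.linalg.norm(emb[i] - emb[j])
        if similar:
            total += dist ** 2                     # pull together
        else:
            total += max(0.0, margin - dist) ** 2  # push apart
    return total / len(pairs)

print(f"contrastive loss: {pair_loss(emb, pairs, margin):.4f}")
```

Minimizing such a loss over the embedding parameters is what casts the problem as similarity learning rather than per-class prediction.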
“…EXAM (Du et al., 2018) introduces an interaction mechanism to incorporate word-level matching signals into the text classification task. GILE (Pappas and Henderson, 2019) proposes a joint input-label embedding model for neural text classification. Unfortunately, they cannot work well when there is little difference between the label texts.…”
Section: Related Work (mentioning)
confidence: 99%
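To make the quoted idea of a joint input-label embedding concrete, here is a minimal sketch: the encoded document and the encoded label descriptions are projected into a shared space and compared by a dot product. The dimensions, projection matrices, and data below are illustrative assumptions, not GILE's actual parametrization:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_lab, d_joint = 16, 12, 8

W_in = rng.normal(size=(d_in, d_joint)) * 0.1    # input projection
W_lab = rng.normal(size=(d_lab, d_joint)) * 0.1  # label projection

def score(doc_vec, label_vecs):
    """One compatibility score per label, via the shared joint space."""
    z_doc = doc_vec @ W_in       # (d_joint,)
    z_lab = label_vecs @ W_lab   # (n_labels, d_joint)
    return z_lab @ z_doc         # (n_labels,)

doc = rng.normal(size=d_in)           # encoded document
labels = rng.normal(size=(5, d_lab))  # encoded label descriptions

# Labels enter only through their description vectors, so an unseen
# label can be scored at test time without retraining.
print(score(doc, labels))
```

This also shows why near-identical label descriptions are a problem for this family of models: two labels whose description vectors are close receive nearly identical scores for every input.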
“…However, most of them focus only on document representation and ignore the correlation among labels. Recently, some methods, including DXML, EXAM (Du et al., 2018), SGM (Yang et al., 2018), and GILE (Pappas and Henderson, 2019), have been proposed to capture label correlations by exploiting label structure or label content. Although they obtain promising results in some cases, they still cannot work well when there is little difference between label texts (e.g., the categories Management vs. Management moves in Reuters News), which makes such labels hard to distinguish.…”
Section: Introduction (mentioning)
confidence: 99%
“…Compared to previous joint input-label models, our model is more flexible and not restricted to linear mappings, which have limited expressivity, but uses non-linear mappings modeled similarly to energy-based learning networks (Belanger and McCallum, 2016). Perhaps the most similar embedding model to ours is the one by Pappas and Henderson (2018), except for the linear scaling unit, which is specific to sigmoidal linear units designed for multi-label classification problems and not for structured prediction, as here.…”
Section: Related Work (mentioning)
confidence: 99%
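The contrast drawn in this last statement, non-linear versus linear compatibility, can be sketched roughly: instead of a single linear (or bilinear) map, a small feed-forward scorer is applied to the concatenated input and label encodings, loosely in the spirit of energy-based scoring. The architecture and dimensions below are generic assumptions for illustration, not the model from the quoted paper:

```python
import numpy as np

rng = np.random.default_rng(2)
d, h = 8, 16  # encoding size, hidden size (assumed)

# Two-layer non-linear compatibility function over an (input, label) pair.
W1 = rng.normal(size=(2 * d, h)) * 0.1
b1 = np.zeros(h)
w2 = rng.normal(size=h) * 0.1

def energy(x, y):
    """Non-linear score for one input/label pair (higher = better match)."""
    hid = np.maximum(0.0, np.concatenate([x, y]) @ W1 + b1)  # ReLU layer
    return hid @ w2

x = rng.normal(size=d)  # input encoding
y = rng.normal(size=d)  # label encoding
print(f"compatibility: {energy(x, y):.4f}")
```

Unlike the dot-product scorer in the earlier sketch, this function can represent non-additive interactions between input and label features, which is the extra expressivity the quoted passage refers to.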