Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018
DOI: 10.18653/v1/p18-1002
A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

Abstract: Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces à la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained wo…
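The abstract sketches the core recipe: average the pretrained vectors of the words surrounding a new feature, then apply a linear transformation learned by regressing existing word embeddings onto their own average-of-context vectors. The Python sketch below illustrates that recipe with NumPy; the function names and the plain least-squares fit are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def learn_induction_matrix(word_vectors, context_averages):
    """Fit the linear map A that sends a word's average-of-context vector
    to its pretrained embedding, via ordinary least squares over all words
    present in both dictionaries (illustrative, not the paper's exact code)."""
    words = [w for w in word_vectors if w in context_averages]
    U = np.stack([context_averages[w] for w in words])    # (n, d) context averages
    V = np.stack([word_vectors[w] for w in words])         # (n, d) target embeddings
    # Solve U @ A.T ~= V in the least-squares sense, then transpose.
    A_T, *_ = np.linalg.lstsq(U, V, rcond=None)
    return A_T.T                                            # (d, d) induction matrix

def a_la_carte_embedding(contexts, word_vectors, A):
    """Embed a new feature (rare word, n-gram, ...) from its observed contexts:
    average the pretrained vectors of surrounding in-vocabulary words, then
    apply the learned linear transformation A."""
    vecs = [word_vectors[w] for ctx in contexts for w in ctx if w in word_vectors]
    u = np.mean(vecs, axis=0)
    return A @ u
```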

Cited by 81 publications (123 citation statements)
References 32 publications
“…where c = Σ_{C∈C} |C ∩ V| is the total number of words in C for which embeddings exist. In accordance with results reported by Khodak et al. (2018), we found it helpful to apply a linear transformation to the so-obtained embedding, resulting in the final context embedding…”
Section: The Form-Context Model (supporting)
confidence: 89%
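The quoted passage describes a count-normalized sum of in-vocabulary context vectors followed by a linear transformation, the step adopted from Khodak et al. (2018). A minimal sketch of that computation is below; the matrix name W and the unweighted pooling are assumptions, since the Form-Context Model's exact weighting is not shown in the excerpt.

```python
import numpy as np

def form_context_style_embedding(contexts, word_vectors, W):
    """Pool the pretrained vectors of all in-vocabulary words across the
    observed contexts, normalize by c = sum over C of |C ∩ V| (the count
    defined in the quoted passage), and apply a linear transformation W."""
    pooled = [word_vectors[w] for C in contexts for w in C if w in word_vectors]
    c = len(pooled)                       # total number of embeddable context words
    if c == 0:
        raise ValueError("no context word has a pretrained embedding")
    v_context = np.sum(pooled, axis=0) / c
    return W @ v_context                  # transformed (final) context embedding
```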
“…Our approach is able to generate embeddings for OOV words even from only a single observation, with high accuracy in many cases, and outperforms previous work on the Definitional Nonce dataset (Herbelot and Baroni, 2017) and the Contextual Rare Words dataset (Khodak et al., 2018). To the best of our knowledge, this is the first work that jointly uses surface-form and context information to obtain representations for novel words.…”
Section: Introduction (mentioning)
confidence: 84%
“…However, the method comes with a potential limitation: for each latent feature, which takes the form of a principal component (PC) of the word vectors, ABTT either completely removes the feature or keeps it intact. For this reason, Khodak et al. (2018) argued that ABTT is liable either to not remove enough noise or to cause too much information loss. The objective of this paper is to address the limitations of ABTT.…”
Section: Post-Processing Word Vectors via Conceptor Negation (mentioning)
confidence: 99%
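For contrast with the soft filtering this citing paper proposes, the sketch below shows the hard ABTT step the passage criticizes: center the vectors and remove the top-D principal components outright. The function name and the use of SVD are illustrative choices; D is the method's hyperparameter, typically small.

```python
import numpy as np

def all_but_the_top(X, D=2):
    """ABTT-style hard post-processing: center the embedding matrix X (one row
    per word), then project out the top-D principal components entirely.
    This is the all-or-nothing removal the quoted passage contrasts with
    conceptor negation's soft, per-direction damping."""
    X_centered = X - X.mean(axis=0)
    # Top principal directions from the SVD of the centered matrix.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    top = Vt[:D]                                    # (D, d) principal directions
    return X_centered - X_centered @ top.T @ top    # remove their projections
```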