Proceedings of the 13th International Conference on Computational Semantics - Long Papers 2019
DOI: 10.18653/v1/w19-0408
Words are Vectors, Dependencies are Matrices: Learning Word Embeddings from Dependency Graphs

Abstract: Distributional Semantic Models (DSMs) construct vector representations of word meanings based on their contexts. Typically, the contexts of a word are defined as its closest neighbours, but they can also be retrieved from its syntactic dependency relations. In this work, we propose a new dependency-based DSM. The novelty of our model lies in associating an independent meaning representation, a matrix, with each dependency label. This allows it to capture specifics of the relations between words and contexts, le…
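The central idea of the abstract (words as vectors, dependency labels as matrices) can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the authors' implementation: it shows a skip-gram-style score in which a context word's vector is first transformed by the matrix of the dependency label linking it to the target word. The toy vocabulary, dimensionality, initialisation, and the score function are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)
dim = 50  # embedding dimensionality (hypothetical choice)

# One vector per word in a toy vocabulary.
vocab = ["dog", "barks", "loud"]
word_vec = {w: rng.normal(scale=0.1, size=dim) for w in vocab}

# One matrix per dependency label, initialised near the identity.
dep_labels = ["nsubj", "amod"]
dep_mat = {d: np.eye(dim) + rng.normal(scale=0.01, size=(dim, dim))
           for d in dep_labels}

def score(target, context, label):
    # Skip-gram-style compatibility of a (target, label, context) triple:
    # the context vector is passed through the label's matrix before the
    # dot product with the target vector.
    return float(word_vec[target] @ (dep_mat[label] @ word_vec[context]))

# Example: compatibility of "dog" as an nsubj context of "barks".
print(score("barks", "dog", "nsubj"))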

Cited by 10 publications (17 citation statements). References 25 publications.
“…For both context-based and hybrid few-shot learning, we have achieved a new state of the art on 4 out of the 6 evaluation tasks used, showing that a careful, optimised approach can be the key to success in few-shot learning. Future work could explore other distributional models, such as dependency embeddings (Czarnowska et al., 2019), but it is clear from our results that careful optimisation will be required to adapt other models to the few-shot setting.…”
Section: Discussion (mentioning; confidence: 99%)
“…In both domains, the shared goals are: i) map entities v ∈ V to embeddings e_v, where e ∈ ℝ^{|V|×n}, n being the dimensionality of the vectors; ii) map relations r ∈ R into one (or more) space ℝ^{|R|×*}. In this work, we focus on constructing a syntactic dataset of positive training triples from a corpus as in Czarnowska et al. (2019). All of the models we investigate rely on a negative sampling mechanism that generates a dataset D′ of false triples.…”
Section: Theoretical Approach (mentioning; confidence: 99%)
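To make the negative-sampling step mentioned in this excerpt concrete, here is a minimal, hypothetical sketch (not code from the citing paper): given positive (head, dependency label, tail) triples, it corrupts the tail of each triple with a random word to build a false-triple set D′. The corpus, corruption scheme, and function names are assumptions.

import random

def negative_samples(positive_triples, vocab, k=1, seed=0):
    # For each positive (head, label, tail) triple, generate k corrupted
    # triples by replacing the tail with a random word, skipping any
    # corruption that happens to coincide with a real positive triple.
    rng = random.Random(seed)
    positives = set(positive_triples)
    negatives = []
    for head, label, tail in positive_triples:
        for _ in range(k):
            corrupt = rng.choice(vocab)
            if (head, label, corrupt) not in positives:
                negatives.append((head, label, corrupt))
    return negatives

# Toy usage with hypothetical dependency triples.
triples = [("dog", "nsubj", "barks"), ("loud", "amod", "dog")]
vocab = ["dog", "barks", "loud", "cat", "sleeps"]
print(negative_samples(triples, vocab, k=2))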
“…Representing words in terms of their syntactic co-occurrences has long been proposed, both for count-based (Padó and Lapata, 2007; Weir et al., 2016) and neural (Hermann and Blunsom, 2013; Levy and Goldberg, 2014; Komninos and Manandhar, 2016; Czarnowska et al., 2019; Vashishth et al., 2019) models of word meaning. Tested on benchmark word similarity tasks, such models often compare favourably to models based on proximal co-occurrence, particularly when the similarity or substitutability of two words is considered rather than their relatedness (Levy and Goldberg, 2014).…”
Section: Introduction (mentioning; confidence: 99%)
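As an illustration of what "syntactic co-occurrences" means in practice, the hypothetical sketch below extracts (word, dependency-labelled context) pairs from a toy dependency parse, in the spirit of Levy and Goldberg (2014). The parse representation, label format, and helper names are assumptions, not code from any of the cited papers.

# A toy dependency parse: (dependent, label, head) arcs.
parse = [
    ("dog", "nsubj", "barks"),
    ("loud", "amod", "dog"),
]

def dependency_contexts(arcs):
    # Turn each arc into two (word, labelled-context) training pairs:
    # the head sees the dependent through the label, and the dependent
    # sees the head through the inverse label.
    pairs = []
    for dependent, label, head in arcs:
        pairs.append((head, f"{label}_{dependent}"))
        pairs.append((dependent, f"{label}-inv_{head}"))
    return pairs

for word, context in dependency_contexts(parse):
    print(word, "->", context)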
“…
Vector addition (Rimell et al., 2016): .496 / .472
Simplified Practical Lexical Function (Rimell et al., 2016): .496 / .497
Vector addition (Czarnowska et al., 2019): .485 / .475
Dependency vector addition (Czarnowska et al., 2019): .497 / .439
Semantic functions (Emerson and Copestake, 2017b): .20 / .16
Sem-func & vector ensemble (Emerson and Copestake, 2017b): …
Previous work has shown that vector addition performs well on this task (Rimell et al., 2016; Czarnowska et al., 2019). I have trained a Skipgram model (Mikolov et al., 2013) using the Gensim library (Řehůřek and Sojka, 2010), tuning weighted addition on the dev set.…”
Section: Previous Work (mentioning; confidence: 99%)
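The excerpt above mentions training a Skipgram model with Gensim and tuning weighted vector addition on a dev set. The sketch below is a hypothetical illustration of that setup (assuming Gensim ≥ 4.0), not the cited author's code: the toy corpus, hyperparameters, and composition weights are placeholders.

import numpy as np
from gensim.models import Word2Vec

# Toy corpus; in practice this would be a large tokenised corpus.
sentences = [["the", "dog", "barks"], ["a", "loud", "dog"]]

# Skip-gram model (sg=1); hyperparameter values are placeholders.
model = Word2Vec(sentences, vector_size=50, window=5, sg=1, min_count=1)

def compose(words, weights=None):
    # Weighted vector addition: sum the (optionally weighted) word vectors
    # of a phrase; the weights would be tuned on a development set.
    if weights is None:
        weights = [1.0] * len(words)
    return np.sum([w * model.wv[t] for w, t in zip(weights, words)], axis=0)

phrase_vec = compose(["loud", "dog"], weights=[0.6, 1.0])
print(phrase_vec[:5])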