Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d16-1235

SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity

Abstract: Verbs play a critical role in the meaning of sentences, but these ubiquitous words have received little attention in recent distributional semantics research. We introduce SimVerb-3500, an evaluation resource that provides human ratings for the similarity of 3,500 verb pairs. SimVerb-3500 covers all normed verb types from the USF free-association database, providing at least three examples for every VerbNet class. This broad coverage facilitates detailed analyses of how syntactic and semantic phenomena together influence human understanding of verb meaning.
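
As a concrete illustration of the resource described in the abstract, here is a minimal sketch of loading the SimVerb-3500 ratings in Python. The file name and the tab-separated column layout (verb1, verb2, POS, rating, relation) are assumptions; check the README shipped with the dataset for the actual format.

```python
import csv

def load_simverb(path="SimVerb-3500.txt"):
    """Return a list of (verb1, verb2, human_rating) triples.

    Assumed row layout: verb1 <TAB> verb2 <TAB> POS <TAB> rating <TAB> relation
    (an assumption, not confirmed by the paper's text)."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            pairs.append((row[0], row[1], float(row[3])))
    return pairs

pairs = load_simverb()
print(len(pairs))  # expected: 3500
```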

Cited by 183 publications (214 citation statements) · References 31 publications
“…First we consider a group of word-level similarity datasets that are commonly used as benchmarks in previous research: WS-353-SIM (Finkelstein et al., 2001), YP-130 (Yang and Powers, 2005), SIMLEX-999 (Hill et al., 2015), SimVerb-3500 (Gerz et al., 2016), RW-STANFORD (Luong et al., 2013). Table 1 reports Spearman's ρ on word similarity tasks for combinations of word vectors and the following similarity metrics: cosine similarity (COS), Pearson's r (PRS), Spearman's ρ (SPR), and Kendall's τ (KEN). N indicates the proportion of sentence vectors in a task for which the null hypothesis of normality in a Shapiro-Wilk test was not rejected at α = 0.05.…”
Section: Methods
confidence: 99%
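
To make the four metrics and the normality check named in this excerpt concrete, here is a minimal Python sketch using SciPy. The 300-dimensional random vectors are placeholders standing in for two word vectors, not real embeddings.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau, shapiro

rng = np.random.default_rng(0)
u = rng.normal(size=300)  # placeholder word vector
v = rng.normal(size=300)  # placeholder word vector

# The four similarity metrics from the excerpt, applied to a vector pair.
cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))  # COS
prs, _ = pearsonr(u, v)                                       # PRS
spr, _ = spearmanr(u, v)                                      # SPR
ken, _ = kendalltau(u, v)                                     # KEN

# Shapiro-Wilk: test whether a vector's components look normally distributed.
stat, p = shapiro(u)
print(f"COS={cos:.3f}  PRS={prs:.3f}  SPR={spr:.3f}  KEN={ken:.3f}")
print(f"normality not rejected at alpha=0.05: {p >= 0.05}")
```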
“…A score close to 1 indicates an embedding close to the human judgement. We use the classic datasets MC-30 (Miller and Charles, 1991), MEN (Bruni et al., 2014), MTurk-287 (Radinsky et al., 2011), MTurk-771 (Halawi et al., 2012), RG-65 (Rubenstein and Goodenough, 1965), RW (Luong et al., 2013), SimVerb-3500 (Gerz et al., 2016), WordSim-353 (Finkelstein et al., 2001) and YP-130 (Yang and Powers, 2006). We follow the same protocol used by Word2vec and fastText by discarding pairs which contain a word that is not in our embedding.…”
Section: Word Similarity Evaluation
confidence: 99%
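
A minimal sketch of the evaluation protocol quoted above: score each pair with cosine similarity, discard pairs containing out-of-vocabulary words, and report Spearman's ρ against the human ratings. Here `embeddings` is a hypothetical dict mapping words to NumPy vectors, and `pairs` follows the (verb1, verb2, rating) format sketched earlier.

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate(embeddings, pairs):
    """Spearman's rho between model cosine similarities and human ratings."""
    human, model = [], []
    for w1, w2, rating in pairs:
        if w1 not in embeddings or w2 not in embeddings:
            continue  # Word2vec/fastText protocol: skip OOV pairs
        u, v = embeddings[w1], embeddings[w2]
        model.append(float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v))))
        human.append(rating)
    rho, _ = spearmanr(human, model)
    return rho  # close to 1 means close to the human judgement
```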
“…(Elman, 2004). The new models go much further by capturing a considerable amount of the variance in human word-to-word similarity ratings (e.g., Gerz, Vulić, Hill, Reichart, & Korhonen, 2016; Levy & Goldberg, 2014). Here are some similarity relations word2vec captures by simply attempting to predict words from surrounding words:…”
confidence: 99%
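
For readers who want to probe such word-to-word similarity relations themselves, here is a minimal sketch using gensim's KeyedVectors. The vector file path is a placeholder; any pretrained model in the standard word2vec format will do.

```python
from gensim.models import KeyedVectors

# Placeholder path: substitute any pretrained word2vec-format file.
vecs = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

print(vecs.similarity("give", "donate"))  # cosine similarity of two verbs
print(vecs.most_similar("run", topn=5))   # nearest neighbours in the space
```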