We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAM-BADA, computational models cannot simply rely on local context, but must be able to keep track of information in the broader discourse. We show that LAMBADA exemplifies a wide range of linguistic phenomena, and that none of several state-ofthe-art language models reaches accuracy above 1% on this novel benchmark. We thus propose LAMBADA as a challenging test set, meant to encourage the development of new models capable of genuine understanding of broad context in natural language text.
Distributional semantic methods to approximate word meaning with context vectors have been very successful empirically, and the last years have seen a surge of interest in their compositional extension to phrases and sentences. We present here a new model that, like those of Coecke et al. (2010) and Baroni and Zamparelli (2010), closely mimics the standard Montagovian semantic treatment of composition in distributional terms. However, our approach avoids a number of issues that have prevented the application of the earlier linguistically-motivated models to full-fledged, real-life sentences. We test the model on a variety of empirical tasks, showing that it consistently outperforms a set of competitive rivals. Compositional distributional semanticsThe research of the last two decades has established empirically that distributional vectors for words obtained from corpus statistics can be used to represent word meaning in a variety of tasks (Turney and Pantel, 2010). If distributional vectors encode certain aspects of word meaning, it is natural to expect that similar aspects of sentence meaning can also receive vector representations, obtained compositionally from word vectors. Developing a practical model of compositionality is still an open issue, which we address in this paper. One approach is to use simple, parameterfree models that perform operations such as pointwise multiplication or summing (Mitchell and Lapata, 2008). Such models turn out to be surprisingly effective in practice (Blacoe and Lapata, 2012), but they have obvious limitations. For instance, symmetric operations like vector addition are insensitive to syntactic structure, therefore meaning differences encoded in word order are lost in composition: pandas eat bamboo is identical to bamboo eats pandas. Guevara (2010), Mitchell and Lapata (2010), Socher et al. (2011) and Zanzotto et al. (2010) generalize the simple additive model by applying structure-encoding operators to the vectors of two sister nodes before addition, thus breaking the inherent symmetry of the simple additive model. A related approach (Socher et al., 2012) assumes richer lexical representations where each word is represented with a vector and a matrix that encodes its interaction with its syntactic sister. The training proposed in this model estimates the parameters in a supervised setting. Despite positive empirical evaluation, this approach is hardly practical for generalpurpose semantic language processing, since it requires computationally expensive approximate parameter optimization techniques, and it assumes task-specific parameter learning whose results are not meant to generalize across tasks.
Logical negation is a challenge for distributional semantics, because predicates and their negations tend to occur in very similar contexts, and consequently their distributional vectors are very similar. Indeed, it is not even clear what properties a “negated” distributional vector should possess. However, when linguistic negation is considered in its actual discourse usage, it often performs a role that is quite different from straightforward logical negation. If someone states, in the middle of a conversation, that “This is not a dog,” the negation strongly suggests a restricted set of alternative predicates that might hold true of the object being talked about. In particular, other canids and middle-sized mammals are plausible alternatives, birds are less likely, skyscrapers and other large buildings virtually impossible. Conversational negation acts like a graded similarity function, of the sort that distributional semantics might be good at capturing. In this article, we introduce a large data set of alternative plausibility ratings for conversationally negated nominal predicates, and we show that simple similarity in distributional semantic space provides an excellent fit to subject data. On the one hand, this fills a gap in the literature on conversational negation, proposing distributional semantics as the right tool to make explicit predictions about potential alternatives of negated predicates. On the other hand, the results suggest that negation, when addressed from a broader pragmatic perspective, far from being a nuisance, is an ideal application domain for distributional semantic methods.
Corpus-based distributional semantic models capture degrees of semantic relatedness among the words of very large vocabularies, but have problems with logical phenomena such as entailment, that are instead elegantly handled by model-theoretic approaches, which, in turn, do not scale up. We combine the advantages of the two views by inducing a mapping from distributional vectors of words (or sentences) into a Boolean structure of the kind in which natural language terms are assumed to denote. We evaluate this Boolean Distributional Semantic Model (BDSM) on recognizing entailment between words and sentences. The method achieves results comparable to a state-of-the-art SVM, degrades more gracefully when less training data are available and displays interesting qualitative properties.
No abstract
This paper describes the SemEval 2018 Task 10 on Capturing Discriminative Attributes. Participants were asked to identify whether an attribute could help discriminate between two concepts. For example, a successful system should determine that urine is a discriminating feature in the word pair kidney,bone. The aim of the task is to better evaluate the capabilities of state of the art semantic models, beyond pure semantic similarity. The task attracted submissions from 21 teams, and the best system achieved a 0.75 F1 score.
Abstract. Semantic relatedness of terms represents similarity of meaning by a numerical score. On the one hand, humans easily make judgements about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgements and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples (wordi, wordj, similarity ij ). Four of them are designed for evaluation of systems for computing semantic relatedness, complementing each other in terms of the semantic relation type they represent. These benchmarks were used to organise a shared task on Russian semantic relatedness, which attracted 19 teams. We use one of the best approaches identified in this competition to generate the fifth high-coverage resource, the first open distributional thesaurus of Russian. Multiple evaluations of this thesaurus, including a large-scale crowdsourcing study involving native speakers, indicate its high accuracy.
If lexical similarity is not enough to reliably assess how word vectors would perform on various specific tasks, we need other ways of evaluating semantic representations. We propose a new task, which consists in extracting semantic differences using distributional models: given two words, what is the difference between their meanings? We present two proof of concept datasets for this task and outline how it may be performed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.