We present the first large-scale English "allwords lexical substitution" corpus. The size of the corpus provides a rich resource for investigations into word meaning. We investigate the nature of lexical substitute sets, comparing them to WordNet synsets. We find them to be consistent with, but more fine-grained than, synsets. We also identify significant differences to results for paraphrase ranking in context reported for the SEMEVAL lexical substitution data. This highlights the influence of corpus construction approaches on evaluation results.
The psychological community frequently investigates semantic norms of properties produced by native speakers after being presented concept words, and these norms are of great value for a wide variety of psychological experiments. This paper presents a new set of norms that includes a collection of properties from a production experiment for the German and the Italian languages. Stimuli consisted of 50 concrete objects taken from 10 different concept classes. The data comprise annotations of semantic relation types and several statistical measures, which facilitate the comparison of the two target languages.
Providing sets of semantically related words in the lexical entries of an electronic dictionary should help language learners quickly understand the meaning of the target words. Relational information might also improve memorisation, by allowing the generation of structured vocabulary study lists. However, an open issue is which semantic relations are cognitively most salient, and should therefore be used for dictionary construction. In this paper, we present a concept description elicitation experiment conducted with German and Italian speakers. The analysis of the experimental data suggests that there is a small set of concept-class-dependent relation types that are stable across languages and robust enough to allow discrimination across broad concept domains. Our further research will focus on harvesting instantiations of these classes from corpora.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.