This article proposes a methodology for addressing three long-standing problems of near synonym research. First, we show how the internal structure of a group of near synonyms can be revealed. Second, we deal with the problem of distinguishing the subclusters and the words in those subclusters from each other. Finally, we illustrate how these results identify the semantic properties that should be mentioned in lexicographic entries. We illustrate our methodology with a case study on nine near synonymous Russian verbs that, in combination with an infinitive, express TRY. Our approach is corpus-linguistic and quantitative: assuming a strong correlation between semantic and distributional properties, we analyze 1,585 occurrences of these verbs taken from the Amsterdam Corpus and the Russian National Corpus, supplemented where necessary with data from the Web. We code each particular instance in terms of 87 variables (a.k.a. ID tags), i.e., morphosyntactic, syntactic and semantic characteristics that form a verb's behavioral profile. The resulting co-occurrence table is evaluated by means of a hierarchical agglomerative cluster analysis and additional quantitative methods. The results show that this behavioral profile approach can be used (i) to elucidate the internal structure of the group of near synonymous verbs and present it as a radial network structured around a prototypical member and (ii) to make explicit the scales of variation along which the near synonymous verbs vary.
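The behavioral profile pipeline described in this abstract (ID-tag coding, co-occurrence table, agglomerative clustering) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the verb labels and tags are hypothetical toy data rather than data from the study, and Manhattan distance with average linkage stands in for whatever distance measure and amalgamation rule the authors actually used, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations: one (verb, set of ID tags) pair per corpus instance.
instances = [
    ("verb_a", {"aspect=imperfective", "clause=main"}),
    ("verb_a", {"aspect=imperfective", "clause=main"}),
    ("verb_b", {"aspect=imperfective", "clause=main"}),
    ("verb_b", {"aspect=imperfective", "clause=subordinate"}),
    ("verb_c", {"aspect=perfective", "clause=subordinate"}),
    ("verb_c", {"aspect=perfective", "clause=subordinate"}),
]


def behavioral_profiles(instances):
    """Turn tag counts into relative frequencies per verb."""
    counts, totals = {}, Counter()
    for verb, tags in instances:
        counts.setdefault(verb, Counter()).update(tags)
        totals[verb] += 1
    return {
        verb: {tag: n / totals[verb] for tag, n in tag_counts.items()}
        for verb, tag_counts in counts.items()
    }


def distance(p, q):
    """Manhattan distance between two profiles (an assumed choice of metric)."""
    return sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q))


def agglomerate(profiles):
    """Average-linkage agglomerative clustering; returns the merge history."""

    def linkage(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    clusters = [(verb,) for verb in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges


merges = agglomerate(behavioral_profiles(instances))
for left, right, height in merges:
    print(left, right, round(height, 2))
```

In this toy data the two verbs with the most similar tag distributions merge first, and the remaining verb joins at a greater height, which is the kind of subcluster structure a dendrogram over real behavioral profiles would display.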
The goal of the present study is to understand the role orthographic and semantic information play in the behaviour of skilled readers. Reading latencies from a self-paced sentence reading experiment in which Russian near-synonymous verbs were manipulated appear well-predicted by a combination of bottom-up sub-lexical letter triplets (trigraphs) and top-down semantic generalizations, modelled using the Naive Discrimination Learner. The results reveal a complex interplay of bottom-up and top-down support from orthography and semantics to the target verbs, whereby activations from orthography only are modulated by individual differences. Using performance on a serial reaction time task for a novel operationalization of the mental speed hypothesis, we explain the observed individual differences in reading behaviour in terms of the exploration/exploitation hypothesis from Reinforcement Learning, where initially slower and more variable behaviour leads to better performance overall.
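The bottom-up part of this account, letter trigraphs supporting a target form, can be sketched with incremental Rescorla-Wagner updates, the learning rule on which naive discrimination learning is based. This is only an illustration under stated assumptions: the word forms and outcome labels are hypothetical, the learning rate and number of epochs are arbitrary, and the Naive Discrimination Learner as commonly implemented estimates equilibrium weights rather than running the trial-by-trial loop shown here.

```python
def trigraphs(word):
    """Letter triplets of a word, with '#' marking the word boundaries."""
    padded = f"#{word}#"
    return sorted({padded[i:i + 3] for i in range(len(padded) - 2)})


def rescorla_wagner(events, outcomes, rate=0.1, lam=1.0, epochs=100):
    """Learn cue-to-outcome weights with the Rescorla-Wagner update rule."""
    weights = {}
    for _ in range(epochs):
        for cues, present in events:
            for outcome in outcomes:
                # Summed support the present cues currently give this outcome.
                support = sum(weights.get((c, outcome), 0.0) for c in cues)
                target = lam if outcome == present else 0.0
                delta = rate * (target - support)
                for c in cues:
                    weights[(c, outcome)] = weights.get((c, outcome), 0.0) + delta
    return weights


def activation(word, outcome, weights):
    """Bottom-up support from a word's trigraphs for one outcome."""
    return sum(weights.get((c, outcome), 0.0) for c in trigraphs(word))


# Hypothetical toy lexicon: two forms that share their first trigraph.
lexicon = {"abc": "MEANING_1", "abd": "MEANING_2"}
events = [(trigraphs(form), meaning) for form, meaning in lexicon.items()]
weights = rescorla_wagner(events, sorted(set(lexicon.values())))

print(round(activation("abc", "MEANING_1", weights), 3))
print(round(activation("abd", "MEANING_1", weights), 3))
```

The shared trigraph `#ab` ends up carrying little discriminative weight, while the trigraphs unique to each form carry most of the support for its meaning.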
Linguistic convention typically allows speakers several options. Evidence is accumulating that the various options are preferred in different contexts, yet the criteria governing the selection of the appropriate form are often far from obvious. Most researchers who attempt to discover the factors determining a preference rely on the linguistic analysis and statistical modeling of data extracted from large corpora. In this paper, we address the question of how to evaluate such models and explicitly compare the performance of a statistical model derived from a corpus with that of native speakers in selecting one of six Russian TRY verbs. Building on earlier work we trained a polytomous logistic regression model to predict verb choice given the sentential context. We compare the predictions the model makes for 60 unseen sentences to the choices adult native speakers make in those same sentences. We then look in more detail at the interplay of the contextual properties and model computationally how individual differences in assessing the importance of contextual properties may impact the linguistic knowledge of native speakers. Finally, we compare the probability the model assigns to encountering each of the six verbs in the 60 test sentences to the acceptability ratings the adult native speakers give to those sentences. We discuss the implications of our findings for both usage-based theory and empirical linguistic methodology.
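To make the model-versus-speaker comparison concrete, the sketch below shows how a fitted polytomous model could turn the contextual properties of a sentence into a probability for each verb. It is a hedged illustration only: the multinomial (softmax) formulation is an assumption, since the abstract does not say how the polytomous model was fitted, and the verb labels, property names and coefficient values are all hypothetical.

```python
import math


def verb_probabilities(context, coefficients):
    """Softmax over per-verb linear scores for one sentential context."""
    scores = {
        verb: w.get("intercept", 0.0) + sum(w.get(prop, 0.0) for prop in context)
        for verb, w in coefficients.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {verb: math.exp(score - top) for verb, score in scores.items()}
    total = sum(exps.values())
    return {verb: value / total for verb, value in exps.items()}


# Hypothetical coefficients for three verbs (the study itself models six).
coefficients = {
    "verb_a": {"intercept": 0.2, "subject=animate": 1.0},
    "verb_b": {"intercept": 0.0, "clause=subordinate": 0.8},
    "verb_c": {"intercept": -0.5},
}

probabilities = verb_probabilities({"subject=animate"}, coefficients)
predicted = max(probabilities, key=probabilities.get)
print(predicted, {verb: round(p, 3) for verb, p in probabilities.items()})
```

The most probable verb can then be compared with the verb speakers choose for the same sentence, and the full probability distribution with their acceptability ratings.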
Over the past 10 years, Cognitive Linguistics has taken a Quantitative Turn. Yet, concerns have been raised that this preoccupation with quantification and modelling may not bring us any closer to understanding how language works. We show that this objection is unfounded, especially if we rely on modelling techniques based on biologically and psychologically plausible learning algorithms. These make it possible to take a quantitative approach, while generating and testing specific hypotheses that will advance our understanding of how knowledge of language emerges from exposure to usage.
Over the past four decades, two distinct alternatives have emerged to rule-based models of how linguistic categories are stored and represented as cognitive structures, namely the prototype and exemplar theories. Although these models were initially thought to be mutually exclusive, shifts from one mechanism to the other have been observed in category learning experiments, bringing the models closer together. In this paper we implement a technique akin to varying abstraction modelling, that assumes intermediate abstraction processes to underlie category representations and categorization decisions; we do so using familiar statistical techniques such as regression and clustering that track frequency distributions in input. With this model we simulate, on the basis of actual usage of Russian try verbs and Finnish think verbs as observed in corpora, how prototypes for near-synonymous verbs could be formed from concrete exemplars at different levels of abstraction. In so doing, we take a closer look at the cognitive linguistic flirtation with multiple categorization theories, suggesting three improvements anchored in the fact that cognitive linguistics is a usage-based theory of language. Firstly, we show that language provides support for considering single prototype and full exemplar models as opposite ends along a continuum of abstraction. Secondly, we present a methodology that simulates how prototypes can be obtained from exemplars at more than one level of abstraction in a systematic and verifiable way. And thirdly, we illustrate our claims on the basis of work on verbs, denoting intangible events that are neither stable in nor independent of time and express relational concepts; this implies that verbs are more susceptible to their meanings being influenced by the concepts they relate.
A number of studies report that frequency is a poor predictor of acceptability, in particular at the lower end of the frequency spectrum. Because acceptability judgments provide a substantial part of the empirical foundation of dominant linguistic traditions, understanding how acceptability relates to frequency, one of the most robust predictors of human performance, is crucial. The relation between low frequency and acceptability is investigated using corpus- and behavioral data on the distribution of infinitival and finite that-complements in Polish. Polish verbs exhibit substantial subordination variation and for the majority of verbs taking an infinitival complement, the that-complement occurs with low frequency (<0.66 ipm). These low-frequency that-clauses, in turn, exhibit large differences in how acceptable they are to native speakers. It is argued that acceptability judgments are based on configurations of internally structured exemplars, the acceptability of which cannot reliably be assessed until sufficient evidence about the core component has accumulated.
We investigated the relation between implicit sequence learning and individual differences in working memory (WM) capacity. Participants performed an oculomotor version of the serial reaction time (SRT) task and three computerized WM tasks. Implicit learning was measured using anticipation measures only, as they represent strong indicators of learning. Our results demonstrate that anticipatory behavior in the SRT task changes as a function of WM capacity, such that it increases with decreased WM capacity. On the other hand, WM capacity did not affect the overall number of correct anticipations in the task. In addition, we report a positive relation between WM capacity and the number of consecutive correct anticipations (or chunks), and a negative relation between WM capacity and the overall number of errors, indicating different learning strategies during implicit sequence learning. The results of the current study are theoretically important because they demonstrate that individual differences in WM capacity could account for differences in learning processes, and ultimately change individuals' anticipatory behavior, even when learning is implicit, without intention and awareness.