Corpus linguistics and naive discriminative learning

Baayen, R. Harald

doi:10.1590/s1984-63982011000200003

Cited by 52 publications

(51 citation statements)

References 27 publications


“…The authors suggest that "it does not really matter what exactly [language] learners track, as long as they track enough features" (Divjak et al 2016: 29). A similar point is made by Baayen (2011) who shows for a set of models that the overall accuracy is hardly affected by permuting the values of a single predictor. It seems to be the case that individual higher-level abstract features are not that important, which is likely due to the correlational structure of the predictor space (Baayen 2011: 306): any given feature or predictor is predictable from other features or predictors.…”

Section: Discussion: Corpus-based Predictions Vs Preferential Choices (supporting)

confidence: 63%

Are corpus-based predictions mirrored in the preferential choices and ratings of native speakers? Predicting the alternation between the Estonian adessive case and the adposition peal ‘on’

Klavan

Veismann

2017

ESUKA-JEFUL


Abstract. Recent work in usage-based linguistics stresses the importance of combining corpus-based analyses with experimental studies. A number of studies have compared the performance of a corpus-based statistical model against the behaviour of native speakers in a linguistic experiment. The present paper takes this line of analysis further by combining corpus-based work with two sources of experimental data. A mixed-effects logistic regression model is fitted to the corpus data of the Estonian adessive case and the adposition peal 'on' in present-day written Estonian. In order to evaluate the goodness of the corpus-based model, its performance is compared to the behaviour of native speakers in a forced choice task and a rating task.
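The citation statement above reports that overall accuracy is hardly affected by permuting the values of a single predictor, and attributes this to predictors being predictable from one another. The following is a minimal toy sketch of that idea, not the procedure used by Baayen (2011) or by Klavan and Veismann: the data, the majority-vote classifier `predict_majority`, and the helper names are all hypothetical and chosen only to make the redundancy visible.

```python
import random

def accuracy(predict, rows, labels):
    """Share of rows for which the classifier returns the correct label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permuted_accuracy(predict, rows, labels, column, seed=0):
    """Accuracy after shuffling the values of one predictor across rows."""
    rng = random.Random(seed)
    values = [r[column] for r in rows]
    rng.shuffle(values)
    shuffled = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, values)]
    return accuracy(predict, shuffled, labels)

def predict_majority(row):
    """Hypothetical classifier: majority vote over three binary predictors."""
    return 1 if sum(row) >= 2 else 0

# Toy data (an assumption): three fully redundant binary predictors,
# each identical to the outcome label.
labels = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
rows = [(y, y, y) for y in labels]

baseline = accuracy(predict_majority, rows, labels)
after_shuffle = permuted_accuracy(predict_majority, rows, labels, column=0)
print(baseline, after_shuffle)
```

Because the two untouched predictors still carry the same information, shuffling the third leaves the majority vote unchanged. Real corpus predictors are only partially correlated, so in practice one would expect a small drop rather than none.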


“…In other words, the naive discriminative reader is a statistical classifier grounded in basic principles of human learning. Baayen (2011) shows, for a binary classification task, that the naive discriminative reader performs with a classification accuracy comparable to state-of-the-art classifiers such as generalized linear mixed models and support vector machines.…”

Section: Discussion (mentioning)

confidence: 95%

An amorphous model for morphological processing in visual comprehension based on naive discriminative learning.

Baayen

Milin

Đurđević

et al. 2011

Psychological Review

Self Cite

457

487


A two-layer symbolic network model based on the equilibrium equations of the Rescorla-Wagner model (Danks, 2003) is proposed. The study starts by presenting two experiments in Serbian, which reveal for sentential reading the inflectional paradigmatic effects previously observed by Milin, Filipović Đurđević, and Moscoso del Prado Martín (2009) for unprimed lexical decision. The empirical results are successfully modeled without having to assume separate representations for inflections or data structures such as inflectional paradigms. In the next step, the same naive discriminative learning approach is pitted against a wide range of effects documented in the morphological processing literature. Frequency effects for complex words as well as for phrases (Arnon & Snider, 2010) emerge in the model without the presence of whole-word or whole-phrase representations. Family size effects (Moscoso del Prado Martín, Bertram, Häikiö, Schreuder, & Baayen, 2004) emerge in the simulations across simple words, derived words, and compounds, without derived words or compounds being represented as such. It is shown that for pseudo-derived words no special morpho-orthographic segmentation mechanism as posited by Rastle, Davis, and New (2004) is required. The model also replicates the finding of Plag and Baayen (2009), that, on average, words with more productive affixes elicit longer response latencies, while at the same time predicting that productive affixes afford faster response latencies for new words. English phrasal paradigmatic effects modulating isolated word reading are reported and modelled, showing that the paradigmatic effects characterizing Serbian case inflection have cross-linguistic scope.
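The abstract above builds on the Rescorla-Wagner model. As a rough illustration, the sketch below implements the standard trial-by-trial Rescorla-Wagner update, in which every cue present on a trial has its weight to an outcome changed by the learning rate times the difference between the target (λ if the outcome occurs, 0 otherwise) and the summed weights of the present cues. It is not the paper's implementation: the paper works with the equilibrium equations (Danks, 2003) rather than iterating over trials, and the function name, the single shared learning rate `rate` (standing in for the product of the α and β parameters), and the toy cues are assumptions made here.

```python
def rescorla_wagner(events, rate=0.1, lam=1.0, epochs=1):
    """Iteratively learn cue-to-outcome weights from (cues, outcomes) events.

    `rate` is a single learning rate used for all cues and for both present
    and absent outcomes (a simplifying assumption).
    """
    weights = {}  # (cue, outcome) -> association strength
    all_outcomes = sorted({o for _, outs in events for o in outs})
    for _ in range(epochs):
        for cues, outcomes in events:
            present = set(cues)
            for o in all_outcomes:
                # Summed support for outcome o from the cues on this trial.
                total = sum(weights.get((c, o), 0.0) for c in present)
                target = lam if o in outcomes else 0.0
                delta = rate * (target - total)
                for c in present:
                    weights[(c, o)] = weights.get((c, o), 0.0) + delta
    return weights

# Hypothetical toy events: letter cues paired with a lexical outcome.
events = [({"h", "a", "n", "d"}, {"HAND"}), ({"l", "a", "n", "d"}, {"LAND"})]
w = rescorla_wagner(events, epochs=200)
# Cues unique to one word ("h", "l") end up discriminating between outcomes.
print(w[("h", "HAND")], w[("h", "LAND")])
```

With enough repetitions the weights settle near a steady state, which is the quantity the equilibrium equations are meant to give directly without simulating individual trials.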


“…It would be highly desirable to provide a systematic comparison of different ways to model the data; these need not necessarily be regression-based, as in this paper, but may also include less widely used techniques such as memory-based learning (Daelemans & Bosch 2005) or naïve discriminative learning (Baayen 2011). As for designing models of the genitive alternation specifically, with possessor animacy being such a crucial (and in some cases near-categorical) constraint it may be worth considering taking possessor animacy out of the regression models at a later step in the analysis in order to zoom in on the attributes of the variation grammars that are actually divergent and highly variable (Tagliamonte 2014 is a recent study that uses this technique; the idea goes back to Labov 1969: 729, who argues that inclusion of (near-)categorical contexts will obscure the real patterns of variation).…”

Section: Discussion and Directions For Future Research (mentioning)

confidence: 99%

Spoken syntax in a comparative perspective: The dative and genitive alternation in varieties of English

et al. 2017


This paper introduces a new resource designed to facilitate the quantitative investigation of syntactic variation in spoken language from a comparative perspective. The datasets comprise homogeneously annotated collections of "interchangeable" (i.e. competing) genitive and dative variants in four varieties of English: American English, British English, Canadian English, and New Zealand English. To showcase the empirical potential of the data source, we present a suggestive analysis that investigates the extent to which the probabilistic grammar of genitive and dative variant choice differs across varieties. The statistical analysis reveals that while there are a number of subtle probabilistic contrasts between the regional varieties under study, there is overall a striking degree of cross-varietal homogeneity. We conclude by outlining directions for future research.
