Holistic corpus-based dialectology

Szmrecsanyi, Benedikt; Wolk, Christoph

doi:10.1590/s1984-63982011000200011

Cited by 30 publications

(10 citation statements)

References 27 publications

“…The resulting boxy shapes are in biology often interpreted as being indicative of horizontal gene transfer and in linguistics as suggesting language contact. We skip further technicalities and refer the reader to the introduction in Szmrecsanyi and Wolk (2011:574–577). Suffice it to say that we present neighbor-net diagrams without insisting on a strictly phylogenetic interpretation.…”

Section: Quantitative Analysis (mentioning)

confidence: 99%

Mapping out particle placement in Englishes around the world: A study in comparative sociolinguistic analysis

Grafmiller

Szmrecsanyi

2018

Lang Var Change

Self Cite

This study explores variability in particle placement across nine varieties of English around the globe, utilizing data from the International Corpus of English and the Global Corpus of Web-based English. We introduce a quantitative approach for comparative sociolinguistics that integrates linguistic distance metrics and predictive modeling, and use these methods to examine the development of regional patterns in grammatical constraints on particle placement in World Englishes. We find a high degree of uniformity among the conditioning factors influencing particle placement in native varieties (e.g., British, Canadian, and New Zealand English), while English as a second language varieties (e.g., Indian and Singaporean English) exhibit a high degree of dissimilarity with the native varieties and with each other. We attribute the greater heterogeneity among second language varieties to the interaction between general L2 acquisition processes and the varying sociolinguistic contexts of the individual regions. We argue that the similarities in constraint effects represent compelling evidence for the existence of a shared variable grammar and variation among grammatical systems is more appropriately analyzed and interpreted as a continuum rather than multiple distinct grammars.

“…However, given that, as we have seen, language grouping in classificatory linguistics is intended to reflect systematic, pervasive change, researchers have increasingly questioned classifications that rely on linguistic traits selected a priori. While being a practical necessity in traditional comparative dialectology, the selection of a limited number of specific traits necessarily involves subjective judgements ([…] 2005; Starostin, 2010; Szmrecsanyi & Wolk, 2011), and may result in erroneous classifications as the pre-selected traits become overly influential in the final analysis. In keeping with this view, this paper aims to contribute to the development of an empirically-based classification of Gallo-Italic through the use of dialectometry applied to atlas corpora, and specifically through the measurement of Levenshtein distance.…”

Section: Classificatory Criteria (mentioning)

confidence: 99%

Revisiting the classification of Gallo-Italic: a dialectometric approach

Tamburelli

Brasca

2017

Digital Scholarship in the Humanities

While Gallo-Italic varieties clearly belong to the Romance language family, their subgrouping as either Gallo-Romance or Italo-Romance has been the source of disagreement in the classificatory literature. While earlier analyses tended to classify Gallo-Italic as Gallo-Romance (notably Schmid, 1956; Bec, 1970-1971), later work has either argued for or tacitly assumed a classification of Gallo-Italic as part of the Italo-Romance branch, a view that is both different from as well as irreconcilable with the earlier Gallo-Romance classifications. In this paper we aim to contribute to the development of an empirically-based classification of Gallo-Italic through the use of dialectometry applied to atlas corpora, and specifically through the measurement of Levenshtein distance. Using three wordlists (Swadesh 100, Swadesh 200, Leipzig-Jakarta) and comparing twenty-six linguistic varieties across Italy and southeastern France, we show that Gallo-Italic is best classified as a third subgroup within the Gallo-Romance branch. Our results also clearly identify all the major bundles of isoglosses established through traditional dialectological methods and confirm Gallo-Italic as a relatively homogenous group distinct from Italo-Romance.
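The abstract above rests on Levenshtein distance measured over wordlists. The following is a minimal sketch of that general idea, not the authors' actual procedure: the length normalization, the averaging over shared concepts, and the names `normalized_distance` and `aggregate_distance` are all assumptions made for illustration, and any word forms passed in would be placeholders rather than attested dialect data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def aggregate_distance(forms_a: dict, forms_b: dict) -> float:
    """Mean normalized distance over the concepts both varieties attest.
    forms_a and forms_b map a concept label to one transcribed form."""
    shared = sorted(set(forms_a) & set(forms_b))
    if not shared:
        raise ValueError("the two wordlists share no concepts")
    total = sum(normalized_distance(forms_a[c], forms_b[c]) for c in shared)
    return total / len(shared)
```

Computing `aggregate_distance` for every pair of varieties yields a distance matrix, which is the kind of input that clustering or neighbor-net methods take.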

“…Corpus-based dialectometry (henceforth: CBDM), then, combines the study of dialectometric research questions with corpus-linguistic methodologies. CBDM utilizes aggregation methodologies to explore quantitative and distributional usage patterns extracted from dialect corpora (see Szmrecsanyi, 2008, 2011, 2013; Szmrecsanyi & Wolk, 2011; Wolk, 2014; Wolk & Szmrecsanyi, 2016). Turning to corpora enables analysts to address questions about usage versus knowledge, production/comprehension versus intuition, chaos versus orderliness, and so on.…”

Section: Introduction (mentioning)

confidence: 99%

Probabilistic corpus-based dialectometry

Wolk

Szmrecsanyi

2018

J. of Ling. Geography

Self Cite

Researchers in dialectometry have begun to explore measurements based on fundamentally quantitative metrics, often sourced from dialect corpora, as an alternative to the traditional signals derived from dialect atlases. This change of data type amplifies an existing issue in the classical paradigm, namely that locations may vary in coverage and that this affects the distance measurements: pairs involving a location with lower coverage suffer from greater noise and therefore imprecision. We propose a method for increasing robustness using generalized additive modeling, a statistical technique that allows leveraging the spatial arrangement of the data. The technique is applied to data from the British English dialect corpus FRED; the results are evaluated regarding their interpretability and according to several quantitative metrics. We conclude that data availability is an influential covariate in corpus-based dialectometry and beyond, and recommend that researchers be aware of this issue and of methods to alleviate it.
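To make the coverage issue concrete, here is a rough sketch of a frequency-based distance between two locations. It is not the generalized additive modeling approach the abstract proposes. The per-10,000-word normalization, the Euclidean metric, and the function names are assumptions chosen for illustration.

```python
import math


def normalized_frequencies(counts: dict, corpus_size: int, per: int = 10_000) -> dict:
    """Scale raw feature counts to occurrences per `per` words of running text."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return {feature: n * per / corpus_size for feature, n in counts.items()}


def euclidean_distance(freq_a: dict, freq_b: dict) -> float:
    """Euclidean distance between two frequency profiles.
    A feature missing from one profile is treated as frequency 0."""
    features = set(freq_a) | set(freq_b)
    return math.sqrt(sum(
        (freq_a.get(f, 0.0) - freq_b.get(f, 0.0)) ** 2 for f in features
    ))
```

Because each count is multiplied by `per / corpus_size`, one extra or missing token shifts the normalized frequency more for a location with little text than for one with a lot. This is one way to see why pairs involving a low-coverage location give noisier distances.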
