2019
DOI: 10.3389/frai.2019.00015

Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology

Abstract: The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set of variants, we use Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features. Second, rather than assuming a specific area of interest, we use global language mapping base…

Cited by 17 publications (11 citation statements)
References 54 publications (75 reference statements)
“…The inverse of this generalization is that individuals have unique or idiosyncratic constructions which are only revealed when the training corpus is centered around that individual. This finding fits well with studies in variation (Dunn, 2019a, 2019b) which reveal the high degree of syntactic differences across speech communities.…”
Section: Experiments 3 Perception Vs Production In Grammar Similarity
supporting
confidence: 90%
“…Variation within and between both datasets is structured more around individual languages and is less predictable given country-specific population and corpus size information. Work based on previous versions of the corpus (Dunn, 2019a, 2019b) have shown that meaningful dialectal variation can be modeled using this source of data. The internal (corpus similarity) and external (demographic) evaluations in this paper strongly suggest that future work based on these expanded country-language sub-corpora will support further advances in corpus-based dialectology.…”
Section: Discussion
mentioning
confidence: 99%
“…The grammar induction algorithm used here employs an association-based beam search to identify the best sequences of slot-constraints (Dunn, 2019a). While a grammar formalism like dependency grammar (Nivre and McDonald, 2008; Zhang and Nivre, 2012) must identify the head and attachment type for each word, a construction grammar must identify the representation type for each slot-constraint.…”
Section: Methods: Computational Cxg
mentioning
confidence: 99%
“…However, because these two types of representations operate at different levels of complexity, it is possible that they grow at different rates. We thus experiment with the growth of a computational construction grammar (Dunn, 2018b, 2019a) across data drawn from six different registers: news articles, Wikipedia articles, web pages, tweets, academic papers, and published books. These experiments are needed to establish a baseline relationship between the grammar and the lexicon for the experiments to follow.…”
mentioning
confidence: 99%