“…At the moment we are investigating the distribution of the features responsible for the traditional division of sites in our data set. However, 2- and 3-fold divisions of sites can be asserted with high confidence, which was also found in our previous study of the same data set [29].…”
Section: Fuse the Two Closest Points (supporting)
confidence: 87%
“…In this study we applied WPGMA in order to find grouping in the data. See [29] for a discussion of alternatives. WPGMA calculates the distance between the two clusters, i.e.…”
Section: Fuse the Two Closest Points (mentioning)
confidence: 99%
“…Closer inspection of the MDS plot in Figure 4 also shows that this group of dialects has a particularly unclear border to the eastern dialects, which could explain the results of the noisy clustering applied to the whole data set. More detailed discussion of the instability of our data set can be found in [29].…”
The paper presents a computational analysis of Bulgarian dialect variation, concentrating on pronunciation differences. It describes the phonetic data set compiled during the project ‘Measuring Linguistic Unity and Diversity in Europe’ that consists of the pronunciations of 157 words collected at 197 sites from all over Bulgaria. We also present the results of analyzing this data set using various quantitative methods and compare them to the traditional scholarship on Bulgarian dialects. The results have shown that various dialectometrical techniques clearly identify an east-west division of the country along the ‘jat’ border, as well as a third group of varieties in the Rodopi area. The rest of the groups specified in the traditional atlases either were not confirmed or were confirmed with low confidence.
“…Given two influence functions, it is a straightforward task to construct a corresponding membership function where the break-point corresponds to a value of 0.5 for the membership function." (GIRARD / LARMOUTH 1993, 112-113) 19 "Recent research has shown that cluster analysis should be applied with caution to dialect data [NERBONNE et al 2008; PROKIĆ / NERBONNE 2008]. Small differences in the input data can lead to substantially different clustering results.…”
Section: Faktorenanalyse zur Identifikation von Dialekttypen [Factor Analysis for Identifying Dialect Types] (mentioning)
Verdichtungen im sprachgeografischen Kontinuum [Concentrations in the Linguistic-Geographic Continuum]. This article is a revised and expanded version of Sections 2.3 and 5.2 of the author's dissertation (PICKL 2013), which deal with spatial structures that span multiple variables. The expansions consist mainly of a discussion of the concepts of the continuum and of areas, and of a comparison of the method proposed here and its results with conventional methods and with traditional divisions of the dialect area of Bavarian Swabia (Bayerisch-Schwaben). From the isogloss method, in which the outer boundaries of the distribution areas of individual linguistic features are superimposed so that bundles of such lines can be used to infer dialect boundaries, to modern cluster analysis, which uses extensive data matrices to merge local dialects into ever larger groups and thus into dialect areas, all of these methods share the notion of a language area divided into dialect areas.
“…Cluster analysis partitions a set of objects into similar groups, such that distances within the group are minimized while distances between groups are maximized. Initially, researchers predominately applied hard-clustering methods to dialect data, such as Hierarchical Clustering (Goebl, 2008; Prokić et al, 2008; Scherrer et al, 2016; Szmrecsanyi, 2011) or k-means clustering (Lundberg, 2005). Hard-clustering assigns each object to a single group, generating clear-cut boundaries between groups.…”
In the early 2000s, the SADS, an extensive linguistic atlas project, surveyed more than three thousand individuals across German-speaking Switzerland on over two hundred linguistic variants, capturing the morphosyntactic variation in Swiss German. In this paper, we applied TESS, a Bayesian clustering method from evolutionary biology, to the SADS to infer population structure, building on parallels between biology and linguistics that have recently been illustrated theoretically and explored experimentally. We tested three clustering models with different spatial assumptions: a nonspatial model, a spatial trend model with a spatial gradient, and a spatial full-trend model with both a spatial gradient and spatial autocorrelation. Results reveal five distinct morphosyntactic populations, four of which correspond to traditional Swiss German dialect regions and one of which corresponds to a base population. Moreover, the spatial trend model outperforms the nonspatial model, suggesting a gradual transition of morphosyntax and supporting the idea of a Swiss German dialect continuum.