Abstract: Researchers in dialectometry have begun to explore measurements based on
fundamentally quantitative metrics, often sourced from dialect corpora, as an
alternative to the traditional signals derived from dialect atlases. This change
of data type amplifies an existing issue in the classical paradigm, namely that
locations may vary in coverage and that this affects the distance measurements:
pairs involving a locat…
“…Although, in comparison to atlases, they reveal more about the context and magnitude in which linguistic features are used, they come with their own issues. One problem is that the frequencies of the collected features typically need to be normalized to be comparable enough for dialectometrical analysis (Wolk & Szmrecsanyi, 2018). In the current work, we aim to surpass the issue by using transcribed interview data directly, without explicitly defining a list of features beforehand.…”
This paper presents a topic modeling approach to corpus-based dialectometry. Topic models are most often used in text mining to find latent structure in a collection of documents. They are based on the idea that frequently co-occurring words represent the same underlying topic. In this study, topic models are applied directly to interview transcriptions containing dialectal speech, without any annotations or preselected features. The transcriptions are modeled on complete words, on character n-grams, and after automatic segmentation. Data from three languages, Finnish, Norwegian, and Swiss German, are examined. The proposed method discovers clear dialectal differences in all three datasets while reflecting the differences between them. The method significantly simplifies the dialectometric workflow, simultaneously saving time and increasing objectivity. Applying the method to non-normalized data could also benefit text mining, the traditional field of topic modeling.
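The abstract above describes representing transcriptions as character n-grams so that a topic model can be fit without a predefined feature list. A minimal sketch of that preprocessing step, using a toy corpus and an illustrative n-gram size (both are assumptions, not the paper's actual data or parameters):

```python
# Sketch of the n-gram preprocessing described above: turn raw
# transcriptions into character n-gram counts, the document-term
# input a topic model (e.g. LDA) would then factor into
# document-topic and topic-term distributions.
from collections import Counter

def char_ngrams(text, n=3):
    """Return overlapping character n-grams, padding word boundaries with '_'."""
    padded = "_" + text.strip().replace(" ", "_") + "_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# Toy "transcriptions" standing in for dialect interview data.
docs = ["mie menen kotiin", "minä menen kotiin"]

# One bag-of-n-grams per document; no feature list is defined in advance.
doc_term = [Counter(char_ngrams(d)) for d in docs]
```

Working on boundary-padded character n-grams rather than whole words is what lets the model pick up sub-word dialectal variation (inflection, phonology) without any manual feature selection.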
This article uses the Q-learning algorithm to investigate the development of college students' ecological language use in selected cities and analyzes the survey results, covering students' language ability, the impact of the language environment on that ability, and differences in language use and language behavior. On this basis, it summarizes students' usage habits and behaviors and proposes solutions. With respect to social factors, it analyzes students' current situation and college students' patterns of Mandarin use. It examines the causes of college students' "bilingualism" problems from sociolinguistic and psycholinguistic perspectives and offers targeted solutions for improving language proficiency at the school, family, and individual levels. The results show that only 9.9% of respondents rate their own Mandarin as "very good"; among respondents whose contacts can speak a little Mandarin, 19% rate their Mandarin as "very good"; and among those whose contacts' Mandarin is very fluent, 32.1% do. This indicates that the Mandarin level of the people respondents regularly interact with has a strong, positively correlated influence on their own Mandarin level.