Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) 2017
DOI: 10.18653/v1/w17-1202

Dialectometric analysis of language variation in Twitter

Abstract: In the last few years, microblogging platforms such as Twitter have given rise to a deluge of textual data that can be used for the analysis of informal communication between millions of individuals. In this work, we propose an information-theoretic approach to geographic language variation using a corpus based on Twitter. We test our models with tens of concepts and their associated keywords detected in Spanish tweets geolocated in Spain. We employ dialectometric measures (cosine similarity and Jensen-Shannon …
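
The two measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is cut off after "Jensen-Shannon", so the code assumes the second measure is the standard Jensen-Shannon divergence (base-2 logarithm). The function names, the concept, the keyword variants, and the counts are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two keyword-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


def normalize(counts):
    """Turn raw keyword counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero count vector")
    return [c / total for c in counts]


def kl_divergence(p, m):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, m) if a > 0)


def jensen_shannon_divergence(p_counts, q_counts):
    """Standard Jensen-Shannon divergence (assumed measure), in [0, 1] bits."""
    p = normalize(p_counts)
    q = normalize(q_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical counts of three keyword variants for one concept in two
# regions. These numbers are invented, not taken from the paper.
region_a = [120, 30, 5]
region_b = [40, 90, 10]

similarity = cosine_similarity(region_a, region_b)
divergence = jensen_shannon_divergence(region_a, region_b)
print(f"cosine similarity: {similarity:.3f}")
print(f"Jensen-Shannon divergence: {divergence:.3f} bits")
```

With a base-2 logarithm the divergence is 0 for identical keyword distributions and 1 for distributions that share no keywords, which makes values comparable across concepts with different numbers of variants.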

Cited by 34 publications (24 citation statements)
References 17 publications
“…Even though these international languages have global speech communities, dialectology and sociolinguistics continue to focus largely on sub-national dialects, often within so-called inner-circle varieties (Kachru, 1982). This paper joins recent work in taking a global approach by using geo-referenced texts (Goldhahn et al., 2012; Davies and Fuchs, 2015; Donoso and Sanchez, 2017) to represent national varieties (Szmrecsanyi et al., 2016; Calle-Martin and Romero-Barranco, 2017; Cook and Brinton, 2017; Rangel et al., 2017; Dunn, 2018a, 2019b; Tamaredo, 2018). The basic point is that in order to represent regional variation as a complete system, dialectometry must take a global perspective.…”
Section: Introduction
confidence: 69%
“…The recent availability of long-term and large-scale digital corpora and the effectiveness of methods for representing words over time played a crucial role in the recent advances in this field. However, only a few attempts focused on social media [28,29], and their goal is to analyze linguistic aspects rather than understanding how lexical semantic change can affect performance in sentiment analysis or hate speech detection. From this perspective, our work represents a novelty: for the first time, we propose to tackle the issue of diachronic degradation of hate speech detection by exploring the temporal robustness of prediction models.…”
Section: Related Work
confidence: 99%
“…In isolation, web-crawled data provides a single observation of digital language use. Another common source of data is from Twitter (e.g., Eisenstein, et al, 2010;Roller, et al, 2012;Kondor, et al, 2013;Mocanu, et al, 2013;Eisenstein, et al, 2014;Graham, et al, 2014;Donoso & Sanchez, 2017). This paper uses a baseline Twitter corpus as a point of comparison: does the Common Crawl agree with Twitter data?…”
Section: Collection and Preparation of Twitter Data
confidence: 99%