Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18 2018
DOI: 10.1145/3178876.3186011

Socioeconomic Dependencies of Linguistic Patterns in Twitter

Abstract: Our usage of language is not solely reliant on cognition but is arguably determined by myriad external factors leading to a global variability of linguistic patterns. This issue, which lies at the core of sociolinguistics and is backed by many small-scale studies on face-to-face communication, is addressed here by constructing a dataset combining the largest French Twitter corpus to date with detailed socioeconomic maps obtained from national census in France. We show how key linguistic variables measured in in…

Cited by 30 publications (33 citation statements)
References 33 publications
“…While diachronic word embeddings' ability to capture semantic shifts is interesting because of its flexibility, we postulate that there is a need to capture contextualized information about tweets such as the characteristics of their authors (including spatial, network, socioeconomic, interested topics) and meta-information such as their topic. To extract features, we make use of the largest French Twitter corpus to date proposed in Abitbol et al (2018). In this section we will describe the set of contextualized feature we propose to inject to our diachronic word embedding model (see Section 4).…”
Section: Contextualized Features (mentioning)
confidence: 99%
“…Users from similar socioeconomic status tend to share similar online behavior in terms of circadian cycles. Specifically, Abitbol et al (2018) found that people of higher socioeconomic status are active to a greater degree during the daytime and also use a more standard language. National Institute of Statistics and Economic Studies (INSEE) of France provided the population level salary for each 4 hectare square patch across the whole French territory, estimated from the 2010 tax return in France.…”
Section: Socioeconomic (mentioning)
confidence: 99%
“…Corpus size | Period Covered | Corpus Origin
(Miranda Filho et al, 2014) | 15.435 Users | Sep'13-Oct'13 | Brazilian
(Preoţiuc-Pietro et al, 2015b) | 10.796.836 | Aug'14 | US
(Barberá, 2016) | 1.000.000.000 | Jul'13-May'14 | US
(Lampos et al, 2016) | 2.082.651 | Feb'14-Mar'15 | US
(Mentink, 2016) | 3.000.000.000 | Nov'14-Oct'15 | Dutch
(Hu et al, 2016) | 9.800 Users | | US
(Bokányi et al, 2017) | 63.000.000 | Jan'14 and Oct'14 | US
(van Dalen et al, 2017) | 2.700.000 | Sep'16 | Dutch
(Abitbol et al, 2018) | 170.000.000 | Jul'14-May'17 | French
(Levy Abitbol et al, 2019) | 90.369.215 | Aug'14-Jul'15 | French
use of searchable web interface. For each variable, Quantcast gives the expected percentage of visitors to a website with a given demographic.…”
Section: References (mentioning)
confidence: 99%
“…Furthermore, individual SES correlates with other individual or network attributes, as users tend to build social links with others of similar SES, a phenomenon known as status homophily [4], arguably driving the observed stratification of society [5]. At the same time, shared social environment, similar education level, and social influence have been shown to jointly lead socioeconomic groups to exhibit stereotypical behavioral patterns, such as shared political opinion [6] or similar linguistic patterns [7]. Although these features are entangled and causal relation between them is far from understood, they appear as correlations in the data.…”
Section: Introduction (mentioning)
confidence: 99%