Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016
DOI: 10.18653/v1/p16-1080
Analyzing Biases in Human Perception of User Age and Gender from Text

Abstract: User traits disclosed through written text, such as age and gender, can be used to personalize applications such as recommender systems or conversational agents. However, human perception of these traits is not perfectly aligned with reality. In this paper, we conduct a large-scale crowdsourcing experiment on guessing age and gender from tweets. We systematically analyze the quality and possible biases of these predictions. We identify the textual cues which lead to misassessments of traits or make annotator…

Cited by 43 publications (42 citation statements); references 50 publications; citing publications span 2017 to 2023.
“…Cohen and Ruths (2013) demonstrated that the predictive accuracy of classifiers is significantly lower when confronted with users who do not explicitly mention their political orientation. Despite this, their study is limited because, in their hardest classification task, they use crowdsourced political orientation labels, which may not correspond to reality and may suffer from biases (Flekova et al., 2016a). Further, they still only look at predicting binary political orientation.…”
Section: Related Work
confidence: 99%
“…Setup We use a Support Vector Machine (SVM) with a linear kernel and L2 regularization, similar to the state-of-the-art in author profiling (Flekova et al., 2016a; Basile et al., 2017). We consider a single session of a user as a data instance, and run experiments using 5-fold cross-validation.…”
Section: Methods
confidence: 99%
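As a rough illustration of the setup this citing paper describes, here is a minimal sketch using scikit-learn. The session texts and labels below are hypothetical placeholders, and the TF-IDF feature extraction is an assumption; the quoted passage specifies only the classifier (a linear SVM with L2 regularization), the instance definition (one user session), and the 5-fold evaluation protocol.

```python
# Minimal sketch of the described setup: a linear-kernel SVM with an L2
# penalty, evaluated with 5-fold cross-validation. All texts, labels,
# and the TF-IDF features are hypothetical placeholders/assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# One user session per data instance (placeholder examples).
sessions = [
    "omg this new show is sooo good lol",
    "attended a very productive budget meeting today",
    "cant wait for the weekend tbh",
    "pleased to announce our quarterly results",
    "haha that meme is hilarious",
    "reviewing the committee's annual report",
    "ugh homework is the worst",
    "enjoyed a lovely dinner with colleagues",
    "just failed my driving test lol",
    "grateful for twenty years at this company",
]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # hypothetical binary trait labels

# LinearSVC is a linear SVM with an L2 penalty by default.
model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))

# 5-fold cross-validation, mirroring the cited experimental protocol.
scores = cross_val_score(model, sessions, labels, cv=5)
print("Accuracy per fold:", scores)
```

The C parameter controls the strength of the L2 penalty; the value here is scikit-learn's default, not a tuned setting from the cited work.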
“…Social Media User Profiling: The rapid growth of social media has led to a massive volume of user-generated informal text, which sometimes mimics conversational utterances. A great deal of work has been dedicated to automatically identifying latent demographic features of online users, including age and gender [3, 4, 8, 9, 17, 34–36, 41], political orientation and ethnicity [26, 32–34, 41], regional origin [8, 34], personality [14, 36], as well as occupational class, which can be mapped to income [10, 31]. Most of these works focus on user-generated content from Twitter, with a few exceptions that explore Facebook [35, 36] or Reddit [8, 14] posts.…”
Section: Related Work
confidence: 99%
“…Most existing studies that capture users' latent attributes from social media rely on classification over hand-crafted features such as word/character n-grams [2, 4, 34], Linguistic Inquiry and Word Count (LIWC) [27] categories [14, 32, 33], topic distributions [9, 26, 31], and sentiment/emotion labels of words derived from existing emotion lexicons [14, 26, 32, 33]. The best-performing system [2] in the shared task on author profiling organized by the CLEF PAN lab [11, 30] utilizes a linear SVM and word/character n-gram features.…”
Section: Related Work
confidence: 99%
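To make the feature setup in this passage concrete, here is a hedged sketch of a word-plus-character n-gram pipeline feeding a linear SVM, in the spirit of the PAN author-profiling systems cited above. The n-gram ranges and TF-IDF weighting are illustrative assumptions, not the cited systems' exact configurations.

```python
# Sketch of hand-crafted n-gram features for author profiling: word and
# character n-grams combined and fed to a linear SVM. N-gram ranges and
# TF-IDF weighting are assumptions for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

ngram_features = FeatureUnion([
    # Word unigrams and bigrams.
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    # Character 2- to 4-grams, restricted to within word boundaries.
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])

model = make_pipeline(ngram_features, LinearSVC())
# model.fit(train_texts, train_labels)      # placeholder training data
# predictions = model.predict(test_texts)   # placeholder test data
```

Character n-grams with the "char_wb" analyzer capture sub-word cues such as spelling variants and elongations ("sooo"), which is one reason they are a staple of profiling systems alongside word n-grams.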