Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016
DOI: 10.18653/v1/p16-1080
Analyzing Biases in Human Perception of User Age and Gender from Text

Abstract: User traits disclosed through written text, such as age and gender, can be used to personalize applications such as recommender systems or conversational agents. However, human perception of these traits is not perfectly aligned with reality. In this paper, we conduct a large-scale crowdsourcing experiment on guessing age and gender from tweets. We systematically analyze the quality and possible biases of these predictions. We identify the textual cues which lead to misassessments of traits or make annotator…

Cited by 43 publications (42 citation statements); references 50 publications; citing publications span 2017 to 2023.
“…Cohen and Ruths (2013) demonstrated that the predictive accuracy of classifiers is significantly lower when confronted with users who do not explicitly mention their political orientation. Despite this, their study is limited because, in their hardest classification task, they use crowdsourced political orientation labels, which may not correspond to reality and may suffer from biases (Flekova et al., 2016a). Further, they still only look at predicting binary political orientation.…”
Section: Related Work
confidence: 99%
“…Setup We use a Support Vector Machine (SVM) with a linear kernel and L2 regularization, similar to the state-of-the-art in author profiling (Flekova et al., 2016a; Basile et al., 2017). We consider a single session of a user as a data instance, and run experiments using 5-fold cross-validation.…”
Section: Methods
confidence: 99%
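As a rough illustration of the setup this citing paper describes, here is a minimal sketch using scikit-learn. The session texts and labels below are hypothetical placeholders, and the TF-IDF feature extraction is an assumption; the quoted passage specifies only the classifier (a linear SVM with L2 regularization), the instance definition (one user session), and the 5-fold evaluation protocol.

```python
# Minimal sketch of the described setup: a linear-kernel SVM with an L2
# penalty, evaluated with 5-fold cross-validation. All texts, labels,
# and the TF-IDF features are hypothetical placeholders/assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# One user session per data instance (placeholder examples).
sessions = [
    "omg this new show is sooo good lol",
    "attended a very productive budget meeting today",
    "cant wait for the weekend tbh",
    "pleased to announce our quarterly results",
    "haha that meme is hilarious",
    "reviewing the committee's annual report",
    "ugh homework is the worst",
    "enjoyed a lovely dinner with colleagues",
    "just failed my driving test lol",
    "grateful for twenty years at this company",
]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # hypothetical binary trait labels

# LinearSVC is a linear SVM with an L2 penalty by default.
model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))

# 5-fold cross-validation, mirroring the cited experimental protocol.
scores = cross_val_score(model, sessions, labels, cv=5)
print("Accuracy per fold:", scores)
```

The C parameter controls the strength of the L2 penalty; the value here is scikit-learn's default, not a tuned setting from the cited work.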
“…Social Media User Profiling: The rapid growth of social media has led to a massive volume of user-generated informal text, which sometimes mimics conversational utterances. A great deal of work has been dedicated to automatically identifying latent demographic features of online users, including age and gender [3, 4, 8, 9, 17, 34–36, 41], political orientation and ethnicity [26, 32–34, 41], regional origin [8, 34], personality [14, 36], as well as occupational class, which can be mapped to income [10, 31]. Most of these works focus on user-generated content from Twitter, with a few exceptions that explore Facebook [35, 36] or Reddit [8, 14] posts.…”
Section: Related Work
confidence: 99%
“…Most existing studies that capture users' latent attributes from social media rely on classification over hand-crafted features such as word/character n-grams [2, 4, 34], Linguistic Inquiry and Word Count (LIWC) [27] categories [14, 32, 33], topic distributions [9, 26, 31], and sentiment/emotion labels of words derived from existing emotion lexicons [14, 26, 32, 33]. The best-performing system [2] in the shared task on author profiling organized by the CLEF PAN lab [11, 30] utilizes a linear SVM and word/character n-gram features.…”
Section: Related Work
confidence: 99%
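To make the feature setup in this passage concrete, here is a hedged sketch of a word-plus-character n-gram pipeline feeding a linear SVM, in the spirit of the PAN author-profiling systems cited above. The n-gram ranges and TF-IDF weighting are illustrative assumptions, not the cited systems' exact configurations.

```python
# Sketch of hand-crafted n-gram features for author profiling: word and
# character n-grams combined and fed to a linear SVM. N-gram ranges and
# TF-IDF weighting are assumptions for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

ngram_features = FeatureUnion([
    # Word unigrams and bigrams.
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    # Character 2- to 4-grams, restricted to within word boundaries.
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])

model = make_pipeline(ngram_features, LinearSVC())
# model.fit(train_texts, train_labels)      # placeholder training data
# predictions = model.predict(test_texts)   # placeholder test data
```

Character n-grams with the "char_wb" analyzer capture sub-word cues such as spelling variants and elongations ("sooo"), which is one reason they are a staple of profiling systems alongside word n-grams.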