To what extent does the wording and syntactic form of people's writing reflect their personalities? Using a bottom-up stratified corpus comparison, rather than the top-down content-analysis techniques used previously, we examine a corpus of e-mail messages elicited from individuals of known personality, as measured by the Eysenck Personality Questionnaire-Revised (S. Eysenck, Eysenck, & Barrett, 1985). This method allowed us to isolate linguistic features associated with different personality types via both word and part-of-speech n-gram analysis. We investigated the extent to which extraversion is associated with linguistic features involving positivity, sociability, complexity, and implicitness, and neuroticism with negativity, self-concern, emphasis, and implicitness. Numerous interesting features were uncovered. For instance, higher levels of extraversion involved a preference for adjectives, whereas lower levels of neuroticism involved a preference for adverbs. However, neither positivity nor negativity was as prominent as expected, and there was little evidence for implicitness.

Give two people a communication task, such as e-mailing a friend about recent activities, and they are likely to accomplish it in different ways. Some differences depend on their recent experiences or on what they think interests the recipient. Others might depend on character or personality. For example, the following items are initial excerpts from e-mail messages by different authors:
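The word n-gram analysis described above can be sketched minimally in Python; note that the part-of-speech n-grams would additionally require a POS tagger, which is omitted here, and the tokenization is a deliberately naive whitespace split rather than whatever the study actually used.

```python
from collections import Counter

def word_ngrams(text, n=2):
    """Count word n-grams in a text using naive whitespace tokenization.
    (POS n-grams would require tagging each token first, not shown.)"""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Comparing relative n-gram frequencies between stratified subcorpora
# (e.g. texts by high- vs. low-extraversion authors) surfaces candidate
# features associated with a personality dimension.
freqs = word_ngrams("i really really like quiet weekends", n=2)
```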
This study examines the relationship between linguistic mimicry and trust establishment in a text-chat environment. Twenty-six participant pairs engaged in a social dilemma investment game and chatted via Instant Messenger (IM) after every five rounds of investment. Results revealed that, within chat sessions, lexical mimicry (repetition of words or word phrases by both partners) was significantly higher for high-trusting pairs than for low-trusting pairs, but that lexical mimicry across chat sessions was significantly higher for low-trusting pairs than for high-trusting pairs. Theoretical and applied implications are discussed.
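Lexical mimicry, as glossed in the abstract, is repetition of words or phrases by both partners. A toy proxy for such a measure is the overlap of word types across the two partners' chat turns; the study's exact metric is not given in the abstract, so this is only an illustrative sketch.

```python
def mimicry_score(partner_a_text, partner_b_text):
    """Jaccard overlap of word types used by both chat partners.
    A toy proxy for lexical mimicry; the study's actual metric
    is not specified in the abstract."""
    a = set(partner_a_text.lower().split())
    b = set(partner_b_text.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0
```

Under the abstract's finding, high-trusting pairs would show higher within-session scores on a measure of this kind than low-trusting pairs.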
Personality is a fundamental component of an individual's affective behavior. Previous work on personality classification has emerged from disparate sources: varieties of algorithms and feature selection across spoken and written data have made comparison difficult. Here, we use a large corpus of blogs to compare classification feature selection, and we use these results to identify characteristic language information relating to personality. Using Support Vector Machines, the best accuracies range from 84.36% (openness to experience) to 70.51% (neuroticism). The best-performing features combined: (1) stemmed bigrams; (2) no exclusion of stopwords (i.e., common words); and (3) boolean coding of features (presence or absence, rather than rate of use). We take these findings to suggest that both the structure of the text and the presence of common words are important. We also note that a common dictionary of words used for content analysis (LIWC) performs less well on this classification task, which we propose is due to the conceptual breadth of its categories. To get a better sense of how personality is expressed in blogs, we explore the best-performing features and discuss how they can provide a deeper understanding of personality language behavior online.
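The winning feature recipe (stemmed bigrams, stopwords retained, boolean presence/absence coding) can be sketched as follows. The abstract does not name the stemmer used, so `crude_stem` below is a rough illustrative suffix-stripper, not the study's actual preprocessing.

```python
def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemmer,
    which the abstract does not name."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def boolean_stemmed_bigrams(text):
    """The best-performing recipe from the abstract: stemmed bigrams,
    stopwords retained, coded as presence/absence (a set) rather
    than as rates of use."""
    tokens = [crude_stem(t) for t in text.lower().split()]
    return {(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)}
```

Returning a set rather than a frequency count is precisely the boolean coding the abstract reports as most effective; a classifier such as an SVM would then operate on the presence/absence indicator for each bigram.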
Being able to automatically perceive a variety of emotions from text alone has potentially important applications in CMC and HCI that range from identifying mood from online posts to enabling dynamically adaptive interfaces. However, such an ability has not been demonstrated in human raters or computational systems. Here we examine the ability of naive raters of emotion to detect one of eight emotional categories from 50 and 200 word samples of real blog text. With expert raters as a 'gold standard', naive-expert rater agreement increased with longer texts, and was high for ratings of joy, disgust, anger and anticipation, but low for acceptance and 'neutral' texts. We discuss these findings in light of theories of CMC and potential applications in HCI.
Electronic media play an ever-increasing role in our daily communication. But how well can personality traits be perceived through a short e-mail message? Working independently and under experimenter supervision, thirty judges each rated 18 short e-mail texts. These texts were produced by authors of known personality, who briefly described their recent activities, and were collected as part of a previously reported study that demonstrated linguistic characteristics of personality. As predicted by the perception literature, we find relatively high agreement for ratings of author Extraversion, even with minimal textual cues. However, agreement for Neuroticism ratings appears to be further reduced by the environment, especially between target and judges. In addition to reducing the cues available for personality rating, the study extends the previous work in two main ways: first, it measures a further dimension of target personality, Psychoticism, rather than the separate factors Agreeableness and Conscientiousness (along with Openness); and second, it adopts additional, novel exemplar-based and subjective measures of personality perception.
Emotion is central to human interactions, and automatic detection could enhance our experience with technologies. We investigate the linguistic expression of fine-grained emotion in 50 and 200 word samples of real blog texts previously coded by expert and naive raters. Content analysis (LIWC) reveals that angry authors use more affective language and negative affect words, and that joyful authors use more positive affect words. Additionally, a cooccurrence semantic space approach (LSA) was able to identify fear (which naive human emotion raters could not do). We relate our findings to human emotion perception and note potential computational applications.
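LIWC-style content analysis of the kind reported here reduces, at its core, to counting category-word hits per token. The word lists below are toy illustrations; LIWC's actual affect categories are far larger and proprietary.

```python
# Toy affect lexicons; LIWC's real categories are much larger and licensed.
POSITIVE = {"happy", "joy", "love", "great", "wonderful"}
NEGATIVE = {"angry", "hate", "awful", "sad", "terrible"}

def affect_rates(text):
    """Per-token rates of positive and negative affect words, in the
    spirit of LIWC-style dictionary word counting."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0, 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos / len(tokens), neg / len(tokens)
```

On this view, the abstract's finding is that texts by joyful authors score higher on the positive rate and texts by angry authors higher on the negative rate.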
This article presents the privacy dictionary, a new linguistic resource for automated content analysis of privacy-related texts. To overcome the definitional challenges inherent in privacy research, the dictionary was informed by an inclusive set of relevant theoretical perspectives. Using methods from corpus linguistics, we constructed and validated eight dictionary categories on empirical material from a wide range of privacy-sensitive contexts. It was shown that the dictionary categories are able to measure unique linguistic patterns within privacy discussions. At a time when privacy considerations are increasing and online resources provide ever-growing quantities of textual data, the privacy dictionary can play a significant role not only for research in the social sciences but also in technology design and policymaking.
According to Kreuz's principle of inferability, speakers tend to employ nonliteral language when it can reasonably be perceived by their conversational partner. In a computer-mediated communicative setting, such as e-mail, this suggests that the e-mail writer might use discourse tools that facilitate comprehension on the part of the recipient. The present study examined rates of usage for various forms of nonliteral language in 210 e-mail messages written by young adults. In 94.30% of all e-mails there was at least one nonliteral statement, and participants used an average of 2.90 nonliteral statements per e-mail. Results showed that forms of nonliteral language that are typically deemed to be riskier, such as sarcasm, were used much less frequently than other, less risky forms, such as hyperbole, and were marked with discourse markers more often. This indicates that e-mail authors are sensitive to the risky nature of nonliteral language use in e-mail, yet are savvy to the tools available to them in this communicative medium.