To what extent does the wording and syntactic form of people's writing reflect their personalities? Using a bottom-up stratified corpus comparison, rather than the top-down content-analysis techniques used previously, we examine a corpus of e-mail messages elicited from individuals of known personality, as measured by the Eysenck Personality Questionnaire-Revised (S. Eysenck, Eysenck, & Barrett, 1985). This method allowed us to isolate linguistic features associated with different personality types via both word and part-of-speech n-gram analysis. We investigated the extent to which extraversion is associated with linguistic features involving positivity, sociability, complexity, and implicitness, and neuroticism with negativity, self-concern, emphasis, and implicitness. Numerous interesting features were uncovered. For instance, higher levels of extraversion involved a preference for adjectives, whereas lower levels of neuroticism involved a preference for adverbs. However, neither positivity nor negativity was as prominent as expected, and there was little evidence for implicitness.

Give two people a communication task, such as e-mailing a friend about recent activities, and they are likely to accomplish it in different ways. Some differences depend on their recent experiences or on what they think interests the recipient. Others might depend on character or personality. For example, the following items are initial excerpts from e-mail messages by different authors:
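The word n-gram analysis described above can be sketched minimally in Python; note that the part-of-speech n-grams would additionally require a POS tagger, which is omitted here, and the tokenization is a deliberately naive whitespace split rather than whatever the study actually used.

```python
from collections import Counter

def word_ngrams(text, n=2):
    """Count word n-grams in a text using naive whitespace tokenization.
    (POS n-grams would require tagging each token first, not shown.)"""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Comparing relative n-gram frequencies between stratified subcorpora
# (e.g. texts by high- vs. low-extraversion authors) surfaces candidate
# features associated with a personality dimension.
freqs = word_ngrams("i really really like quiet weekends", n=2)
```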
This study examines the relationship between linguistic mimicry and trust establishment in a text-chat environment. Twenty-six participant pairs engaged in a social dilemma investment game and chatted via Instant Messenger (IM) after every five rounds of investment. Results revealed that, within chat sessions, lexical mimicry (repetition of words or word phrases by both partners) was significantly higher for high-trusting pairs than for low-trusting pairs, but that lexical mimicry across chat sessions was significantly higher for low-trusting pairs than for high-trusting pairs. Theoretical and applied implications are discussed.
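Lexical mimicry, as glossed in the abstract, is repetition of words or phrases by both partners. A toy proxy for such a measure is the overlap of word types across the two partners' chat turns; the study's exact metric is not given in the abstract, so this is only an illustrative sketch.

```python
def mimicry_score(partner_a_text, partner_b_text):
    """Jaccard overlap of word types used by both chat partners.
    A toy proxy for lexical mimicry; the study's actual metric
    is not specified in the abstract."""
    a = set(partner_a_text.lower().split())
    b = set(partner_b_text.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0
```

Under the abstract's finding, high-trusting pairs would show higher within-session scores on a measure of this kind than low-trusting pairs.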
Personality is a fundamental component of an individual's affective behavior. Previous work on personality classification has emerged from disparate sources: varieties of algorithms and feature selection across spoken and written data have made comparison difficult. Here, we use a large corpus of blogs to compare classification feature selection, and we use these results to identify characteristic language information relating to personality. Using Support Vector Machines, the best accuracies range from 84.36% (openness to experience) to 70.51% (neuroticism). The best-performing features combined: (1) stemmed bigrams; (2) no exclusion of stopwords (i.e., common words); and (3) boolean coding of features (presence or absence, rather than rate of use). We take these findings to suggest that both the structure of the text and the presence of common words are important. We also note that a common dictionary of words used for content analysis (LIWC) performs less well on this classification task, which we propose is due to the conceptual breadth of its categories. To get a better sense of how personality is expressed in blogs, we explore the best-performing features and discuss how they can provide a deeper understanding of personality language behavior online.
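The winning feature recipe (stemmed bigrams, stopwords retained, boolean presence/absence coding) can be sketched as follows. The abstract does not name the stemmer used, so `crude_stem` below is a rough illustrative suffix-stripper, not the study's actual preprocessing.

```python
def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemmer,
    which the abstract does not name."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def boolean_stemmed_bigrams(text):
    """The best-performing recipe from the abstract: stemmed bigrams,
    stopwords retained, coded as presence/absence (a set) rather
    than as rates of use."""
    tokens = [crude_stem(t) for t in text.lower().split()]
    return {(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)}
```

Returning a set rather than a frequency count is precisely the boolean coding the abstract reports as most effective; a classifier such as an SVM would then operate on the presence/absence indicator for each bigram.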
Being able to automatically perceive a variety of emotions from text alone has potentially important applications in CMC and HCI that range from identifying mood from online posts to enabling dynamically adaptive interfaces. However, such an ability has not been demonstrated in human raters or computational systems. Here we examine the ability of naive raters of emotion to detect one of eight emotional categories from 50 and 200 word samples of real blog text. With expert raters as a 'gold standard', naive-expert rater agreement increased with longer texts, and was high for ratings of joy, disgust, anger and anticipation, but low for acceptance and 'neutral' texts. We discuss these findings in light of theories of CMC and potential applications in HCI.
Electronic media play an ever-increasing role in our daily communication. But how well can personality traits be perceived through a short e-mail message? Working independently and under experimenter supervision, thirty judges each rated 18 short e-mail texts. These texts were produced by authors of known personality, who briefly described their recent activities, and were collected as part of a previously reported study that demonstrated linguistic characteristics of personality. As predicted by the perception literature, we find relatively high agreement for ratings of author Extraversion, even with minimal textual cues. However, agreement for Neuroticism ratings appears to be further reduced by the environment, especially between target and judges. In addition to reducing the cues available for personality rating, the study extends the previous work in two main ways: first, it measures a further dimension of target personality, Psychoticism, rather than the separate factors Agreeableness and Conscientiousness (along with Openness); and second, it adopts additional, novel exemplar-based and subjective measures of personality perception.
Emotion is central to human interactions, and automatic detection could enhance our experience with technologies. We investigate the linguistic expression of fine-grained emotion in 50 and 200 word samples of real blog texts previously coded by expert and naive raters. Content analysis (LIWC) reveals that angry authors use more affective language and negative affect words, and that joyful authors use more positive affect words. Additionally, a cooccurrence semantic space approach (LSA) was able to identify fear (which naive human emotion raters could not do). We relate our findings to human emotion perception and note potential computational applications.
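LIWC-style content analysis of the kind reported here reduces, at its core, to counting category-word hits per token. The word lists below are toy illustrations; LIWC's actual affect categories are far larger and proprietary.

```python
# Toy affect lexicons; LIWC's real categories are much larger and licensed.
POSITIVE = {"happy", "joy", "love", "great", "wonderful"}
NEGATIVE = {"angry", "hate", "awful", "sad", "terrible"}

def affect_rates(text):
    """Per-token rates of positive and negative affect words, in the
    spirit of LIWC-style dictionary word counting."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0, 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos / len(tokens), neg / len(tokens)
```

On this view, the abstract's finding is that texts by joyful authors score higher on the positive rate and texts by angry authors higher on the negative rate.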
This article presents the privacy dictionary, a new linguistic resource for automated content analysis of privacy-related texts. To overcome the definitional challenges inherent in privacy research, the dictionary was informed by an inclusive set of relevant theoretical perspectives. Using methods from corpus linguistics, we constructed and validated eight dictionary categories on empirical material from a wide range of privacy-sensitive contexts. It was shown that the dictionary categories are able to measure unique linguistic patterns within privacy discussions. At a time when privacy considerations are increasing and online resources provide ever-growing quantities of textual data, the privacy dictionary can play a significant role not only for research in the social sciences but also in technology design and policymaking.
According to Kreuz's principle of inferability, speakers tend to employ nonliteral language when it can reasonably be perceived by their conversational partner. In a computer-mediated communicative setting, such as e-mail, this suggests that the e-mail writer might use discourse tools that facilitate comprehension on the part of the recipient. The present study examined rates of usage for various forms of nonliteral language in 210 e-mail messages written by young adults. In 94.30% of all e-mails there was at least one nonliteral statement, and participants used an average of 2.90 nonliteral statements per e-mail. Results showed that forms of nonliteral language that are typically deemed to be riskier, such as sarcasm, were used much less frequently than other, less risky forms, such as hyperbole, and were marked with discourse markers more often. This indicates that e-mail authors are sensitive to the risky nature of nonliteral language use in e-mail, yet are savvy to the tools available to them in this communicative medium.