2012
DOI: 10.1002/asi.22627

Using psycholinguistic features for profiling first language of authors

Abstract: This study empirically evaluates the effectiveness of different feature types for the classification of the first language of an author. In particular, it examines the utility of psycholinguistic features, extracted by the Linguistic Inquiry and Word Count (LIWC) tool, that have not previously been applied to the task of author profiling. As LIWC is a tool that has been developed in the psycholinguistic field rather than the computational linguistics field, it was hypothesized that it would be effective, both …

Cited by 9 publications (10 citation statements)
References 30 publications
“…We argue that the division of the whole feature space according to linguistic modalities is necessary because different application scenarios have different requirements of the features. Moreover, even for manual feature engineering, writing styles that correspond to different linguistic dimensions are constructed differently by humans [Solorio et al 2011;Torney et al 2012]; therefore, they need to be grouped together.…”
Section: Mining Stylometric Representations For Authorship Analysis (mentioning)
confidence: 99%
“…Usually the frequency value of the function words are used to represent the features [Baron 2014;Halvani and Steinebach 2014;HaCohen-Kerner and Margaliot 2014]. They are effective for identifying the first language of the authors [Torney et al 2012;Argamon et al 2009], identifying the actual author of French literature [Boukhaled and Ganascia 2014], and characterizing the gender of e-mails [Corney et al 2002]. Especially for the first language detection, the effect of language transfer affects the use of function words in the secondary language [Torney et al 2012].…”
Section: Joint Learning Model For Topical Modality and Lexical Modality (mentioning)
confidence: 99%
“…While emotion-based features have been used in other NLP tasks, such as sentiment analysis (Sidorov et al, 2013), classification of documents into the corresponding emotion category (Wen and Wan, 2014), deception detection (Newman et al, 2003), among others, they are an underexplored area of second language writing. Torney et al (2012) use psycholinguistic features extracted by the Linguistic Inquiry and Word Count (LIWC) tool (Pennebaker et al, 2007) to identify the first language of an author, where emotion-based features are included as part of the feature vector, e.g., percentage of positive/negative emotion words. The LIWC feature set used in the paper also contains other types of features, e.g., personal concern categories (work, leisure), paralinguistic dimensions (assents, fillers, nonfluencies), which obscure the contribution of the actual emotion features.…”
Section: Related Work (mentioning)
confidence: 99%
“…LIWC has been used in a number of applications, but particularly in psychology; for example, by comparing the scores of the writing of mental health patients over time, Pennebaker (1997) was able to track changes in cognitive and emotional processes and pinpoint potential triggers for recovery. It has also been used in author profiling (Mairesse and Walker, 2006) and first-language profiling (Torney et al, 2012), among others. Figure 1 gives examples of some of the words that make up the different LIWC dimensions.…”
Section: The LIWC Analysis (mentioning)
confidence: 99%