2014
DOI: 10.1007/978-3-319-11382-1_13

Authorship Identification Using Dynamic Selection of Features from Probabilistic Feature Set

Cited by 8 publications (5 citation statements)
References 12 publications
“…Simple unigram (i.e., n = 1) and bigram (i.e., n = 2) features can hardly capture the relationship among nouns across the whole sentence, and the relationship between each bigram/trigram is considered independent. Second, the current n-gram approach heavily depends on the feature selection method [Zamani et al. 2014a; Savoy 2013a; Pavlyshenko 2014]. The space of the complete n-gram (n ∈ N) features is indeed sparse and can be greatly compressed for the problem of authorship analysis.…”
Section: Joint Learning Model for Topical Modality and Lexical Modality
confidence: 99%
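The sparsity the citing authors point to is easy to demonstrate: as n grows, an ever larger share of n-gram types occurs only once, so the complete n-gram space carries little reusable signal per dimension. The sketch below is a minimal illustration of this effect, not code from any of the cited papers; the helper names and the sample sentence are invented for the example.

```python
from collections import Counter

def word_ngrams(tokens, n):
    """Word-level n-grams: n=1 gives unigrams, n=2 bigrams, etc."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical toy text; any corpus shows the same trend at larger scale.
tokens = "the cat sat on the mat because the cat was tired".split()

unigrams = word_ngrams(tokens, 1)
bigrams = word_ngrams(tokens, 2)
trigrams = word_ngrams(tokens, 3)

# As n grows, more n-gram types are singletons: the feature space
# becomes sparse and, as the quote argues, compressible.
for name, grams in (("unigrams", unigrams),
                    ("bigrams", bigrams),
                    ("trigrams", trigrams)):
    singletons = sum(1 for c in grams.values() if c == 1)
    print(f"{name}: {len(grams)} types, {singletons} seen once")
```

On this toy text every trigram is already unique, while several unigrams repeat, which is the independence-and-sparsity problem the quote raises against plain n-gram features.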
“…During the feature engineering process, given the available dataset and application scenario, authorship analysts manually select a broad set of features based on hypotheses or educated guesses, and then refine the selection based on experimental feedback. As demonstrated by previous research [Savoy 2012; Zamani et al. 2014a; Savoy 2013b; Ding et al. 2015], the choice of the feature set (i.e., the feature selection method) is a crucial determinant of the prediction result, and it requires explicit knowledge of computational linguistics and tacit experience in analyzing textual data. Manual feature engineering is a time-consuming and labor-intensive task.…”
Section: Introduction
confidence: 99%
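One common manual starting point in authorship analysis is frequency-based selection: keep the k most frequent tokens (often function words) as the feature set and represent each document by its relative frequencies over them. The sketch below shows this one heuristic only; it is not the selection method of any cited paper, and the function names and toy corpus are assumptions of the example.

```python
from collections import Counter

def select_features(corpus_tokens, k):
    """Keep the k most frequent tokens across the corpus as the feature set.
    One simple heuristic; the cited work shows the choice of selection
    method strongly affects prediction quality."""
    counts = Counter()
    for doc in corpus_tokens:
        counts.update(doc)
    return [tok for tok, _ in counts.most_common(k)]

def vectorize(doc, features):
    """Represent a document as relative frequencies over the feature set."""
    counts = Counter(doc)
    total = max(len(doc), 1)
    return [counts[f] / total for f in features]

# Hypothetical two-document corpus.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran", "the"]]
features = select_features(corpus, 2)
vector = vectorize(["the", "the", "cat", "dog"], features)
```

Swapping `select_features` for a different criterion (information gain, chi-square, frequency in a reference corpus) changes the downstream vectors, which is precisely why the quote calls the selection method a crucial determinant of the result.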
“…He found that the effect of this algorithm was significantly affected by the corpus size when it was used alone [21]. Zamani H. et al. proposed maximum likelihood estimates of the distributions of lexical and syntactic features as the feature set, and gave a method for computing the distance between feature sets along with a feature selection method, which enhanced the interpretability of multi-level feature sets [22].…”
Section: Literature Review
confidence: 99%
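The idea summarized in the quote can be sketched in two steps: estimate a feature distribution per author by maximum likelihood (relative frequencies), then compare authors by a distance between distributions. The code below is a minimal illustration under those assumptions; the symmetrized KL divergence used here is one plausible choice of distance, not necessarily the measure of the cited paper, and the vocabulary and samples are invented.

```python
import math
from collections import Counter

def mle_distribution(tokens, vocab):
    """Maximum likelihood estimate of a feature distribution:
    relative frequencies, floored to avoid zero probabilities."""
    counts = Counter(tokens)
    total = len(tokens)
    eps = 1e-9
    return {f: max(counts[f] / total, eps) for f in vocab}

def symmetric_kl(p, q):
    """Symmetrized KL divergence between two feature distributions --
    one possible distance between feature sets, assumed for this sketch."""
    def kl(a, b):
        return sum(a[f] * math.log(a[f] / b[f]) for f in a)
    return kl(p, q) + kl(q, p)

# Hypothetical function-word profiles of two authors.
vocab = ["the", "of", "and"]
author_a = mle_distribution("the of the and the of".split(), vocab)
author_b = mle_distribution("and and of the and and".split(), vocab)
```

A test document would then be attributed to whichever author profile lies at the smaller distance, which is what makes distribution-based feature sets comparatively interpretable.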
“…In Table 2, the performance of our model is compared to the winner and the second-ranked entry of the English literary text section of the shared task (cf. (Modaresi and Gross, 2014) and (Zamani et al., 2014)); our model outperforms the best-performing approach of the shared task, the META-CLASSIFIER (MC), by a large margin. The task baseline is the best-performing language-independent approach of the PAN-2013 shared task.…”
[Table 2: Performance of our model compared to other participants on the "PANLiterary" dataset]
Section: PAN Author Verification
confidence: 99%