Automatic Text Categorization in Terms of Genre and Author

Stamatatos, Efstathios; Kokkinakis, G.; Fakotakis, Nikos

doi:10.1162/089120100750105920

Cited by 289 publications

(205 citation statements)

References 23 publications

Mentioning: 195

“…These measures have been defined in and applied to areas of similar characteristics, such as speaker verification [4] and author verification [9] and are defined as follows:…”

Section: Music Performer Verification (mentioning)

confidence: 99%

Music Performer Verification Based on Learning Ensembles

Stamatatos

Kavallieratou

2004

Methods and Applications of Artificial Intelligence

Self Cite

Abstract. In this paper the problem of music performer verification is introduced. Given a certain performance of a musical piece and a set of candidate pianists the task is to examine whether or not a particular pianist is the actual performer. A database of 22 pianists playing pieces by F. Chopin in a computer-controlled piano is used in the presented experiments. An appropriate set of features that captures the idiosyncrasies of music performers is proposed. Well-known machine learning techniques for constructing learning ensembles are applied and remarkable results are described in verifying the actual pianist, a very difficult task even for human experts.

“…Because the current NLP techniques do not provide accurate information enough to be used in information retrieval, text chunking is considered to be an alternative to full parsing [14]. Text chunking is to divide text into syntactically related non-overlapping segments of words.…”

Section: Related Work (mentioning)

confidence: 99%

“…Stamatatos et al showed experimentally that the syntactic information among various kinds of linguistic information is a reliable clue for document classification [14]. One additional benefit in using syntactic information for document classification by the co-training algorithm is that it is somewhat independent from term weights.…”

Section: Two Views (mentioning)

confidence: 99%

Large Scale Unstructured Document Classification Using Unlabeled Data and Syntactic Information

Park

Zhang

2003

Advances in Knowledge Discovery and Data Mining

Abstract. Most document classification systems consider only the distribution of content words of the documents, ignoring the syntactic information underlying the documents though it is also an important factor. In this paper, we present an approach for classifying large scale unstructured documents by incorporating both lexical and syntactic information of documents. For this purpose, we use the co-training algorithm, a partially supervised learning algorithm, in which two separated views for the training data are employed and the small number of labeled data are augmented by a large number of unlabeled data. Since both lexical and syntactic information can play roles of separated views for the unstructured documents, the co-training algorithm enhances the performance of document classification using both of them and a large number of unlabeled documents. The experimental results on Reuters-21578 corpus and TREC-7 filtering documents show the effectiveness of unlabeled documents and the use of both lexical and syntactic information.

“…Beyond the traditional approach based on human experts, this procedure can be automated by computational tools able to capture and match the stylistic properties of texts and authors [26,32,2]. The main idea is that by measuring some textual features we can distinguish between texts written by different authors.…”

Section: Introduction (mentioning)

confidence: 99%

Tensor Space Models for Authorship Identification

Plakias

Stamatatos

Lecture Notes in Computer Science

Self Cite

Abstract. Authorship identification can be viewed as a text categorization task. However, in this task the most frequent features appear to be the most important discriminators, there is usually a shortage of training texts, and the training texts are rarely evenly distributed over the authors. To cope with these problems, we propose tensors of second order for representing the stylistic properties of texts. Our approach requires the calculation of much fewer parameters in comparison to the traditional vector space representation. We examine various methods for building appropriate tensors taking into account that similar features should be placed in the same neighborhood. Based on an existing generalization of SVM able to handle tensors we perform experiments on corpora controlled for genre and topic and show that the proposed approach can effectively handle cases where only limited training texts are available.
