2000
DOI: 10.1162/089120100750105920

Automatic Text Categorization in Terms of Genre and Author

Abstract: The two main factors that characterize a text are its content and its style, and both can be used as a means of categorization. In this paper we present an approach to text categorization in terms of genre and author for Modern Greek. In contrast to previous stylometric approaches, we attempt to take full advantage of existing natural language processing (NLP) tools. To this end, we propose a set of style markers including analysis-level measures that represent the way in which the input text has been analyzed…

Cited by 289 publications (205 citation statements)
References 23 publications
“…These measures have been defined in and applied to areas of similar characteristics, such as speaker verification [4] and author verification [9] and are defined as follows:…”
Section: Music Performer Verification (mentioning)
confidence: 99%
“…Because the current NLP techniques do not provide accurate information enough to be used in information retrieval, text chunking is considered to be an alternative to full parsing [14]. Text chunking is to divide text into syntactically related non-overlapping segments of words.…”
Section: Related Work (mentioning)
confidence: 99%
“…Stamatatos et al showed experimentally that the syntactic information among various kinds of linguistic information is a reliable clue for document classification [14]. One additional benefit in using syntactic information for document classification by the co-training algorithm is that it is somewhat independent from term weights.…”
Section: Two Views (mentioning)
confidence: 99%
“…Beyond the traditional approach based on human experts, this procedure can be automated by computational tools able to capture and match the stylistic properties of texts and authors [26,32,2]. The main idea is that by measuring some textual features we can distinguish between texts written by different authors.…”
Section: Introduction (mentioning)
confidence: 99%