2010
DOI: 10.1177/0165551510368620

A parametric methodology for text classification

Abstract: Finding the correct category (class) to which a new, unclassified document belongs is an interesting and difficult problem with a wide range of applications. Our methodology for narrative text classification is based on two techniques: we calculate the distance (similarity) between the new unclassified document and all the pre-classified documents of each class, and we also calculate the similarity of the new document to the ‘average class document’ of each class. In both cases we use key phrases (text phrases or key te…
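The two similarity techniques named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. Documents are represented as bags of key phrases (`Counter` objects keyed by phrase strings), and similarity is cosine similarity. For the first technique, the per-class score is assumed to be the mean similarity to all documents of the class. The 'average class document' is assumed to be the per-term mean vector (centroid). The names `cosine`, `centroid` and `classify` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse key-phrase frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs: list) -> Counter:
    """Assumed 'average class document': the mean of the class's vectors."""
    total = Counter()
    for d in docs:
        total.update(d)  # adds counts term by term
    return Counter({t: v / len(docs) for t, v in total.items()})


def classify(new_doc: Counter, training: dict) -> tuple:
    """training maps class label -> list of Counter vectors.

    Returns (label chosen by mean similarity to every document of a class,
             label chosen by similarity to the class centroid).
    Mean aggregation is an assumption made for this sketch.
    """
    by_docs = {
        c: sum(cosine(new_doc, d) for d in docs) / len(docs)
        for c, docs in training.items()
    }
    by_centroid = {c: cosine(new_doc, centroid(docs)) for c, docs in training.items()}
    return max(by_docs, key=by_docs.get), max(by_centroid, key=by_centroid.get)
```

For example, consider a two-class toy training set ("sports" and "finance"). A new document `Counter({"match": 1, "goal": 1})` shares terms only with the sports documents, so both techniques assign it to "sports". Keeping the two scores separate mirrors the abstract's description of two techniques. How the paper combines or parametrizes them is not shown here.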

Cited by 15 publications (11 citation statements)
References 22 publications (25 reference statements)
“…It is important to consider that the solution adopted was technically simple, but with a major impact on the management of documents. The BASGEO allows the decentralization and distribution of workflow functions (Pešović, Vidaković, Ivanović, Budimac, & Vidaković, 2011) despite the need for a better classification of documents (Rocha et al., 2013; Yang, Lin, & Wei, 2010; Karanikolas & Skourlas, 2010).…”
Section: Discussion (mentioning)
confidence: 99%
“…In addition, there are several studies exploring the frequency of occurrence of linguistic forms in documents, identification of key linguistic forms, identification of the "true quality" of linguistic forms and so forth. In their papers, Karanikolas & Skourlas [11,12] conduct research on automatic classification of documents. They focus on the issue of extracting key-phrases from a collection of texts in order to use them as attributes for text classification.…”
Section: Related Work (mentioning)
confidence: 99%
“…They look for sequences of words (key-phrases) that will be used as features for classification rules and not for extracting association rules. In their works Karanikolas & Skourlas [11,12] extracted key-phrases which are frequent within the documents of one or few classes but are not so frequent in the documents of the remaining classes of the training set. Furthermore, in [12] it is said that words that constitute key phrases must coexist in a specific window size.…”
Section: Related Work (mentioning)
confidence: 99%
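The selection criterion described in these statements can be sketched as below. A candidate key phrase is kept if it is frequent in the documents of one or a few classes and rare in the remaining classes. Its words must also co-occur inside a fixed window. Several details are assumptions for illustration, not taken from the cited works: the thresholds `min_in_class`, `max_elsewhere` and `max_classes`, the default window of 4 tokens, the requirement that the words appear in order, and the use of per-class document frequency. The function names are hypothetical.

```python
def occurs_within_window(tokens: list, phrase: tuple, window: int) -> bool:
    """True if the words of `phrase` appear in order within `window` tokens.

    Word order is an assumption of this sketch.
    """
    for start, tok in enumerate(tokens):
        if tok != phrase[0]:
            continue
        pos = start
        for word in phrase[1:]:
            try:
                # next occurrence after pos, but still inside the window
                pos = tokens.index(word, pos + 1, start + window)
            except ValueError:
                break
        else:
            return True  # every word found inside the window
    return False


def select_key_phrases(training: dict, candidates: list, window: int = 4,
                       min_in_class: float = 0.5, max_elsewhere: float = 0.1,
                       max_classes: int = 2) -> dict:
    """training maps class label -> list of token lists.

    Keeps phrases that are frequent in 1..max_classes classes (but not in all
    classes) and rare everywhere else. Returns phrase -> list of strong classes.
    """
    selected = {}
    for phrase in candidates:
        # fraction of each class's documents containing the phrase
        rates = {
            label: sum(occurs_within_window(d, phrase, window) for d in docs) / len(docs)
            for label, docs in training.items()
        }
        strong = [c for c, r in rates.items() if r >= min_in_class]
        rare_elsewhere = all(r <= max_elsewhere for c, r in rates.items() if c not in strong)
        if 1 <= len(strong) <= max_classes and len(strong) < len(rates) and rare_elsewhere:
            selected[phrase] = strong
    return selected
```

A function word such as "the", frequent across every class, is rejected by the "not in all classes" condition. A pair such as ("team", "goal") that appears only in one class's documents is kept as a feature for that class.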
“…Porter's stemmer (Porter, 1980) is more granular and it uses five levels. Apart from the above-published stemmers, there is another, older Greek stemmer (nnk's stemmer), incorporated into some commercial (Moumouris, 1995) and some research (Karanikolas, 2007; Karanikolas and Skourlas, 2010) applications. This stemmer uses a large number of suffixes and it elaborates the suffix removal (or replacement) in six iterations (levels).…”
Section: Stemmer Configuration (mentioning)
confidence: 99%
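The level-by-level suffix removal (or replacement) attributed to nnk's stemmer can be illustrated with a generic multi-level stripper. The actual suffix lists of that stemmer are not given here. The two rule tables in `EXAMPLE_LEVELS` are therefore hypothetical; the described stemmer would use six levels and far more suffixes. Several other details are also assumptions: the `min_stem` guard, the longest-suffix-first and one-rule-per-level policy, and the restriction to lowercase, unaccented words.

```python
def multilevel_stem(word: str, levels: list, min_stem: int = 2) -> str:
    """Remove or replace suffixes in successive levels (iterations).

    `levels` is a list of dicts mapping suffix -> replacement. At each level
    the longest matching suffix is applied once, provided at least `min_stem`
    characters remain after cutting it (before the replacement is appended).
    """
    for rules in levels:
        for suffix in sorted(rules, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: len(word) - len(suffix)] + rules[suffix]
                break  # at most one rule per level
    return word


# Hypothetical rule tables for illustration only (not the real nnk suffix lists).
EXAMPLE_LEVELS = [
    {"ματα": "μα", "ος": "", "ων": ""},  # level 1: some inflectional endings
    {"ικ": ""},                          # level 2: a derivational ending
]

for w in ["δρομος", "γραμματα", "ιστορικος"]:
    print(w, "->", multilevel_stem(w, EXAMPLE_LEVELS))
```

Here "γραμματα" shows a replacement rather than a plain removal ("ματα" becomes "μα"). "ιστορικος" loses "ος" at level 1 and then "ικ" at level 2, which is the chaining that running several levels provides.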