2013
DOI: 10.4236/jilsa.2013.52009

The Role of Rare Terms in Enhancing the Performance of Polynomial Networks Based Text Categorization

Abstract: In this paper, the role of rare or infrequent terms in enhancing the accuracy of English Text Categorization using Polynomial Networks (PNs) is investigated. To study the impact of rare terms on the accuracy of PNs-based text categorization, different term reduction criteria as well as different term weighting schemes were experimented with on the Reuters Corpus using PNs. Each term weighting scheme on each reduced term set was tested once with the rare terms kept and once with them removed. All t…
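The comparison the abstract describes can be sketched roughly as follows. This is not the authors' code: Polynomial Networks are not available in standard libraries, so a linear SVM stands in for the classifier, the toy corpus is only a placeholder for the Reuters collection, and the min_df cut-off is a simple stand-in for the paper's term-reduction criteria.

```python
# Illustrative sketch only: train the same pipeline twice, once keeping all
# terms (min_df=1) and once dropping terms that occur in a single training
# document (min_df=2), then compare accuracy on held-out documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = ["grain wheat exports rose", "wheat harvest grain prices",
              "oil crude barrel prices", "crude oil output barrel cut"]
train_labels = ["grain", "grain", "oil", "oil"]
test_docs = ["wheat grain shipment", "barrel of crude oil"]
test_labels = ["grain", "oil"]

for setting, min_df in [("rare terms kept", 1), ("rare terms removed", 2)]:
    model = make_pipeline(TfidfVectorizer(min_df=min_df), LinearSVC())
    model.fit(train_docs, train_labels)
    print(f"{setting}: accuracy = {model.score(test_docs, test_labels):.2f}")
```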

Cited by 4 publications (9 citation statements); References 13 publications
“…The significance of low-frequency terms in TC performance has always been debatable. A recent study has shown that keeping low-frequency terms can enhance polynomial network (PN)-based TC of the Reuters Data Set to a great extent, regardless of the term-weighting scheme adopted or the term-reduction method used. The improvement in accuracy recorded when the low-frequency terms were kept was substantial, reaching 17% in some experiments.…”
Section: Introduction
confidence: 61%
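For context on why this debate matters, the quick sketch below (an illustration under my own assumptions, not taken from either paper) counts document frequencies in a toy corpus; in realistic collections the bulk of the vocabulary consists of such rare terms, so keeping or dropping them changes the feature space considerably.

```python
# Count document frequency (number of documents containing each term) and
# report how much of the vocabulary consists of terms seen in only one
# document, i.e. the "rare" or low-frequency terms under discussion.
from collections import Counter

docs = [
    "wheat grain exports rose sharply",
    "crude oil prices fell on weak demand",
    "grain prices rose as wheat supply tightened",
]

df = Counter(term for doc in docs for term in set(doc.split()))
rare = sorted(t for t, n in df.items() if n == 1)

print(f"vocabulary size: {len(df)}, terms in only one document: {len(rare)}")
print("rare terms:", rare)
```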
“…The research conducted in Ref. is extended here to investigate the significance of low-frequency terms in TC using other state-of-the-art TC algorithms. Furthermore, additional performance measures are used here to investigate the significance of low-frequency terms in TC.…”
Section: Introduction
confidence: 99%
“…Chi Square (CHI) is used in the experiments of this research as an FS metric for selecting the most discriminating features in the dataset. CHI has been shown to record high accuracy in classifying both English [7,6,16,61-66] and Arabic [5,6,16,55-58] texts. The CHI FS metric measures the lack of independence between a term and a class.…”
Section: A. Feature Selection (FS)
confidence: 99%
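The lack-of-independence measure mentioned in this statement is usually computed from a 2×2 contingency table of document counts. The minimal sketch below shows that standard formulation; the exact variant used in the cited work is an assumption on my part.

```python
# chi^2(t, c) from document counts:
#   A: docs of class c containing term t   B: docs of other classes containing t
#   C: docs of class c without t           D: docs of other classes without t
def chi_square(A: int, B: int, C: int, D: int) -> float:
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    if denom == 0:
        return 0.0  # term or class never occurs; no evidence either way
    return N * (A * D - C * B) ** 2 / denom

# A term concentrated in class c scores high (strong term/class dependence);
# a term spread proportionally across classes scores zero (independence).
print(chi_square(A=40, B=5, C=10, D=145))   # ~126.4: highly discriminative
print(chi_square(A=10, B=30, C=40, D=120))  # 0.0: independent of the class
```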
“…After deciding on the terms to be selected for building the classifier, the terms are represented in the categorization system using one of the various representations or weights used in the TC literature [3,5,9,14,56,59]: Term Frequency (TF) [14,15,55,57,58], Document Frequency (DF) [55], Weighted IDF [14], Normalized Frequency [7,16,60-64], Boolean [6,55,61,62,64], and other measures like the Cosine coefficient, Dice coefficient and Jaccard coefficient [68]. In this research, Normalized Frequency is used as the weighting scheme for term representation in the Vector Space Model.…”
Section: A. Feature Selection (FS)
confidence: 99%
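"Normalized Frequency" commonly means term counts scaled by the count of the document's most frequent term; the sketch below assumes that variant (the cited paper may define the normalization differently).

```python
# Weight each term of a document by its count divided by the count of the
# document's most frequent term, giving weights in (0, 1] for the Vector
# Space Model representation.
from collections import Counter

def normalized_frequency(doc: str) -> dict[str, float]:
    counts = Counter(doc.split())
    max_count = max(counts.values())
    return {term: n / max_count for term, n in counts.items()}

print(normalized_frequency("wheat grain wheat exports grain wheat"))
# -> {'wheat': 1.0, 'grain': 0.666..., 'exports': 0.333...}
```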