2007 Innovations in Information Technologies (IIT) 2007
DOI: 10.1109/iit.2007.4430403

Stemming Versus Light Stemming as Feature Selection Techniques for Arabic Text Categorization

Cited by 37 publications (21 citation statements)
References 6 publications
“…Arabs also use dialect instead of MSA in important fields such as online communication (chat rooms, SMS, Facebook, Twitters and others). Most of the research on Arabic is focused on MSA (Duwairi et al, 2007; Harrag et al, 2011; Al-Shalabi et al, 2003; Goweder et al, 2008). Currently, there are 12 different Arabic dialects spoken in 28 countries around the world.…”
mentioning
confidence: 98%
“…Here we note that light stemming maintains the difference between (الكتاب, الكاتبون), which mean "the book" and "the writers" respectively; their light stems are (كتاب, كاتب), which mean book and writer. [12] …”
Section: B Light Stemming
mentioning
confidence: 99%
“…Applying stemming algorithms as a feature selection method reduces the number of features since lexical forms (of words) are derived from basic building blocks; and hence, many features that are generated from the same stem are represented as one feature (their stem). [12] Two types of stemming will be applied to Arabic documents in addition to the without-stemming type: a. Root-based Stemming: stemming using a root extractor, which uses morphological analysis for Arabic words. Figure 2 depicts an example of using stemming for feature selection. Note that several words such as (المكتبة, الكاتب, الكتاب), which mean "the library", "the writer" and "the book" respectively, are reduced to one stem (كتب), which means write [13], as shown in Figure 3 [9], which describes preprocessing steps in root-based stemming.…”
Section: Arabic Text Pre-processing
mentioning
confidence: 99%
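The feature-reduction effect described in the statement above can be illustrated with a minimal sketch. The affix lists, the `MIN_STEM_LEN` limit, and the `light_stem` function are hypothetical and chosen only for this illustration; they are not the stemmer used in the cited paper, and a real light stemmer handles many more prefixes and suffixes.

```python
# Toy light stemmer: strips at most one prefix and one suffix.
# The affix lists below are illustrative assumptions, not a published list.
PREFIXES = ("ال",)                    # definite article "the"
SUFFIXES = ("ون", "ين", "ات", "ة")    # a few common plural/feminine endings
MIN_STEM_LEN = 3                      # never shorten a word below this length


def light_stem(word):
    """Return the word with one known prefix and one known suffix removed."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


# Four surface forms: "the book", "book", "the writers", "writer".
tokens = ["الكتاب", "كتاب", "الكاتبون", "كاتب"]
raw_features = set(tokens)
stem_features = {light_stem(t) for t in tokens}

print(len(raw_features), "raw features ->", len(stem_features), "light-stem features")
```

In this sketch the light stems for "book" and "writer" stay distinct, whereas a root-based stemmer would be expected to merge both under the root كتب, which is the trade-off the quoted statements describe.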
“…Removing stop words, as mentioned before, is one way to eliminate unimportant features [1]. Computing term-goodness based on the statistical characteristics of the dataset, such as document frequency, information gain, and mutual information, is another way [10]. A threshold method of feature selection removes features based on their frequencies, by requiring that those frequencies be greater than or less than a defined threshold value.…”
Section: Feature Selection and Reduction
mentioning
confidence: 99%
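The threshold method mentioned in the statement above can be sketched as a document-frequency filter. This is only an illustration under stated assumptions: the function names `document_frequency` and `select_by_df`, the `min_df` and `max_df` parameters, and the toy documents are all hypothetical and do not come from the citing paper.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return df


def select_by_df(docs, min_df=2, max_df=None):
    """Keep terms whose document frequency lies within [min_df, max_df]."""
    df = document_frequency(docs)
    upper = len(docs) if max_df is None else max_df
    return {term for term, count in df.items() if min_df <= count <= upper}


# Toy corpus of already tokenized documents (hypothetical data).
docs = [
    ["economy", "market", "bank"],
    ["market", "bank", "sport"],
    ["sport", "team", "market"],
]

kept = select_by_df(docs, min_df=2)                    # drop rare terms
kept_mid = select_by_df(docs, min_df=2, max_df=2)      # also drop very common terms
print(sorted(kept), sorted(kept_mid))
```

A lower bound removes terms too rare to generalize, while an upper bound removes terms so common that they carry little class information; both thresholds would need tuning on the actual dataset.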
“…Their results show that Rocchio classifier performs better than k-nearest neighbor classifier. Another study conducted in [10] used stemming and light stemming techniques as feature selection techniques, K-nearest neighbors (KNN) as a classifier. Results reported indicated that light stem was superior over stemming in terms of classifier accuracy.…”
Section: Introduction
mentioning
confidence: 99%