Proceedings of the Workshop on Computational Approaches to Arabic Script-Based Languages - Semitic '04 2004
DOI: 10.3115/1621804.1621819

Automatic Arabic document categorization based on the Naïve Bayes algorithm

Abstract: This paper deals with automatic classification of Arabic web documents. Such classification is very useful for providing directory search functionality, which many web portals and search engines use to cope with the ever-increasing number of documents on the web. In this paper, Naïve Bayes (NB), a statistical machine learning algorithm, is used to classify non-vocalized Arabic web documents (after their words have been transformed to the corresponding canonical form, i.e., roots) to one o…
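
The classification step the abstract describes can be illustrated with a minimal sketch. It assumes a multinomial Naïve Bayes model with add-one (Laplace) smoothing, which is a common textbook formulation and not necessarily the exact variant used in the paper. The function names `train_nb` and `classify_nb`, the category labels, and the transliterated root tokens are hypothetical placeholders; root extraction is assumed to have happened already.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Collect counts from (tokens, label) pairs.

    Assumption: tokens are already reduced to roots by an earlier step.
    """
    class_docs = Counter()               # number of documents per class
    token_counts = defaultdict(Counter)  # token frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, token_counts, vocab


def classify_nb(tokens, model):
    """Return the class with the highest log posterior score."""
    class_docs, token_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        # Log prior: fraction of training documents in this class.
        score = math.log(n_docs / total_docs)
        total_tokens = sum(token_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            # Add-one smoothing keeps unseen (token, class) pairs non-zero.
            score += math.log(
                (token_counts[label][tok] + 1) / (total_tokens + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set with hypothetical transliterated roots.
training = [
    (["krh", "lEb", "fwz"], "Sport"),
    (["Tb", "mrD", "ElAj"], "Health"),
    (["lEb", "fwz"], "Sport"),
]
model = train_nb(training)
predicted = classify_nb(["lEb", "fwz"], model)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which matters for full-length web documents.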

Citations: cited by 145 publications (106 citation statements)
References: 14 publications
“…The Naïve Bayesian (NB) applied in this study to classify 300 of the Arabic web document that taken from Al-Jazeera website "the channel of Arabic News in Qatar Television" into five categories "Science, Health, Culture and Art, Business, and Sport". The results showed accuracy in classification reaches 92.8% while the manual methods reached 62.8 [7].…”
Section: Related Work (mentioning)
confidence: 80%
“…Elkourdi, Bensaid, and Rachidi [6] implemented Naïve Bayes algorithms in classifying Arabic documents and reported 68.8% accuracy.…”
Section: Related Work (mentioning)
confidence: 99%
“…Then, The Arabic stop words are removed. Some Arabic documents may contain foreign words, special characters, numbers [23,24]. Finally, words with length less than three letters are eliminated, often these words are not important and are not useful in TC.…”
Section: A Text Preprocessing (mentioning)
confidence: 99%
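
The preprocessing steps mentioned in the last citation statement (dropping stop words, foreign words, special characters, numbers, and words shorter than three letters) could be sketched roughly as follows. This is an illustration under stated assumptions, not the citing paper's implementation: the `preprocess` function, the tiny `STOP_WORDS` set, and the choice of the Unicode range U+0621 to U+064A to recognise Arabic letters are all hypothetical choices made for this example.

```python
import re

# Assumption: a token is a run of Arabic letters (U+0621..U+064A).
# Latin words, digits, and punctuation never match, so they are dropped.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}


def preprocess(text, stop_words=STOP_WORDS, min_len=3):
    """Keep Arabic tokens that are not stop words and have at least min_len letters."""
    tokens = ARABIC_WORD.findall(text)
    return [t for t in tokens if t not in stop_words and len(t) >= min_len]


# Example: the number and the Latin word are discarded along with the stop word.
cleaned = preprocess("ذهب الولد إلى المدرسة 2004 web")
```

Filtering by a letter range handles foreign words, digits, and special characters in one pass, so the stop-word and length checks only ever see Arabic tokens.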